Programming has moved through several paradigms over the years: from machine code to procedural, object-oriented, and functional programming. Object-oriented programming is an interesting case. It proposed a near real-world representation of data through classes and objects, with features such as inheritance, encapsulation, and polymorphism. In technology, learning from nature and adopting approaches that closely model the real world has often led to tremendous success.
Long-term data storage has gone through a similar evolution, starting with simple file-based storage, moving to relational databases, and more recently to NoSQL databases. The growth of NoSQL databases has led to some interesting use cases, such as native graph and time-series storage.
Graphs are everywhere
Graphs are hidden in data models everywhere we look, from social networks and the World Wide Web to genealogy and patient journeys. While their structure may be hidden from users, we have long used them to solve problems. Shortest-path algorithms that find a route between two points and traveling salesman problems that optimize resources are examples of problems now being tackled with graph databases.
But beyond “backend” tinkering, graphs are now moving out of the shadows and taking center stage in a lot of applications. This piece covers the evolution of data storage technologies leading to graph data storage we use today.
Graph concepts have existed since the early days of programming and problem solving, but for a long time they remained complex data structures that were hard to access and were used only in very specific applications.
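To make the shortest-path idea mentioned above concrete, here is a minimal sketch in Python. It assumes a small, made-up friendship graph stored as an adjacency list and uses breadth-first search, which finds the path with the fewest hops in an unweighted graph. The names `FRIENDS` and `shortest_path` are illustrative only and do not come from any particular graph database.

```python
from collections import deque

# Hypothetical friendship graph stored as an adjacency list.
FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph, start, goal):
    """Return a path with the fewest hops from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # maps each visited node to the node we came from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(FRIENDS, "alice", "erin"))
# ['alice', 'bob', 'dave', 'erin']
```

A graph database runs this kind of traversal directly against stored relationships, so the application does not have to load the whole graph into memory first.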
Beginning - Circa 2006
Graphs as a technology took off at a larger scale around 2006, when Tim Berners-Lee described Linked Data: a huge database of information built by connecting data across websites using the Resource Description Framework (RDF). This became the basis of graph storage. Similar to object-oriented programming, this structure could show how people, organizations, and other entities are linked to each other, and the nature of their relationships.
From this point, data published by various websites came together to form a web of linked data. Schema.org and Google structured data enabled developers to publish structured data on webpages. Though slow to take off, over the last 13 years graph databases have been applied to a variety of applications.
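RDF describes data as statements made of a subject, a predicate, and an object, often called triples. The sketch below mimics that shape with plain Python tuples. It is an illustration only: the `ex:` names are made up, and `match` is a hypothetical helper, not part of any RDF library.

```python
# Hypothetical facts expressed as (subject, predicate, object) triples.
# The "ex:" names are invented for illustration, not a real RDF vocabulary.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:acme", "ex:locatedIn", "ex:london"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Who works for ex:acme?
employees = [s for (s, _, _) in match(TRIPLES, predicate="ex:worksFor", obj="ex:acme")]
print(employees)
# ['ex:alice', 'ex:bob']
```

Because every fact has the same three-part shape, facts published by different websites can be merged into one graph simply by sharing identifiers.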
A turning point in demonstrating the value of applications built on Linked Data was DBpedia. DBpedia allows users to semantically query the relationships and properties of Wikipedia resources, including links to other related datasets. It became one of the largest databases of its time, letting users gather linked information in an easy-to-use format through query languages like SPARQL.
Alongside it came Freebase, a commercially backed, community-driven dataset, which boosted awareness of the value of real-world data in linked form.
The same year saw the arrival of a graph database that disrupted graph storage: Neo4j. It was not an overnight product. Neo4j was first developed with a relational database approach and later transformed into a graph solution.
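As a rough illustration of what querying DBpedia with SPARQL looks like, the sketch below builds a request URL for a simple query: people whose birthplace is London. It assumes the public endpoint at `https://dbpedia.org/sparql` and that it accepts `query` and `format` parameters. The query and the helper name `build_request_url` are examples, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Example query: up to ten resources whose birthplace is London.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE {
  ?person dbo:birthPlace dbr:London .
}
LIMIT 10
""".strip()


def build_request_url(endpoint, query):
    """Build a GET URL asking the endpoint for JSON results (assumed parameters)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(DBPEDIA_ENDPOINT, QUERY)
print(url)
```

The pattern inside `WHERE` is itself a triple with a variable in the subject position, which is why SPARQL maps so naturally onto RDF data.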
Google’s acquisition of Metaweb (the creators of Freebase) was a key moment in making graphs mainstream. Google announced that it would use the dataset to power the “Knowledge Graph”. This is the box on the right of your Google search results that gives additional context and links related to your query.
Google Knowledge Graph is a silent piece of technology that has touched everyone who uses Google search. Powered by a massive graph, it helps users get information about people, places, books, events, and other things that are connected to each other in one way or another. Every quick result summary and fact sheet you see is powered by this Knowledge Graph, which helps users make decisions quickly instead of reading through all the search results.
Facebook launched Facebook Graph Search. Predictably, this led to concerns around privacy and personal data security. The graph was so powerful that people could ask very complex questions spanning many degrees of connection, which exposed data to unauthorized people. Facebook soon clamped down on Graph Search features to fix the privacy issues. But the controversy also made people aware of the power of this type of data storage: it can surface hidden relationships and insights that aren’t available through traditional databases.
Crunchbase launches Crunchbase 2.0 with Neo4j.
Whereas CrunchBase 1.0 was all about the data, CrunchBase 2.0 was about connections. As Jean Villedieu, founder of the startup Linkurious (a Neo4j visualization partner), states in his blog on Medium (also a Neo4j customer), the Business Graph has big implications. It’s the business world’s answer to Facebook’s Social Graph, which, as we know, has fundamentally changed the way we interact online.
Through mapping every member, company, job, and school, we’re able to spot trends like talent migration, hiring rates, and in-demand skills by region. These insights help us connect people to economic opportunity in new ways. And by partnering with governments and organizations around the world, we help them better connect people to opportunities.
Microsoft acquires LinkedIn. With around 600 million professionals on the network generating a massive economic graph, this acquisition is further evidence of the power of graphs.
Larger organizations start to adopt Graph and specifically Knowledge Graphs. Thomson Reuters announces their knowledge graph. Powering financial services markets with Graph-based data becomes mainstream with this launch.
Over the last few years, technology for storing graph data has evolved. This has helped businesses use real-world relationships the way they should be used. But it would be an oversimplification to say that graphs for business have evolved only through the technologies mentioned in this article. Technologies like TinkerPop and NetworkX have also boosted how we consume large-scale datasets. Querying graphs has improved significantly with technologies like Gremlin, GraphQL, and Cypher. Visualization tools aimed specifically at graphs have also evolved and become mainstream: Gephi, Cytoscape, and D3.js, to name a few.
Over the next couple of years, we will see the marriage of ML/AI with graph technologies take the Knowledge Graph market to a new level, helping businesses in ways we have not imagined before.
This post is part 1 of a series of articles about graph-based data, databases, and algorithms.
Next – Part 2: Why healthcare data is prime for Graph