A graph database stores data as nodes, edges, and properties, and uses that graph structure to answer semantic queries. Such databases excel at querying data that is connected through long chains of relationships. In a traditional relational SQL database, the same data would typically be modeled as a large number of tables linked by joins. Here at Cycode, we were looking for a graph database to organize our data and to enable complex detections and insights. We needed these capabilities for our own internal use and also to give our customers a flexible way to query their data.
In the process, we experimented with several graph databases until we found the one that fit us best. This blog post compares the four graph databases that we tried (AWS Neptune, Neo4j, ArangoDB and RedisGraph) and explains why we eventually chose ArangoDB for our needs.
But first, let’s understand what a Graph Database is.
What is a Graph Database?
A graph database (sometimes referred to as a GDB or GraphDB) is a database that uses graph structures to represent and store data, enabling semantic queries over the data points. The graph is the central element: it links data items together through “relationships”, which are stored and retrieved as nodes, edges, and properties.
Here’s what it looks like:
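In code terms, the same structure can be sketched as follows. This is a minimal, illustrative in-memory model in Python, not the storage format or API of any of the databases discussed in this post. The class names (Node, Edge, PropertyGraph) and the sample repository, pipeline and bucket nodes are hypothetical and chosen only to show how nodes, edges, and properties fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                                   # e.g. "Repository" (hypothetical label)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                  # node_id of the start node
    target: str                                  # node_id of the end node
    edge_type: str                               # e.g. "BUILT_BY" (hypothetical type)
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: labeled nodes, typed directed edges, free-form properties."""

    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, edge_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, edge_type, properties))

    def neighbors(self, node_id, edge_type=None):
        """Nodes reachable over one outgoing edge, optionally filtered by edge type."""
        return [
            self.nodes[e.target]
            for e in self.edges
            if e.source == node_id and (edge_type is None or e.edge_type == edge_type)
        ]


# Hypothetical sample data: a repository, the pipeline that builds it,
# and a cloud resource the pipeline deploys to.
graph = PropertyGraph()
graph.add_node("repo-1", "Repository", name="payments-service")
graph.add_node("pipeline-1", "BuildPipeline", provider="example-ci")
graph.add_node("bucket-1", "CloudResource", kind="storage-bucket")
graph.add_edge("repo-1", "pipeline-1", "BUILT_BY")
graph.add_edge("pipeline-1", "bucket-1", "DEPLOYS_TO", environment="production")

for node in graph.neighbors("repo-1"):
    print(node.label, node.properties)
```

Note that the relationship itself carries data (the environment property on the DEPLOYS_TO edge), which is what distinguishes a property graph from a plain list of links.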
Graph databases are not a new concept. In fact, they were introduced about 20 years ago, in the early 2000s. However, they have recently gained traction because they are a good fit for use cases such as social networking, recommendation systems and fraud detection. Another strong graph use case is the knowledge graph, which stores data in a graph model and uses graph queries to explore highly connected datasets intuitively. For Cycode, a relevant use case is visualizing and understanding the relations between an organization’s source code management systems, build systems and the large numbers of cloud resources that make up its development lifecycle.
Graph databases are designed to query many connected data points efficiently, unlike the “classic” relational SQL database, where joining multiple tables can incur a significant performance penalty. This makes them more flexible and effective for workloads of this kind.
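To make the contrast concrete, here is a small Python sketch of a multi-hop query expressed as a graph traversal. In a relational schema, each additional hop would typically mean another join; in a graph, it is one more step along the edges. This is only an illustration of the idea using a breadth-first search over an in-memory adjacency map. It is not how any particular graph database executes queries, and the function name reachable and the sample nodes are hypothetical.

```python
from collections import deque


def reachable(adjacency, start, max_hops):
    """Return {node: hop_count} for every node within max_hops edges of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:  # first visit is the shortest path in BFS
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]  # report only the nodes reached, not the start itself
    return distances


# Hypothetical chain: developer -> repository -> pipeline -> cloud role -> bucket.
adjacency = {
    "developer": ["repo"],
    "repo": ["pipeline"],
    "pipeline": ["cloud_role"],
    "cloud_role": ["bucket"],
}

within_three = reachable(adjacency, "developer", max_hops=3)
within_four = reachable(adjacency, "developer", max_hops=4)
print(within_three)
print(within_four)
```

Raising max_hops from 3 to 4 extends the result to the bucket at the end of the chain without changing the shape of the query, whereas the equivalent SQL would usually need one more join.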
Why We Chose to Use a Graph Database
When looking at the challenges of securing the development lifecycle, it was clear to us that there was no “one-size-fits-all” solution. Organizations use different tools and processes to define their development lifecycle. Therefore, we needed a solution that would allow us to model all of the lifecycle’s parts (such as SCMs, build systems and cloud environments), define the initial building blocks, and then allow our customers to easily fine-tune the model to fit their exact needs.