What is a graph database, and what problems is it well suited for?

6 min · advanced · nosql · graph-database · database-choice

Quick Answer

A graph database stores data as nodes (entities) and edges (relationships between them), with both nodes and edges able to carry properties, and is purpose-built for efficiently traversing and querying connections — especially multi-hop, variable-depth paths that are expensive to express as repeated relational joins or recursive CTEs. It excels at social networks, recommendation engines, fraud detection, and knowledge graphs.

Detailed Answer

The core idea

Nodes represent entities, edges represent relationships between them, and — critically — both can carry their own properties. Traversing from one node to its neighbors (and their neighbors, and so on) is the fundamental, highly optimized operation, unlike a relational database where each additional "hop" typically means another JOIN.

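// Cypher (Neo4j's query language)
// Create three people and two FOLLOWS relationships in one statement,
// so the variable bob refers to the same node in both patterns.
CREATE (alice:Person {name: 'Alice'})-[:FOLLOWS]->(bob:Person {name: 'Bob'})
CREATE (bob)-[:FOLLOWS]->(carol:Person {name: 'Carol'});

// Find everyone Alice can reach by following 1 to 2 FOLLOWS relationships
MATCH (a:Person {name: 'Alice'})-[:FOLLOWS*1..2]->(reachable)
RETURN DISTINCT reachable.name;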
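To make the model concrete outside of any particular database, here is a minimal Python sketch of a property graph and a bounded traversal. It is an illustration only, not how Neo4j is implemented: the `nodes` and `out_edges` dictionaries, the `since` edge property, and the `reachable_within` helper are all hypothetical names chosen for this example. Unlike the Cypher pattern, it visits each node at most once and never returns the start node.

```python
from collections import deque

# Hypothetical in-memory property graph (names and the "since" property are
# illustrative assumptions): node id -> properties, and
# node id -> list of (relationship type, target id, edge properties).
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
}
out_edges = {
    "alice": [("FOLLOWS", "bob", {"since": 2021})],
    "bob": [("FOLLOWS", "carol", {"since": 2023})],
    "carol": [],
}


def reachable_within(start, rel_type, max_hops):
    """Return names of nodes reachable from start in 1..max_hops hops of rel_type."""
    seen = {start}
    found = []
    frontier = deque([(start, 0)])
    while frontier:
        node_id, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        # Each hop is a direct lookup of the node's own outgoing edges.
        for edge_type, target, _props in out_edges[node_id]:
            if edge_type == rel_type and target not in seen:
                seen.add(target)
                found.append(nodes[target]["name"])
                frontier.append((target, depth + 1))
    return found


print(reachable_within("alice", "FOLLOWS", 2))
```

The point of the sketch is that each hop only touches the edges attached to the current node, which is the access pattern a graph database is built around.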

Why relational databases struggle with this class of query

Answering the same "who can Alice reach within N hops" question relationally requires either repeated self-joins or a recursive CTE (see that question). A recursive CTE works, but it re-executes a join per level of recursion and can get slow as depth or fan-out grows; for genuinely deep or variable-depth traversal (an unknown number of hops, needed at low latency) it often scales poorly. Graph databases store adjacency information (which nodes connect to which) in a form optimized for direct traversal, in native graph stores often by following direct references (pointers) between connected records, rather than re-computing joins from scratch on every query.
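For comparison, the relational version of the two-hop query might look like the following sketch, run here against an in-memory SQLite database through Python's standard `sqlite3` module. The `person` and `follows` tables, their columns, and the `reachable_within` helper are assumptions made for this example; the recursive step performs one join against `follows` per level of depth.

```python
import sqlite3

# Hypothetical schema for illustration: people plus a follower/followee edge table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO follows VALUES (1, 2), (2, 3);
    """
)


def reachable_within(name, max_hops):
    """Return (name, minimum hop count) for everyone reachable in 1..max_hops hops."""
    return conn.execute(
        """
        WITH RECURSIVE reach(id, depth) AS (
            -- Seed: direct followees of the starting person (depth 1)
            SELECT f.followee_id, 1
            FROM follows f JOIN person p ON p.id = f.follower_id
            WHERE p.name = ?
            UNION
            -- Recursive step: one more join against follows per level
            SELECT f.followee_id, r.depth + 1
            FROM reach r JOIN follows f ON f.follower_id = r.id
            WHERE r.depth < ?
        )
        SELECT p.name, MIN(r.depth) AS hops
        FROM reach r JOIN person p ON p.id = r.id
        GROUP BY p.id, p.name
        ORDER BY hops, p.name
        """,
        (name, max_hops),
    ).fetchall()


print(reachable_within("Alice", 2))
```

The depth bound in the recursive step also keeps the query from looping forever on cyclic data, which is something the relational version has to handle explicitly.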

Well-suited use cases

  • Social networks — friend-of-friend suggestions, mutual connections, degrees of separation.
  • Recommendation engines — "customers who bought X also bought Y," especially multi-hop recommendations ("people similar to you, who liked things similar to what you liked").
  • Fraud detection — tracing chains of transactions/accounts to detect rings of related fraudulent activity that would be invisible looking at any single transaction in isolation.
  • Knowledge graphs — modeling richly interconnected facts (e.g., "this drug interacts with this condition, which is treated by this other drug...") where the relationships themselves are as important as the entities.
  • Network/IT infrastructure mapping — dependency graphs between services, where "what breaks if this node goes down" is fundamentally a graph-traversal question.
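As one illustration of why these use cases reduce to traversal, the "what breaks if this node goes down" question from the last bullet can be sketched as a breadth-first walk over reversed dependency edges. The service names, the `depends_on` mapping, and the `impacted_by` helper are hypothetical and chosen only for this example.

```python
from collections import deque

# Hypothetical service graph: each key depends on the services in its list.
depends_on = {
    "web": ["api"],
    "api": ["auth", "db"],
    "auth": ["db"],
    "reports": ["db"],
    "db": [],
}


def impacted_by(failed):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {service: [] for service in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(service)

    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return sorted(seen)


print(impacted_by("db"))
```

In a graph database the same question would typically be a single variable-length pattern over the dependency relationship, with no need to build the reversed index by hand.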

Where a graph database is the wrong tool

Simple, mostly-tabular data with few or shallow relationships gains little from a graph model and loses the mature tooling, familiar query language, and broad ecosystem support relational (or even document) databases offer. Graph databases are a specialized tool for a specific shape of problem — heavily interconnected data queried primarily via traversal/path-finding — not a general-purpose replacement for relational modeling.

Being able to identify why a recursive CTE or repeated self-joins become painful at scale, and to articulate that a graph database's storage/traversal model directly targets that pain point, shows a level of understanding beyond just naming Neo4j as "the graph database."