What is a graph database, and what problems is it well suited for?

Detailed Answer

The core idea

A graph database stores data as nodes and edges. Nodes represent entities, edges represent relationships between them, and, critically, both can carry their own properties. Traversing from one node to its neighbors (and their neighbors, and so on) is the fundamental, highly optimized operation, whereas in a relational database each additional "hop" typically means another JOIN.

// Cypher (Neo4j's query language)
CREATE (alice:Person {name: 'Alice'})-[:FOLLOWS]->(bob:Person {name: 'Bob'})
CREATE (bob)-[:FOLLOWS]->(carol:Person {name: 'Carol'});

// Find everyone Alice can reach within 2 hops of "FOLLOWS"
MATCH (a:Person {name: 'Alice'})-[:FOLLOWS*1..2]->(reachable)
RETURN reachable.name;
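
To make the traversal idea concrete outside of any particular database, here is a minimal Python sketch of the same "within 2 hops" question over an in-memory adjacency list. The `follows` dictionary and the `reachable_within` helper are hypothetical names chosen for illustration; this is not how Neo4j executes the query. It also de-duplicates results, whereas a Cypher variable-length match returns one row per matching path.

```python
from collections import deque

# Hypothetical adjacency list mirroring the FOLLOWS edges created above.
follows = {
    "Alice": ["Bob"],
    "Bob": ["Carol"],
    "Carol": [],
}

def reachable_within(graph, start, max_hops):
    """Return names reachable from start in 1..max_hops hops (breadth-first)."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found

print(reachable_within(follows, "Alice", 2))  # ['Bob', 'Carol']
```

Each step only looks at the neighbors of the node in hand, so the work grows with the part of the graph actually visited rather than with the total number of stored relationships.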

Why relational databases struggle with this class of query

Modeling the same "who can Alice reach within N hops" question relationally requires either repeated self-joins or a recursive CTE (see that question). A recursive CTE works, but it re-executes a join per level of recursion and can get slow as depth or fan-out grows. For genuinely deep or variable-depth traversal (an unknown number of hops, needed at low latency), it does not scale well at all. Graph databases store adjacency information (which nodes connect to which) in a form optimized for direct traversal, often by following direct references between connected records (sometimes called index-free adjacency), rather than re-computing joins from scratch on every query.
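
For comparison, the relational version of the bounded-hop question can be sketched with a recursive CTE. The example below uses Python's built-in `sqlite3` module so it runs without a server. The `follows` table, its column names, and the extra `Carol -> Dave` row are assumptions added for illustration, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower TEXT, followee TEXT);
    -- The Carol -> Dave row is a hypothetical addition to show the hop limit.
    INSERT INTO follows VALUES
        ('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dave');
""")

def reachable_within(conn, start, max_hops):
    """Return (name, hops) rows reachable from start within max_hops."""
    return conn.execute(
        """
        WITH RECURSIVE reach(name, depth) AS (
            -- Anchor: direct followees of the starting person.
            SELECT followee, 1 FROM follows WHERE follower = ?
            UNION
            -- Recursive step: one more join against follows per level.
            SELECT f.followee, r.depth + 1
            FROM reach AS r
            JOIN follows AS f ON f.follower = r.name
            WHERE r.depth < ?
        )
        SELECT name, MIN(depth) AS hops
        FROM reach
        GROUP BY name
        ORDER BY hops, name
        """,
        (start, max_hops),
    ).fetchall()

print(reachable_within(conn, "Alice", 2))  # [('Bob', 1), ('Carol', 2)]
```

The recursive step is the part to watch: every additional level joins the rows found so far back against the whole `follows` table, which is the per-level join cost described above.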

Well-suited use cases

Social networks — friend-of-friend suggestions, mutual connections, degrees of separation.
Recommendation engines — "customers who bought X also bought Y," especially multi-hop recommendations ("people similar to you, who liked things similar to what you liked").
Fraud detection — tracing chains of transactions/accounts to detect rings of related fraudulent activity that would be invisible looking at any single transaction in isolation.
Knowledge graphs — modeling richly interconnected facts (e.g., "this drug interacts with this condition, which is treated by this other drug...") where the relationships themselves are as important as the entities.
Network/IT infrastructure mapping — dependency graphs between services, where "what breaks if this node goes down" is fundamentally a graph-traversal question.
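
As an illustration of the last use case, the "what breaks if this node goes down" question reduces to walking dependency edges in reverse. The sketch below is a minimal Python version under assumed data: the service names and the `impacted_by` helper are hypothetical, and a real graph database would answer this with a traversal query rather than application code.

```python
from collections import defaultdict, deque

# Hypothetical dependencies: each pair is (service, thing it depends on).
depends_on = [
    ("web", "api"),
    ("api", "auth"),
    ("api", "db"),
    ("reports", "db"),
]

def impacted_by(edges, failed):
    """Return every service that directly or transitively depends on failed."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(list)
    for service, dependency in edges:
        dependents[dependency].append(service)

    impacted = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for service in dependents[node]:
            if service not in impacted and service != failed:
                impacted.add(service)
                queue.append(service)
    return sorted(impacted)

print(impacted_by(depends_on, "db"))  # ['api', 'reports', 'web']
```

The same reverse-reachability pattern applies to the fraud and knowledge-graph cases: start from one suspicious or interesting node and follow relationships outward until nothing new turns up.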

Where a graph database is the wrong tool

Simple, mostly-tabular data with few or shallow relationships gains little from a graph model and loses the mature tooling, familiar query language, and broad ecosystem support relational (or even document) databases offer. Graph databases are a specialized tool for a specific shape of problem — heavily interconnected data queried primarily via traversal/path-finding — not a general-purpose replacement for relational modeling.

Being able to identify why a recursive CTE or repeated self-joins become painful at scale, and articulating that a graph database's storage/traversal model directly targets that pain point, shows a level of understanding beyond just naming Neo4j as "the graph database."
