How do you decide between a relational database and a NoSQL database for a new project?

Start from the data's actual shape and the application's real consistency/query needs, not from technology trends: relationships that need strong consistency and ad-hoc querying favor relational; highly variable or nested data, extreme write scale, or a query pattern that's almost always "fetch one whole object" favor NoSQL. Most real systems end up polyglot — a relational core for transactional data, with NoSQL stores layered in for the specific workloads that genuinely benefit from them.

Compare PostgreSQL, MySQL, and SQL Server — what are the meaningful differences?

**PostgreSQL** is open-source, standards-compliant, and known for its extensibility (custom types, extensions like PostGIS/pgvector) and advanced features (transactional DDL, sophisticated indexing options). **MySQL** is open-source, historically simpler and very widely deployed (especially in web/LAMP-stack contexts), with InnoDB as its default storage engine. **SQL Server** is Microsoft's commercial offering, deeply integrated with the Windows/.NET ecosystem and enterprise tooling, with strong built-in BI/reporting/analytics features. All three are mature, fully capable relational databases for the vast majority of workloads — the differences that matter most in practice are ecosystem fit, licensing cost, and specific advanced features, not raw core SQL capability.

Tell me about a time you had to optimize a slow-performing query or database. What was your approach?

A strong answer follows a clear, methodical narrative: how the problem was noticed (a specific symptom — a slow dashboard, a timeout, a customer complaint), how the actual bottleneck was diagnosed (using `EXPLAIN ANALYZE` or equivalent, not guessing), what the root cause turned out to be, what specific change was made, and how the fix was verified and its impact measured. Interviewers are listening for a diagnostic process, not just a final answer.

How do you approach designing a schema for a new feature from scratch?

Start from the actual entities and relationships in the business domain (not from what tables you imagine you'll need), model cardinality and constraints explicitly, normalize to a sensible baseline (usually 3NF) before considering any denormalization, and validate the design against the concrete queries the feature will actually need to run — a schema that looks clean on paper but can't efficiently answer the real query patterns isn't done yet.

What questions would you ask before choosing a database technology for a new system?

Ask about the data's shape and relationships, expected scale (both current and projected growth) for reads/writes/storage, consistency requirements (what's the real cost of stale or lost data for this specific data), the team's existing operational expertise, budget/licensing constraints, and any compliance/regulatory requirements. The goal is gathering enough concrete information to make a reasoned choice, rather than defaulting to whatever's most familiar or currently trendy.

General, Behavioral, and Database Choice

Judgment questions about picking the right database technology and communicating tradeoffs.

Difficulty

Open as page

This question is really asking whether you reason from first principles about data and access patterns, or default to whatever's currently fashionable. A strong answer walks through concrete decision criteria rather than a blanket preference.

Questions worth asking before choosing

How relational is the data, really? Genuine many-to-many relationships queried from multiple angles favor a relational model with real joins; data that's naturally self-contained per record (a user profile with nested preferences) favors a document model.
What consistency guarantee does the business logic actually need? Financial balances, inventory that must never oversell, anything where "briefly stale" causes real harm — favors relational (or a NoSQL system explicitly configured for strong consistency). Social counters, activity feeds, caches — tolerate eventual consistency fine.
How variable is the schema across records? A fixed, well-understood structure fits relational tables comfortably; wildly varying attributes per record (a multi-category product catalog) fit documents more naturally.
What's the expected write scale and growth trajectory? Genuinely massive, horizontally-distributed write volume (far beyond what a scaled-up relational primary plus read replicas can handle) points toward systems built for it from the ground up (Cassandra-style wide-column, or a relational system with mature sharding tooling like Vitess/Citus).
What does the team already know, and what's the operational maturity of the tooling around each option? A team deeply experienced with PostgreSQL, with mature migration/backup/monitoring tooling already in place, pays a real (and often underestimated) cost adopting an unfamiliar NoSQL system for marginal technical benefit — operational familiarity is a legitimate factor, not just a technical purity question.

Why "it depends" is genuinely the right answer, done well

This isn't hedging — it's recognizing that "relational vs. NoSQL" is really a bundle of several independent decisions (consistency model, schema flexibility, query pattern, scale characteristics), and different parts of the same system can reasonably land on different answers. A candidate who says "always use Postgres" or "NoSQL scales better so use that" without qualification is skipping the actual analysis; a candidate who walks through 2-3 concrete factors and reaches a specific recommendation for the scenario at hand is demonstrating real judgment.

A worked example

"For an e-commerce platform: orders, payments, and inventory go in a relational database — they need strong consistency (an order total must be right, inventory must not oversell) and benefit from real joins for reporting. But the product catalog, with wildly different attributes per category (books have authors/ISBNs, electronics have wattage/warranty terms), fits a document store better. Session data and shopping carts fit a key-value store like Redis — fast, ephemeral, no need for durability guarantees as strong as the order data." This kind of concrete, per-domain reasoning is what distinguishes a strong answer from a generic one.

Related Resources

MongoDB: SQL vs NoSQL

Open as page

PostgreSQL

Licensing: fully open-source (PostgreSQL License, a permissive license), no per-core/per-instance licensing cost.
Standards compliance and features: widely regarded as closely following the SQL standard, with advanced capabilities like transactional DDL (schema changes can be rolled back — see the DDL/DML question), rich indexing options (GiST, GIN, BRIN, partial and expression indexes), native JSON/JSONB support with indexing, and a strong extension ecosystem (PostGIS for geospatial data, pgvector for vector similarity search, TimescaleDB for time-series).
Concurrency model: MVCC (see that question), generally praised for read-heavy workload performance and its handling of complex queries.
Where it's chosen: teams wanting maximum standards compliance, extensibility, and avoiding licensing costs; increasingly the default choice for new greenfield projects across many industries, including where advanced data types (geospatial, vector embeddings for AI workloads) matter.

MySQL

Licensing: open-source (GPL), with a commercial/enterprise edition (MySQL Enterprise) offering additional support and features; owned by Oracle.
History and ecosystem: extremely widely deployed, historically the "M" in the LAMP stack, with enormous community familiarity, hosting-provider support, and tooling maturity for typical web application workloads.
Storage engines: pluggable storage engine architecture, with InnoDB (ACID-compliant, row-level locking, the modern default) as the standard choice; the older MyISAM engine (table-level locking, no transactions) still exists but is rarely the right choice for new work.
Where it's chosen: teams prioritizing broad ecosystem familiarity/tooling, particularly common in traditional web application stacks (WordPress, many PHP/web frameworks default to MySQL-compatible databases).

SQL Server

Licensing: commercial, with per-core or per-server licensing costs (a free Express edition exists with resource limits); owned by Microsoft.
Ecosystem integration: deep integration with the Windows Server and .NET ecosystems, Active Directory authentication, and Microsoft's broader enterprise/BI tooling (SQL Server Reporting Services, SQL Server Integration Services, Power BI).
Tooling: SQL Server Management Studio (SSMS) is a mature, widely-used GUI management tool; strong built-in support for complex analytical/BI workloads out of the box.
Where it's chosen: organizations already standardized on Microsoft/.NET infrastructure, or with existing enterprise licensing agreements and BI tooling investments that make the ecosystem fit decisive.

What's mostly similar across all three

Core relational SQL capability — joins, transactions, indexing fundamentals, window functions (all three now support them), stored procedures — is broadly comparable across all three for the overwhelming majority of application workloads. The meaningful differentiators for a real decision are usually: licensing cost model, existing team/organizational familiarity, specific advanced feature needs (PostgreSQL's extensibility, SQL Server's BI integration), and hosting/cloud-provider ecosystem fit — not "which one is fundamentally more capable at SQL," since that gap has narrowed significantly over the past decade.

Related Resources

PostgreSQL vs MySQL Comparison

Open as page

This is a behavioral question with a technical spine — the interviewer wants to hear a real, specific story with a clear diagnostic arc, not a generic list of optimization techniques recited in the abstract.

A strong structure for the answer (STAR-shaped, technically grounded)

Situation: What was the concrete symptom? "A customer-facing dashboard was timing out for accounts with a large order history" is much stronger than "a query was slow" — specificity signals a real memory, not a fabricated generic story.

Task: What was actually asked of you, and why did it matter? (e.g., "this was blocking a major customer's renewal, so it had real urgency.")

Action — this is where the technical depth should show:

How did you first investigate? ("I pulled the slow query log / ran EXPLAIN ANALYZE on the actual query with representative parameters, rather than guessing.")
What did the execution plan actually reveal? ("The plan showed a sequential scan on a 40-million row table, and the estimated vs. actual row counts were wildly different, which pointed at stale statistics as well as a genuinely missing index.")
What was the root cause, specifically? (a missing index, a non-sargable predicate, an N+1 pattern, a join fanning out unexpectedly — see the relevant topics in this stack for the vocabulary to describe this precisely.)
What did you actually change, and why that fix specifically (as opposed to some other plausible option)?

Result: What was the measured outcome? ("Query time dropped from 12 seconds to 80ms; we verified with EXPLAIN ANALYZE again post-fix, and confirmed via monitoring that the dashboard's p99 latency improved correspondingly over the following week.") Concrete before/after numbers are far more convincing than "it got a lot faster."

What separates a strong answer from a weak one

Weak: "I added an index and it got faster." (No diagnostic process, no reasoning, sounds rehearsed/generic.)
Strong: names the actual tool used to diagnose (EXPLAIN ANALYZE, a slow query log, an APM trace), describes what the evidence actually showed, explains the reasoning connecting the evidence to the fix, and quantifies the result.

Common technical themes to be ready to discuss in depth if asked to go deeper

Missing/wrong indexes, non-sargable predicates, N+1 query patterns, join row-explosion, stale statistics, lock contention, or a genuine schema/data-model issue requiring more significant rework (denormalization, a materialized view, caching). Being able to go two or three "why" questions deep into any of these — not just name-drop them — is what actually distinguishes real hands-on experience from memorized talking points.

Preparing for this question

Have at least one real, specific story ready (even a modest one from a side project counts, if you don't have significant production experience) with actual before/after numbers if at all possible — a specific, well-reasoned example beats a more impressive-sounding but vague one every time.

Open as page

A concrete process to describe

1. Understand the domain and real requirements first. What entities exist, what are their genuine relationships (one-to-many? many-to-many?), and — critically — what are the actual business rules and invariants that must hold (can an order exist without a customer? can a product belong to multiple categories?). Skipping this and jumping straight to CREATE TABLE is the most common mistake, producing a schema that technically works but doesn't reflect the domain's real rules.

2. Sketch the conceptual/ER model before the physical schema. Identify entities, attributes, and relationship cardinality (see the ER-diagram question) — this is where you decide whether something needs a junction table (many-to-many), a foreign key (one-to-many), or genuinely belongs embedded in another entity.

3. Choose keys deliberately. Surrogate vs. natural key per entity (see that question), and think about what other tables will need to reference.

4. Normalize to a sensible baseline (usually 3NF), and add constraints that encode real business rules — NOT NULL, CHECK, foreign keys with deliberate ON DELETE behavior (see that question) — rather than relying purely on application-layer validation, which every future code path would need to remember to replicate.

5. Validate against actual query patterns before calling it done. Write out (or at least sketch) the 3-5 most important queries this feature will need — the schema needs to support them reasonably efficiently, not just look structurally correct. This is the step most likely to reveal a needed index, or a case where a slight denormalization is actually justified for a genuinely hot read path.

6. Consider evolution, not just the current requirement. Is there an obvious, likely-near-term extension (e.g., "will a product need to belong to multiple categories eventually, even if not on day one?") that would be far more painful to retrofit than to accommodate now? This is a judgment call, not a mandate to over-engineer for hypothetical future needs — the goal is avoiding obviously short-sighted choices, not speculative generality.

7. Plan the migration path, even for a brand-new feature — how will this schema be deployed, and if requirements shift shortly after launch (which they often do), how painful would evolving it be?

What interviewers listen for

A methodical process that starts from the domain and validates against real queries — not a candidate who recites normal forms from memory without connecting them to an actual design decision, and not a candidate who jumps straight to schema syntax without first understanding what the data actually represents and how it'll really be used.

Open as page

This question tests whether a candidate gathers information deliberately before making an architectural decision, rather than pattern-matching to a favorite tool. A strong answer organizes the questions into clear categories.

Questions about the data itself

What does the data actually look like — fixed and well-structured, or highly variable/nested per record?
What are the real relationships between entities — how relational is this data, genuinely (not just "everything is technically related to everything")?
How will this data be queried — simple key lookups, complex ad-hoc joins/aggregation, full-text search, graph traversal, geospatial queries?

Questions about scale and growth

What's the expected read/write volume today, and realistically in 1-2 years — not a wildly speculative "what if we go viral" number, but a grounded estimate?
Is the workload read-heavy, write-heavy, or roughly balanced?
How much total data storage is expected, and how fast does that grow?

Questions about consistency and correctness requirements

What's the actual cost of a stale read, or (worse) a lost write, for this specific data? (A financial balance and a "like" counter have very different answers.)
Are there multi-entity invariants that must be enforced atomically (e.g., "total allocated seats never exceeds capacity")?

Questions about team and organizational context

What does the team already have deep operational experience running and troubleshooting in production? (Adopting an unfamiliar technology has a real, often underestimated, ongoing cost.)
What's the existing infrastructure/cloud-provider ecosystem, and what does it make easiest to operate reliably?

Questions about constraints

What's the budget for licensing/managed-service costs?
Are there compliance requirements (HIPAA, PCI-DSS, data residency laws) that constrain where/how data can be stored, encrypted, or audited?
Are there existing systems this needs to integrate with that constrain the choice (e.g., an existing data warehouse pipeline that expects a specific format)?

Why asking questions (rather than jumping to an answer) is itself the right signal

A candidate who immediately says "use Postgres" or "use MongoDB" without first asking any of the above is skipping the actual analysis a real architectural decision requires — the "right" database is entirely dependent on answers to these questions, and a senior engineer's value in this conversation is in knowing which questions actually change the recommendation, not in having a single favorite tool for every situation.