How do you model a many-to-many relationship in a relational schema?
Quick Answer
Relational tables can only express one-to-one and one-to-many relationships directly, so a many-to-many relationship requires a **junction table** (also called an associative or bridge table) with foreign keys to both sides, typically with a composite primary key (or a surrogate key plus a `UNIQUE` constraint on the pair). The junction table can also carry attributes specific to the relationship itself, like `enrolled_at` on a student/course enrollment.
Detailed Answer
Consider students and courses: one student can take many courses, and one course can have many students — a genuine many-to-many.
The junction table
CREATE TABLE students (id SERIAL PRIMARY KEY, name VARCHAR(100));
CREATE TABLE courses (id SERIAL PRIMARY KEY, title VARCHAR(200));
CREATE TABLE enrollments (
student_id INT NOT NULL REFERENCES students(id),
course_id INT NOT NULL REFERENCES courses(id),
enrolled_at TIMESTAMP NOT NULL DEFAULT now(),
grade CHAR(2),
PRIMARY KEY (student_id, course_id) -- composite PK: a student can't enroll in the same course twice
);
The composite primary key (student_id, course_id) both links the two tables and enforces that a given pairing can only exist once. If duplicate enrollments should be allowed (e.g., retaking a course in a different term), you'd instead use a surrogate id primary key plus a separate UNIQUE (student_id, course_id, term) constraint that includes whatever column distinguishes legitimate repeats.
Querying through the junction table
-- All courses a given student is enrolled in
SELECT c.title
FROM courses c
JOIN enrollments e ON e.course_id = c.id
WHERE e.student_id = 42;
-- All students enrolled in a given course, with their grade
SELECT s.name, e.grade
FROM students s
JOIN enrollments e ON e.student_id = s.id
WHERE e.course_id = 7;
Why the junction table is also the right place for relationship-specific attributes
A many-to-many relationship often has data that belongs to the pairing, not to either side alone — enrolled_at and grade above only make sense in the context of "this student in this course," not on students or courses individually. This is a strong signal you need a real junction table rather than trying to shoehorn the relationship into either parent table.
Indexing note
Beyond the composite primary key (which already indexes student_id first), it's usually worth adding a secondary index on course_id alone (or (course_id, student_id)) if you frequently query "students in this course" — a composite primary key (student_id, course_id) doesn't efficiently support lookups that start from course_id alone.