What are dataclasses, and how do they compare to plain classes, `namedtuple`, and `attrs`?
Quick Answer
`@dataclass` (PEP 557, Python 3.7+) auto-generates `__init__`, `__repr__`, and `__eq__` (and optionally `__lt__`/ordering, immutability, hashing) from type-annotated class attributes, removing the boilerplate of writing them by hand. Compared to `namedtuple`, dataclasses are mutable by default, support default values/methods naturally, and are regular classes (support inheritance); `attrs` predates dataclasses and offers more features (validators, converters) at the cost of a third-party dependency.
Detailed Answer
The boilerplate dataclasses remove

    # Without dataclasses
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __repr__(self):
            return f"Point(x={self.x!r}, y={self.y!r})"

        def __eq__(self, other):
            if not isinstance(other, Point):
                return NotImplemented
            return (self.x, self.y) == (other.x, other.y)

    # With dataclasses -- same behavior, generated automatically
    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

`@dataclass` reads the class's type annotations and generates `__init__`,
`__repr__`, and `__eq__` for you. `Point(1, 2) == Point(1, 2)` is `True`,
and `repr(Point(1, 2))` returns `'Point(x=1, y=2)'`, both for free.
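
As a quick check of the generated methods, the sketch below re-declares the same `Point` and exercises `__init__`, `__repr__`, and `__eq__`. The comparison against a plain tuple is an added illustration rather than something stated above: the generated `__eq__` only treats instances of the same class as comparable.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)        # generated __init__ accepts positional arguments
q = Point(x=1, y=2)    # ...and keyword arguments

print(repr(p))         # Point(x=1, y=2)
print(p == q)          # True: fields are compared in declaration order
print(p == (1, 2))     # False: a tuple is not a Point
```

Note that the annotations drive code generation only; they are not enforced at runtime, so `Point("a", "b")` is accepted as well.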
Useful options

    from dataclasses import FrozenInstanceError, dataclass, field

    @dataclass(order=True, frozen=True)
    class Money:
        cents: int
        currency: str = "USD"
        tags: list = field(default_factory=list)  # mutable default, done safely

    m1 = Money(100)
    m2 = Money(200)
    print(m1 < m2)  # True -- order=True generates __lt__ etc. (field-by-field)

    try:
        m1.cents = 500  # frozen=True makes instances immutable
    except FrozenInstanceError:
        print("cannot assign to a field of a frozen dataclass")

`field(default_factory=...)` solves the mutable-default-argument problem
for dataclass fields specifically: you can't write `tags: list = []`
directly, because `@dataclass` raises a `ValueError` at class-definition
time for a `list`, `dict`, or `set` default that is not wrapped in a factory.
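
The behavior described above can be sketched as follows. `Broken` and `Article` are hypothetical classes invented for this illustration; the first shows the rejection of a bare mutable default, the second shows that `default_factory` gives each instance its own list.

```python
from dataclasses import dataclass, field

# A bare mutable default is rejected when the class is defined,
# before any instance is ever created.
try:
    @dataclass
    class Broken:  # hypothetical example class
        tags: list = []
except ValueError as exc:
    error_message = str(exc)
    print("rejected:", error_message)


@dataclass
class Article:  # hypothetical example class
    title: str
    tags: list = field(default_factory=list)  # list() is called per instance


a = Article("first")
b = Article("second")
a.tags.append("python")

print(a.tags)  # ['python']
print(b.tags)  # []
```

Because the factory runs once per instance, appending to `a.tags` leaves `b.tags` untouched, which is exactly what a shared `[]` default would get wrong.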
Comparison to alternatives
| | Plain class | `namedtuple` | `@dataclass` | `attrs` |
|---|---|---|---|---|
| Boilerplate | you write everything | none, but limited | none | none |
| Mutable | yes (your choice) | no (tuple-based) | yes by default, `frozen=True` opts out | either |
| Methods | yes | yes (limited) | yes | yes |
| Inheritance | full | awkward | full | full |
| Validators/converters | manual | no | manual (`__post_init__`) | built-in |
| Dependency | none | none (stdlib) | none (stdlib, 3.7+) | third-party |
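
A few of the rows above can be checked directly with the standard library. This is a minimal sketch; `PointNT` and `PointDC` are hypothetical names chosen for the comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

PointNT = namedtuple("PointNT", ["x", "y"])


@dataclass
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

# namedtuple: it is a tuple, so indexing, unpacking, and tuple equality work.
x, y = nt
print(nt[0], nt == (1, 2))   # 1 True

# ...and its fields cannot be reassigned.
try:
    nt.x = 10
except AttributeError:
    print("namedtuple fields cannot be reassigned")

# dataclass: mutable by default, and not a tuple.
dc.x = 10
print(dc)                    # PointDC(x=10, y=2)
print(dc == (10, 2))         # False
```
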
`namedtuple` is best for genuinely tuple-like, immutable records accessed
positionally as well as by name; `@dataclass` is the standard modern
choice for "a class that's mostly data plus a few methods"; `attrs`
remains popular when you need richer validation/conversion features than
`@dataclass`'s `__post_init__` conveniently provides.
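
To make the "manual" validation route concrete, here is a minimal sketch of `__post_init__` doing both conversion and validation. The `Temperature` class and its lower bound are assumptions made up for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Temperature:  # hypothetical example class
    celsius: float

    def __post_init__(self):
        # Manual "converter": coerce whatever was passed in to float.
        self.celsius = float(self.celsius)
        # Manual "validator": reject values below absolute zero.
        if self.celsius < -273.15:
            raise ValueError("celsius must be >= -273.15")


t = Temperature("21.5")
print(t)                     # Temperature(celsius=21.5)

try:
    Temperature(-300)
except ValueError as exc:
    print("rejected:", exc)
```

`__post_init__` runs only as the last step of the generated `__init__`, so a later assignment such as `t.celsius = -500` is not re-validated. That gap is one reason per-field validators and converters are attractive when the checks get more involved.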
Interview-ready summary: `@dataclass` auto-generates `__init__`,
`__repr__`, and `__eq__` from annotated fields, removing the most common
class boilerplate while staying a normal, mutable, inheritable class.
Reach for `namedtuple` for lightweight immutable tuples, and `attrs` when
you need validators/converters beyond what `@dataclass` provides out of
the box.