Ordinarily, implementing data access with plain JPA means writing a class that injects an EntityManager and hand-writes methods like findById, save, deleteById — mostly repetitive boilerplate that's nearly identical across every entity type.
Spring Data JPA eliminates that boilerplate: you declare a repository as a plain interface, extending a base interface like JpaRepository<Entity, IdType>, with no implementation class at all:
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;

public interface OrderRepository extends JpaRepository<Order, Long> {
    // No implementation needed: save(), findById(), findAll(), deleteById(), etc. already work.
}

@Service
class OrderService {
    private final OrderRepository repository; // inject the interface directly

    OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    Order getOrder(Long id) {
        // Throws NoSuchElementException when no row has this ID.
        return repository.findById(id).orElseThrow();
    }
}
How this actually works under the hood: when the ApplicationContext starts, Spring Data JPA's infrastructure scans for interfaces extending its repository marker interfaces (enabled via @EnableJpaRepositories, which Spring Boot's auto-configuration wires up automatically once spring-boot-starter-data-jpa is present). For each one found, it creates a dynamic proxy implementing that interface at runtime — the proxy's invocation handler delegates standard CRUD method calls to SimpleJpaRepository, a single, generic, concrete implementation class provided by Spring Data that performs the actual work via the underlying JPA EntityManager, parameterized by the specific entity/ID types your interface declared.
This is exactly the same underlying idea as Spring AOP's proxy mechanism (a runtime-generated implementation standing in for an interface) applied specifically to eliminate repetitive data-access boilerplate — JpaRepository already gives you save, findById, findAll, deleteById, count, and pagination/sorting support, all without a single line of implementation code from you.
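The proxy-plus-shared-implementation idea can be sketched in plain Java with `java.lang.reflect.Proxy`. This is only an illustration under stated assumptions, not Spring Data's actual code: `CrudRepo`, `NoteRepo`, `InMemoryCrudRepo`, and `RepoProxyFactory` are hypothetical names, the store is a `Map` rather than an `EntityManager`, and `save` takes the ID explicitly because the sketch has no entity metadata to read it from.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a base repository interface such as JpaRepository.
interface CrudRepo<T, ID> {
    T save(ID id, T entity);
    Optional<T> findById(ID id);
    long count();
}

// The user-declared repository: an interface only, with no implementation class.
interface NoteRepo extends CrudRepo<String, Long> {}

// One generic implementation shared by all repositories (the role SimpleJpaRepository plays),
// backed by a Map here instead of an EntityManager.
class InMemoryCrudRepo<T, ID> implements CrudRepo<T, ID> {
    private final Map<ID, T> rows = new HashMap<>();
    public T save(ID id, T entity) { rows.put(id, entity); return entity; }
    public Optional<T> findById(ID id) { return Optional.ofNullable(rows.get(id)); }
    public long count() { return rows.size(); }
}

class RepoProxyFactory {
    // Creates a runtime proxy that implements the given interface and forwards every call
    // to a fresh instance of the shared generic implementation.
    @SuppressWarnings("unchecked")
    static <R> R create(Class<R> repoInterface) {
        Object target = new InMemoryCrudRepo<Object, Object>();
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return (R) Proxy.newProxyInstance(
                repoInterface.getClassLoader(), new Class<?>[] {repoInterface}, handler);
    }
}

class RepoProxyDemo {
    public static void main(String[] args) {
        NoteRepo repo = RepoProxyFactory.create(NoteRepo.class);
        repo.save(1L, "hello");
        System.out.println(repo.findById(1L).orElseThrow()); // prints "hello"
        System.out.println(repo.count());                    // prints "1"
    }
}
```

The caller only ever sees `NoteRepo`; the object behind it is a generated proxy class whose calls land on the single generic implementation.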
Spring Data JPA can derive a working query purely from a repository method's name, by parsing it against a defined grammar at application startup:
public interface UserRepository extends JpaRepository<User, Long> {
    List<User> findByLastName(String lastName);
    List<User> findByLastNameAndAge(String lastName, int age);
    List<User> findByAgeGreaterThan(int age);
    List<User> findByLastNameOrderByAgeDesc(String lastName);
    boolean existsByEmail(String email);
    long countByStatus(String status);
}
How the parsing works: Spring Data strips the introductory keyword (findBy, existsBy, countBy, deleteBy, ...), splits the remainder into segments at And/Or, and resolves each segment to an entity property by uncapitalizing its first letter (LastName becomes lastName). Nested properties are resolved by walking the camel-case parts, so findByAddressCity can reach address.city; an underscore, as in findByAddress_City, marks the traversal point explicitly when the name would otherwise be ambiguous. Comparison keywords (GreaterThan, Between, Like, IsNull, Containing, StartingWith, ...) modify how a given property is compared. OrderBy<Property><Direction> appends a sort clause.
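The splitting described above can be sketched in a few lines of plain Java. This is a deliberately reduced illustration, not Spring Data's parser: the class `DerivedQuerySketch` and its `toJpql` method are hypothetical, only `findBy`, `And`, three comparison keywords, and `OrderBy` are handled, and `Or`, nested properties, and validation are left out.

```java
import java.util.ArrayList;
import java.util.List;

class DerivedQuerySketch {
    // Small, illustrative subset of comparison keywords and the operator each one maps to.
    private static final String[][] KEYWORDS = {
        {"GreaterThan", ">"}, {"LessThan", "<"}, {"Like", "LIKE"}
    };

    static String toJpql(String methodName, String entity) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("this sketch only handles findBy...");
        }
        String rest = methodName.substring("findBy".length());

        // Peel off an optional OrderBy<Property><Direction> suffix.
        String orderBy = "";
        int orderIdx = rest.indexOf("OrderBy");
        if (orderIdx >= 0) {
            String sort = rest.substring(orderIdx + "OrderBy".length());
            String direction = sort.endsWith("Desc") ? "DESC" : "ASC";
            String property = sort.replaceAll("(Asc|Desc)$", "");
            orderBy = " ORDER BY e." + uncapitalize(property) + " " + direction;
            rest = rest.substring(0, orderIdx);
        }

        // Split at "And" only when an uppercase letter follows, so a property
        // such as "AndroidVersion" is not cut in the middle.
        List<String> predicates = new ArrayList<>();
        int param = 1;
        for (String part : rest.split("And(?=[A-Z])")) {
            String property = part;
            String operator = "=";
            for (String[] keyword : KEYWORDS) {
                if (part.endsWith(keyword[0])) {
                    property = part.substring(0, part.length() - keyword[0].length());
                    operator = keyword[1];
                    break;
                }
            }
            predicates.add("e." + uncapitalize(property) + " " + operator + " ?" + param++);
        }
        return "SELECT e FROM " + entity + " e WHERE "
                + String.join(" AND ", predicates) + orderBy;
    }

    private static String uncapitalize(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }
}
```

With this sketch, `toJpql("findByLastNameAndAgeGreaterThan", "User")` returns `SELECT e FROM User e WHERE e.lastName = ?1 AND e.age > ?2`, and `toJpql("findByLastNameOrderByAgeDesc", "User")` returns `SELECT e FROM User e WHERE e.lastName = ?1 ORDER BY e.age DESC`.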
At application startup, Spring Data validates each declared repository method's name against the entity's actual properties — an invalid property name or malformed keyword combination fails immediately with a clear error, rather than silently doing the wrong thing at runtime.
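The fail-fast check can be approximated with reflection. The sketch below is an assumption-laden illustration rather than Spring Data's validation logic: `StartupValidationSketch`, its nested `User` and `UserRepo`, and the error message wording are all hypothetical, and only single-property `findBy<Property>` methods are checked.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class StartupValidationSketch {
    // Hypothetical entity and repository used only for this illustration.
    static class User { String lastName; int age; }

    interface UserRepo {
        List<User> findByLastName(String lastName);
        List<User> findByNickname(String nickname); // bug: User has no "nickname" field
    }

    // Returns one message per repository method whose property does not exist on the entity.
    static List<String> validate(Class<?> repo, Class<?> entity) {
        List<String> errors = new ArrayList<>();
        for (Method m : repo.getDeclaredMethods()) {
            String name = m.getName();
            if (!name.startsWith("findBy")) {
                continue; // other method kinds are outside this sketch
            }
            String property = name.substring("findBy".length());
            String field = Character.toLowerCase(property.charAt(0)) + property.substring(1);
            try {
                entity.getDeclaredField(field);
            } catch (NoSuchFieldException e) {
                errors.add("No property '" + field + "' found for type "
                        + entity.getSimpleName() + " (method " + name + ")");
            }
        }
        return errors;
    }
}
```

Calling `validate(UserRepo.class, User.class)` yields a single error for `findByNickname`; a framework doing this at startup would turn such an error into a failed context start instead of a list entry.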
Internally, each derived method is translated into an equivalent JPA query (built through the Criteria API or as JPQL, depending on the Spring Data JPA version; for MongoDB and other Spring Data modules, into an equivalent native query for that store) and executed the same way a hand-written @Query-annotated method would be.
Trade-off: derived query names are convenient and self-documenting for simple filters, but get unwieldy and hard to read once a query needs more than 2-3 conditions (findByLastNameAndAgeGreaterThanAndStatusInOrderByCreatedAtDesc is technically valid but painful) — at that point, an explicit @Query annotation (or the Criteria API/Querydsl for fully dynamic queries) is usually clearer.
Derived query methods and @Query both let a Spring Data repository method run a query, but they differ in how explicit the query itself is:
Derived query methods infer the query entirely from the method name:
List<Order> findByStatusAndCreatedAtAfter(String status, Instant after);
Great for simple, property-based filters — but the method name grows unwieldy fast for anything more complex, and can't express things like joins across unrelated entities, aggregate functions, or arbitrary custom logic.
@Query lets you write the query explicitly, as JPQL (Spring Data JPA's default) or, with nativeQuery = true, as raw SQL specific to your database:
public interface OrderRepository extends JpaRepository<Order, Long> {
    @Query("SELECT o FROM Order o JOIN o.customer c WHERE c.region = :region AND o.total > :minTotal")
    List<Order> findHighValueOrdersByRegion(@Param("region") String region, @Param("minTotal") BigDecimal minTotal);

    @Query(value = "SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'", nativeQuery = true)
    List<Order> findRecentOrders(); // raw SQL, using a Postgres-specific INTERVAL expression
}
When to reach for @Query instead of a derived method name:
- The query involves joins across multiple entities, aggregate functions (COUNT, SUM, AVG), or GROUP BY/HAVING clauses that a method-name-derived query can't express cleanly.
- You need database-specific SQL features not portable through JPQL; this requires nativeQuery = true, at the cost of losing JPQL's database-portability and some of Spring Data's automatic result-mapping conveniences.
- The equivalent method name would be long and hard to read, even if technically expressible via derivation.
Rule of thumb: derived query methods for straightforward, readable property-based lookups; @Query (JPQL first, native SQL only when genuinely necessary) once the query's complexity would make a derived method name unreadable or when SQL-specific features are required.
@Transactional wraps a method's execution in a database transaction: if the method completes normally, the transaction commits; if it throws an unchecked exception (a RuntimeException or Error), the transaction rolls back by default (checked exceptions do not trigger a rollback unless explicitly configured via rollbackFor).
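@Service
class OrderService {
    private final OrderRepository orderRepository;
    private final InventoryService inventoryService;

    OrderService(OrderRepository orderRepository, InventoryService inventoryService) {
        this.orderRepository = orderRepository;
        this.inventoryService = inventoryService;
    }

    @Transactional
    void placeOrder(Order order) {
        orderRepository.save(order);
        inventoryService.reserveStock(order); // if this throws an unchecked exception, the save() above is rolled back too
    }
}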
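The default commit/rollback rule stated earlier (roll back on RuntimeException or Error, commit otherwise, including when a checked exception escapes) can be modelled without Spring. This is a toy model under stated assumptions: `RollbackRuleSketch`, `write`, and `inTransaction` are hypothetical names, and two in-memory lists stand in for the database and the uncommitted changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class RollbackRuleSketch {
    final List<String> committed = new ArrayList<>();         // what the "database" holds
    private final List<String> pending = new ArrayList<>();   // uncommitted writes

    void write(String row) {
        pending.add(row);
    }

    // Mimics the default rule: unchecked exceptions roll back, checked exceptions still commit.
    <T> T inTransaction(Callable<T> work) throws Exception {
        pending.clear();
        try {
            T result = work.call();
            committed.addAll(pending);      // normal completion: commit
            return result;
        } catch (RuntimeException | Error e) {
            throw e;                        // rollback: pending writes are dropped in finally
        } catch (Exception e) {
            committed.addAll(pending);      // checked exception: commit, then rethrow
            throw e;
        } finally {
            pending.clear();
        }
    }
}
```

Under this model, a unit of work that writes a row and then throws `IllegalStateException` leaves `committed` unchanged, while one that throws `java.io.IOException` after the same write still commits the row, which is the behavior `rollbackFor` exists to change.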
Propagation controls how a @Transactional method behaves when it's called from a context that's already inside a transaction:
- REQUIRED (default): join the existing transaction if one exists, otherwise start a new one.
- REQUIRES_NEW: always suspend any existing transaction and start a completely independent new one; useful when a sub-operation (e.g., writing an audit log entry) must commit regardless of whether the outer transaction eventually rolls back.
- NESTED: runs within a savepoint of the outer transaction (database-dependent support), allowing partial rollback to that savepoint without rolling back the entire outer transaction.
- Others (SUPPORTS, MANDATORY, NOT_SUPPORTED, NEVER) handle rarer cases of "run with or without a transaction" / "must already be in one" / "must not be in one."
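The difference between REQUIRED and REQUIRES_NEW can be made concrete with a small in-memory model. It is only a sketch under stated assumptions: `PropagationSketch` and its methods are hypothetical, each "physical transaction" is a buffer of rows on a stack, and NESTED and the rarer modes are not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PropagationSketch {
    final List<String> database = new ArrayList<>();                // committed rows
    private final Deque<List<String>> active = new ArrayDeque<>();  // one buffer per open transaction

    // Writes go to the innermost open transaction.
    void write(String row) {
        active.peek().add(row);
    }

    // REQUIRED: join the current transaction if there is one, otherwise start one.
    void required(Runnable work) {
        if (!active.isEmpty()) {
            work.run();
            return;
        }
        runInNewTransaction(work);
    }

    // REQUIRES_NEW: always start an independent transaction; the outer one waits, suspended.
    void requiresNew(Runnable work) {
        runInNewTransaction(work);
    }

    private void runInNewTransaction(Runnable work) {
        active.push(new ArrayList<>());
        try {
            work.run();
            database.addAll(active.peek());  // commit
        } finally {
            active.pop();                    // after an exception the buffer is discarded: rollback
        }
    }
}

class PropagationDemo {
    public static void main(String[] args) {
        PropagationSketch tx = new PropagationSketch();
        try {
            tx.required(() -> {
                tx.write("order");
                tx.requiresNew(() -> tx.write("audit"));   // commits on its own
                tx.required(() -> tx.write("line-item"));  // joins the outer transaction
                throw new IllegalStateException("payment failed");
            });
        } catch (IllegalStateException expected) {
            // the outer transaction rolled back
        }
        System.out.println(tx.database); // prints "[audit]"
    }
}
```

The audit row survives because it was committed by its own transaction, while the order and the line item share the fate of the outer transaction that rolled back.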
Isolation controls what a transaction can see of other concurrent transactions' changes — READ_COMMITTED (common default), REPEATABLE_READ, SERIALIZABLE (strongest, least concurrent), READ_UNCOMMITTED (weakest) — trading consistency guarantees against concurrency/throughput, and ultimately constrained by what the underlying database actually supports for each level.
The classic self-invocation pitfall: @Transactional (like most Spring annotation-driven behavior) is implemented via an AOP proxy wrapping the bean. Calling a @Transactional method on this, from another method in the same class, bypasses the proxy entirely — the call goes directly to the real object, so the transactional advice never runs:
@Service
class OrderService {
    void processOrder(Order order) {
        saveOrder(order); // calls the REAL object directly, not through the proxy: no transaction runs!
    }

    @Transactional
    void saveOrder(Order order) { ... }
}
Fix: move the @Transactional method to a separate bean (a common, clean solution), or inject a self-reference proxy (@Lazy self-injection, or AopContext.currentProxy() with exposeProxy = true) if restructuring genuinely isn't feasible — though extracting the logic into a separate collaborator bean is almost always the cleaner fix.
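Why the proxy is bypassed is easy to reproduce with a plain JDK dynamic proxy. The sketch below is an illustration, not Spring's transaction interceptor: `OrderOps`, `PlainOrderOps`, and `SelfInvocationSketch` are hypothetical names, and recording the intercepted method name stands in for starting a transaction.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface OrderOps {
    void processOrder();
    void saveOrder();
}

class PlainOrderOps implements OrderOps {
    public void processOrder() {
        this.saveOrder(); // internal call: goes straight to the target, never through the proxy
    }

    public void saveOrder() {
        // the real work would happen here
    }
}

class SelfInvocationSketch {
    // Wraps the target in a proxy that records every call it intercepts
    // (the point where transactional advice would run).
    static OrderOps proxied(OrderOps target, List<String> intercepted) {
        return (OrderOps) Proxy.newProxyInstance(
                OrderOps.class.getClassLoader(),
                new Class<?>[] {OrderOps.class},
                (proxy, method, args) -> {
                    intercepted.add(method.getName());
                    return method.invoke(target, args);
                });
    }
}

class SelfInvocationDemo {
    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        OrderOps ops = SelfInvocationSketch.proxied(new PlainOrderOps(), intercepted);

        ops.processOrder();
        System.out.println(intercepted); // prints "[processOrder]": the inner saveOrder() was not intercepted

        ops.saveOrder();
        System.out.println(intercepted); // prints "[processOrder, saveOrder]"
    }
}
```

Only calls that enter through the proxy reference are intercepted, which is why moving the annotated method to a separate bean (so the call crosses a proxy boundary) restores the advice.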
The N+1 select problem is one of the most common Spring Data JPA performance pitfalls: fetching a list of N parent entities executes 1 query for the parents, and then, if code subsequently accesses a lazily-loaded association on each one, triggers N additional queries — one per parent — to fetch each one's related data separately, instead of retrieving everything in a single, efficient join.
List<Order> orders = orderRepository.findAll(); // 1 query
for (Order order : orders) {
    // Assuming customer is mapped with FetchType.LAZY: 1 lazy-load query PER order, N+1 in total
    System.out.println(order.getCustomer().getName());
}
For 100 orders, that's 101 round trips to the database instead of one well-formed join — a significant, often silent performance problem that gets worse linearly as data grows.
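The arithmetic can be checked with a toy query counter. This is a simulation under stated assumptions, not Hibernate behavior captured from a real database: `NPlusOneSketch` and its methods are hypothetical, a `Map` plays the role of the tables, and each method call that would hit the database increments `queryCount`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NPlusOneSketch {
    int queryCount = 0;
    private final Map<Integer, String> customerNameByOrderId = new HashMap<>();

    NPlusOneSketch(int orders) {
        for (int i = 1; i <= orders; i++) {
            customerNameByOrderId.put(i, "customer-" + i);
        }
    }

    // Like "SELECT o FROM Order o": one query, associations not loaded.
    List<Integer> findAllOrderIds() {
        queryCount++;
        return new ArrayList<>(customerNameByOrderId.keySet());
    }

    // Like touching a lazy association on one order: one query per call.
    String loadCustomerName(int orderId) {
        queryCount++;
        return customerNameByOrderId.get(orderId);
    }

    // Like "SELECT o FROM Order o JOIN FETCH o.customer": one query returns both sides.
    Map<Integer, String> findAllWithCustomer() {
        queryCount++;
        return new HashMap<>(customerNameByOrderId);
    }
}

class NPlusOneDemo {
    public static void main(String[] args) {
        NPlusOneSketch lazy = new NPlusOneSketch(100);
        for (int id : lazy.findAllOrderIds()) {
            lazy.loadCustomerName(id);
        }
        System.out.println(lazy.queryCount);    // prints "101"

        NPlusOneSketch fetched = new NPlusOneSketch(100);
        fetched.findAllWithCustomer();
        System.out.println(fetched.queryCount); // prints "1"
    }
}
```

A counter of this kind, wired to real SQL statements in integration tests, is one way to make the extra queries visible before they reach production.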
Solutions:
1. JOIN FETCH in a JPQL query — explicitly eager-fetches the association in the same query, for that specific query only (doesn't change the entity's default fetch behavior elsewhere):
@Query("SELECT o FROM Order o JOIN FETCH o.customer WHERE o.status = :status")
List<Order> findByStatusWithCustomer(@Param("status") String status);
2. @EntityGraph — declaratively specifies which associations to eagerly fetch for a given repository method, without hand-writing JPQL:
@EntityGraph(attributePaths = {"customer", "lineItems"})
List<Order> findByStatus(String status);
3. Changing the association's default fetch type (a blunter, less flexible fix): @ManyToOne and @OneToOne default to FetchType.EAGER, while @OneToMany and @ManyToMany default to LAZY. Making an association EAGER at the entity level does not reliably remove the extra queries (for JPQL and derived queries, Hibernate typically still loads an eager association with a separate select per parent), and it risks over-fetching data that most queries don't actually need. Marking @ManyToOne/@OneToOne as FetchType.LAZY explicitly and then applying a targeted per-query solution (JOIN FETCH/@EntityGraph) is usually preferable to a blanket entity-level change.
Detecting the problem in the first place: enabling Hibernate's SQL logging (spring.jpa.show-sql=true, or better, a proper SQL statement counter/logging tool in tests) during development is essential — the N+1 problem is easy to miss entirely in casual manual testing with small datasets, since the extra queries are individually fast; it only becomes visibly painful at production scale.