What is the difference between sequential and parallel streams, and when should you avoid parallel streams?
Quick Answer
A sequential stream processes elements one at a time on the calling thread; a parallel stream (via .parallel() or Collection.parallelStream()) splits the source and processes chunks concurrently across threads in the common ForkJoinPool, then merges results. Parallel streams should be avoided for small datasets (overhead exceeds benefit), I/O-bound or blocking operations (they starve the shared ForkJoinPool used JVM-wide), sources that don't split efficiently (like LinkedList), and operations with side effects or that depend on encounter order.
Detailed Answer
stream.parallel() (or collection.parallelStream()) splits the underlying data source into chunks, processes those chunks concurrently on multiple threads, and merges the partial results. The worker threads come from the JVM-wide common ForkJoinPool. By default its parallelism is one less than the number of available processors (minimum 1), and the calling thread also takes part in the work. A sequential stream, by contrast, processes every element one at a time on the calling thread.
```java
long count = list.parallelStream()
        .filter(x -> isExpensiveCheck(x))
        .count();
```
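A self-contained version of the snippet above might look like the following sketch. `isExpensiveCheck` is a hypothetical stand-in for real CPU-bound work, and the class and helper names are illustrative only. The filter records which threads execute it. The sequential run should stay on the calling thread. The parallel run typically also uses `ForkJoinPool.commonPool-worker` threads, though exactly which threads appear depends on the machine.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SequentialVsParallel {

    // Hypothetical stand-in for isExpensiveCheck(x): a deliberately busy
    // arithmetic loop so each element costs real CPU time.
    static boolean isExpensiveCheck(int x) {
        int acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += (x + i) % 7;
        }
        return acc % 2 == 0;
    }

    // Counts matching elements and records the name of every thread that ran the filter.
    // threadsSeen must be thread-safe because the parallel run writes to it concurrently.
    static long count(List<Integer> list, boolean parallel, Set<String> threadsSeen) {
        Stream<Integer> stream = parallel ? list.parallelStream() : list.stream();
        return stream
                .filter(x -> {
                    threadsSeen.add(Thread.currentThread().getName());
                    return isExpensiveCheck(x);
                })
                .count();
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
        Set<String> seqThreads = ConcurrentHashMap.newKeySet();
        Set<String> parThreads = ConcurrentHashMap.newKeySet();

        long seq = count(list, false, seqThreads);
        long par = count(list, true, parThreads);

        System.out.println("same count: " + (seq == par));
        // Sequential: only the calling thread.
        System.out.println("sequential threads: " + seqThreads);
        // Parallel: typically the calling thread plus ForkJoinPool.commonPool-worker-N threads.
        System.out.println("parallel threads:   " + parThreads);
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
```

Both runs compute the same count. What differs is only which threads do the work.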
When parallel streams help: large datasets (thousands of elements or more), CPU-bound per-element work (heavy computation, not I/O), and a data source that splits efficiently (arrays, ArrayList, ranges). A genuine multi-core speedup is realistic only when all three conditions hold.
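As a sketch of that favorable profile, the example below counts primes in an `IntStream` range with a trial-division test. The work is CPU-bound, per-element, and free of shared state, and the range source splits evenly. The class name and the limit are arbitrary choices for illustration. The `System.nanoTime` timing is only a rough indication, not a proper benchmark, and any speedup depends on your core count.

```java
import java.util.stream.IntStream;

public class ParallelPrimeCount {

    // CPU-bound per-element work: trial-division primality test (no I/O, no shared state).
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // IntStream.range splits cheaply and evenly, so a parallel run can
    // divide the work across cores with little coordination overhead.
    static long countPrimes(int limit, boolean parallel) {
        IntStream range = IntStream.range(0, limit);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(ParallelPrimeCount::isPrime).count();
    }

    public static void main(String[] args) {
        int limit = 2_000_000;
        long t0 = System.nanoTime();
        long seq = countPrimes(limit, false);
        long t1 = System.nanoTime();
        long par = countPrimes(limit, true);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d primes in %d ms%n", seq, (t1 - t0) / 1_000_000);
        System.out.printf("parallel:   %d primes in %d ms%n", par, (t2 - t1) / 1_000_000);
    }
}
```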
When to avoid parallel streams:
- Small collections: the overhead of splitting work and coordinating threads easily exceeds any benefit, and sequential is often faster for anything under a few thousand elements.
- I/O-bound or blocking operations (network calls, file access, database queries): these block their threads, and those threads are borrowed from the shared common ForkJoinPool, which by default is used JVM-wide by all parallel streams and by CompletableFuture.supplyAsync calls. Blocking those threads can starve unrelated parallel work elsewhere in the application.
- Poorly splittable sources: a LinkedList (no efficient random access, splits poorly) gains little or nothing from parallelism, unlike an ArrayList or an array.
- Operations with side effects on shared mutable state: e.g., list.parallelStream().forEach(x -> sharedList.add(x)) on a non-thread-safe collection is a race condition. Parallel streams don't make your lambda's side effects thread-safe.
- Order-dependent operations: forEach on a parallel stream doesn't guarantee encounter order. Use forEachOrdered if order matters, though that reduces the parallelism benefit.
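Three of these pitfalls can be illustrated with the short sketch below. The class and method names are hypothetical.

- `safeCollect` replaces the shared-list `forEach` from the side-effect bullet with `collect(Collectors.toList())`, which merges per-thread partial results safely.
- `orderedForEach` uses `forEachOrdered` to keep encounter order.
- `firstSplitSize` asks each list's `Spliterator` how many elements its first `trySplit()` hands off.

In OpenJDK, `ArrayList` splits its index range in half. `LinkedList`'s spliterator instead walks the nodes and copies a small batch into an array, so its first chunk is much smaller. The exact batch size is an implementation detail.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPitfalls {

    // Side effects: instead of source.parallelStream().forEach(x -> sharedList.add(x))
    // on a plain ArrayList (a data race), let the stream build the result.
    // collect() gives each thread its own partial container and merges them,
    // preserving encounter order for an ordered source such as ArrayList.
    static List<Integer> safeCollect(List<Integer> source) {
        return source.parallelStream()
                .filter(x -> x % 2 == 0)
                .collect(Collectors.toList());
    }

    // Order: forEachOrdered runs the action in encounter order even on a parallel
    // stream; each action happens-before the next, so a plain ArrayList is safe here.
    static List<Integer> orderedForEach(List<Integer> source) {
        List<Integer> out = new ArrayList<>();
        source.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // Splitting: how many elements the first trySplit() hands off to another task.
    static long firstSplitSize(List<Integer> list) {
        Spliterator<Integer> prefix = list.spliterator().trySplit();
        return prefix == null ? 0 : prefix.estimateSize();
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        System.out.println("even count: " + safeCollect(source).size());
        System.out.println("order kept: " + orderedForEach(source).equals(source));
        System.out.println("ArrayList first split:  " + firstSplitSize(new ArrayList<>(source)));
        System.out.println("LinkedList first split: " + firstSplitSize(new LinkedList<>(source)));
    }
}
```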
Practical guidance: default to sequential streams; reach for .parallel() only after profiling shows a genuine CPU-bound bottleneck on a large enough dataset, and be mindful that it shares the same common pool as other parts of the application.