Collections Framework

Difficulty

The framework is organized around two root interfaces:

Collection<E> — a group of elements:

  • List<E> — ordered, allows duplicates, indexed access (ArrayList, LinkedList, Vector)
  • Set<E> — no duplicates (HashSet, LinkedHashSet, TreeSet)
  • Queue<E> / Deque<E> — ordered for processing (FIFO/LIFO) (ArrayDeque, PriorityQueue, LinkedList)

Map<K, V> — key-value pairs, deliberately not a subtype of Collection (a map entry is a pair, not a single element): HashMap, LinkedHashMap, TreeMap, ConcurrentHashMap.

Collection            Map
├── List               ├── HashMap
│   ├── ArrayList       ├── LinkedHashMap
│   └── LinkedList      └── TreeMap
├── Set
│   ├── HashSet
│   ├── LinkedHashSet
│   └── TreeSet
└── Queue / Deque
    ├── ArrayDeque
    └── PriorityQueue

Note: Collections (plural, in java.util) is an unrelated utility class of static helpers (Collections.sort, Collections.unmodifiableList, Collections.emptyList) — easy to confuse with the Collection interface by name alone.

Most implementations also come in a concurrent flavor under java.util.concurrent (ConcurrentHashMap, CopyOnWriteArrayList, ConcurrentLinkedQueue) for thread-safe access without external synchronization.

Both implement List, but with very different internal structures and performance profiles:

ArrayListLinkedList
Backing structureresizable arraydoubly-linked list of nodes
get(index)O(1)O(n) — must walk from the nearer end
add/remove at endamortized O(1)O(1)
add/remove in the middleO(n) — shifts elementsO(1) once you have the node/iterator position, O(n) to find it
Memory overheadcompact (just the array + slack)extra per-element node objects (prev/next pointers)
Cache localitygood (contiguous memory)poor (nodes scattered on heap)

In practice, ArrayList is almost always the better default: even "insert in the middle" workloads are often faster with ArrayList in real benchmarks because array shifting is a cheap, cache-friendly System.arraycopy, whereas LinkedList's node-chasing defeats CPU cache prefetching. LinkedList mainly earns its keep when you need a Deque (stack/queue operations at both ends) or frequent insert/remove via an existing ListIterator without index lookups.

LinkedList also implements Deque, so it's commonly used as a stack or queue, though ArrayDeque is now generally preferred for that purpose (lower overhead, no null elements allowed which avoids ambiguity).

HashMap<K,V> is backed by an array of buckets (Node<K,V>[] table), where each bucket can hold multiple entries that hashed to the same slot.

Put/get, step by step:

  1. Compute key.hashCode(), then apply HashMap's internal spreading function (h ^ (h >>> 16)) to mix high bits into the low bits — this reduces collisions for keys whose hash codes differ mainly in upper bits.
  2. Compute the bucket index as hash & (table.length - 1) (equivalent to hash % capacity since capacity is always a power of two, but much faster).
  3. Within that bucket, walk the entries and use equals() to find a matching key (a get returns its value; a put with an existing key replaces it, otherwise a new entry is appended).

Collision handling: each bucket starts as a linked list of entries. Since Java 8, if a single bucket grows beyond a threshold (default 8) and the table is large enough, that bucket is converted into a balanced red-black tree, bounding worst-case lookup within a bucket from O(n) to O(log n) — this mainly guards against pathological hash collisions (e.g., maliciously crafted keys).

Resizing: the map tracks size vs. capacity * loadFactor (default load factor 0.75). Once size exceeds that threshold, capacity doubles and every entry is rehashed into the new, larger table — an O(n) operation, which is why specifying an expected capacity up front (new HashMap<>(expectedSize)) avoids repeated resizes in hot paths.

Requirements on keys: must have consistent hashCode()/equals() (per the contract) — mutating a key's hash-relevant fields after insertion "loses" the entry, since it will look in the wrong bucket on lookup.

HashMapHashtableConcurrentHashMap
Thread safetynonefully synchronized (every method locks)thread-safe, fine-grained (no single global lock)
Null keys/values1 null key, many null valuesnone allowednone allowed
Performance under concurrencyfastest, single-threaded useslow — one thread at a time, even for readshigh — concurrent reads, striped/CAS-based writes
Iterationfail-fast (ConcurrentModificationException)fail-fastweakly consistent (never throws CME, may or may not reflect concurrent updates)
Eramodern defaultlegacy (pre-Java 2 collections, Map interface retrofitted on)java.util.concurrent, the modern concurrent choice

Hashtable is effectively deprecated in practice — it predates the Collections Framework and synchronizes every single method call (get, put, size, ...), which serializes all access even for read-heavy workloads.

ConcurrentHashMap achieves thread safety without that bottleneck: historically via segment-level locking, and since Java 8 via per-bucket synchronization plus CAS operations on individual bins, so unrelated keys rarely contend. Its null restriction is deliberate: in a concurrent map, map.get(key) == null can't distinguish "no such key" from "key maps to null" reliably when another thread might be concurrently modifying the map — Doug Lea (its author) chose to simply disallow nulls entirely.

Rule of thumb: use HashMap single-threaded (or externally synchronized if needed), and ConcurrentHashMap for shared, concurrent access. Avoid Hashtable in new code entirely.

All three implement Set (no duplicates), backed by different underlying structures:

  • HashSet: backed by a HashMap internally (elements stored as keys). O(1) average add/contains/remove, but iteration order is unspecified and can change across JVM versions or even between runs.
  • LinkedHashSet: extends HashSet but additionally threads a doubly-linked list through all entries to remember insertion order — iteration order matches the order elements were added, at a modest extra memory cost (the links) and similar O(1) operation cost.
  • TreeSet: implements NavigableSet, backed by a red-black tree (TreeMap internally). Elements are always kept in sorted order (natural ordering via Comparable, or a custom Comparator passed to the constructor). Operations are O(log n), and it supports range views (headSet, tailSet, subSet) and nearest-neighbor queries (floor, ceiling, higher, lower).
Set<String> hash = new HashSet<>();       // arbitrary iteration order
Set<String> linked = new LinkedHashSet<>(); // insertion order
Set<String> tree = new TreeSet<>();       // sorted order

Choosing: use HashSet by default for pure membership testing; LinkedHashSet when you need predictable, insertion-ordered iteration without paying for sorting; TreeSet when you need the elements sorted or need range/nearest queries, accepting the O(log n) cost and the requirement that elements be mutually comparable.