I/O, Strings & Serialization


java.io (the original I/O package, present since Java 1.0) is built around streams: InputStream/OutputStream for bytes and, since Java 1.1, Reader/Writer for characters. It's blocking: a read call waits until data is available (or EOF), so a typical server design needs one thread per connection to handle many clients concurrently, which doesn't scale to very large numbers of simultaneous connections.

import java.io.BufferedReader;
import java.io.FileReader;

try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
    String line = br.readLine(); // blocks until a line is available; returns null at EOF
}
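A slightly fuller sketch of the same blocking style reads a whole file line by line until readLine() returns null. The class and method names are hypothetical, and passing an explicit UTF-8 charset (the FileReader(String, Charset) constructor exists since Java 11) is an assumption made to avoid depending on the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BlockingLineReader {
    // Reads every line of a text file with classic java.io character streams.
    // Each readLine() call blocks until a full line (or end of stream) is available.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new FileReader(fileName, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) { // null signals end of stream
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```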

java.nio ("New I/O", Java 1.4) introduced buffers (ByteBuffer and friends) and channels (FileChannel, SocketChannel) as a lower-level, more flexible abstraction, plus Selector-based multiplexing: a single thread can monitor many non-blocking channels and act only on the ones that are actually ready for I/O, avoiding a thread-per-connection design for high-concurrency servers. Selectors work with network channels such as SocketChannel and ServerSocketChannel; a FileChannel cannot be registered with a Selector.

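import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

try (FileChannel channel = FileChannel.open(Path.of("file.txt"))) {
    ByteBuffer buffer = ByteBuffer.allocate(1024);
    int bytesRead = channel.read(buffer); // fills the buffer; returns -1 at end of file
}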
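Reading a whole file through a channel usually follows a fill, flip, drain, clear cycle on a reused buffer. The sketch below assumes counting occurrences of one byte as an arbitrary task to drain the buffer with; the class and method names are hypothetical, and the 16-byte buffer is deliberately tiny so several refills happen.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

class ChannelByteCounter {
    // Counts how often `target` appears in a file, reading through a FileChannel.
    static long countBytes(Path path, byte target) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(path)) { // no options: opened for reading
            ByteBuffer buffer = ByteBuffer.allocate(16);      // deliberately small
            while (channel.read(buffer) != -1) { // fill: the channel writes into the buffer
                buffer.flip();                   // limit = bytes read, position = 0
                while (buffer.hasRemaining()) {  // drain: we read out of the buffer
                    if (buffer.get() == target) {
                        count++;
                    }
                }
                buffer.clear();                  // make the whole buffer writable again
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(Path.of(args[0]), (byte) '\n'));
    }
}
```

Forgetting flip() is the classic bug here: without it, the drain loop would start at the end of the freshly read data and see only stale or empty bytes.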

NIO.2 (Java 7) further modernized file system access specifically, adding Path/Paths and the Files utility class as a much richer, more consistent replacement for the old File class (which had inconsistent error reporting, no symbolic link support, and limited metadata access). The Path.of(...) factory used below arrived in Java 11; on Java 7 to 10, Paths.get(...) does the same job:

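import java.nio.file.*;
import java.util.List;

Path path = Path.of("data.txt");
List<String> lines = Files.readAllLines(path); // simple, modern convenience API (decodes as UTF-8)
Files.copy(path, Path.of("data-backup.txt")); // throws FileAlreadyExistsException if the target exists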
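A small end-to-end sketch of the NIO.2 convenience style: write a file, copy it (overwriting any existing copy), and read the copy back. The directory is supplied by the caller, and the file and class names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class Nio2Demo {
    // Writes two lines, copies the file, and returns the lines read from the copy.
    static List<String> writeCopyAndRead(Path dir) throws IOException {
        Path source = dir.resolve("data.txt");
        Path target = dir.resolve("data-copy.txt");

        Files.write(source, List.of("first", "second")); // UTF-8, one line per element
        // Without REPLACE_EXISTING, copy() fails if the target already exists.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return Files.readAllLines(target); // also decodes as UTF-8
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio2-demo");
        System.out.println(writeCopyAndRead(dir)); // prints [first, second]
    }
}
```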

Practical guidance: for everyday file reading/writing, the Files/Path convenience methods (NIO.2) are now the idiomatic default, even for simple blocking use cases — the raw channel/selector machinery mainly matters when building high-concurrency network servers that need to scale beyond a thread-per-connection model.
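For the Selector-based multiplexing that java.nio adds, here is a minimal single-threaded sketch, assuming loopback networking is available: one Selector watches a non-blocking ServerSocketChannel, accepts a connection when it becomes ready, and reads from that connection only when data has arrived. The class and method names are hypothetical; a real server would keep looping over many connections instead of returning after one message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorSketch {
    // One thread, one Selector: accept a connection and read it only when ready.
    static String receiveOneMessage(String message) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // any free port
            server.configureBlocking(false); // required before register()
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A plain blocking client connects, sends the message, and closes.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            }

            ByteArrayOutputStream received = new ByteArrayOutputStream();
            ByteBuffer buffer = ByteBuffer.allocate(64);
            while (true) {
                // Blocks until a registered channel is ready; the timeout keeps the demo from hanging.
                if (selector.select(5_000) == 0) {
                    throw new IOException("timed out waiting for a ready channel");
                }
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove(); // the selected-key set is not cleared automatically
                    if (key.isAcceptable()) {
                        SocketChannel conn = server.accept(); // may be null in non-blocking mode
                        if (conn != null) {
                            conn.configureBlocking(false);
                            conn.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel conn = (SocketChannel) key.channel();
                        buffer.clear();
                        int n = conn.read(buffer);
                        if (n == -1) { // peer closed the connection: message complete
                            conn.close();
                            return received.toString(StandardCharsets.UTF_8);
                        }
                        received.write(buffer.array(), 0, n);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(receiveOneMessage("ping")); // prints ping
    }
}
```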


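|               | String                               | StringBuilder                  | StringBuffer                                   |
|---------------|--------------------------------------|--------------------------------|------------------------------------------------|
| Mutability    | immutable                            | mutable                        | mutable                                        |
| Thread safety | safe (inherently, because immutable) | not synchronized               | synchronized (thread-safe)                     |
| Performance   | fine for few or no modifications     | fastest for building/modifying | slower than StringBuilder due to lock overhead |
| Introduced    | Java 1.0                             | Java 5                         | Java 1.0                                       |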

String's immutability means every +, concat, or replace that produces a different value allocates a new String object. That's fine for one-off operations, but wasteful in a loop:

String s = "";
for (int i = 0; i < 1000; i++) {
    s += i; // creates ~1000 intermediate String objects — O(n²) overall
}

StringBuilder avoids this by mutating an internal, resizable character array in place:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i); // appends in place; the buffer is reallocated only when it runs out of capacity
}
String result = sb.toString();
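Placing the two loops side by side as methods makes the point concrete: both produce exactly the same string, and only the cost differs. The class and method names are illustrative.

```java
class ConcatDemo {
    // Quadratic overall: each += copies everything built so far into a new String.
    static String withConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Linear (amortized): appends go into one growable buffer, converted once at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(5));                       // prints 01234
        System.out.println(withConcat(5).equals(withBuilder(5))); // prints true
    }
}
```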

StringBuffer behaves like StringBuilder but synchronizes every method, making it safe to share across threads. In practice, though, a StringBuilder or StringBuffer is almost always built up and converted to a final String within a single thread or method before anything is shared, so that synchronization overhead is rarely worth paying. StringBuilder is the default recommendation for essentially all new code; StringBuffer mostly persists in legacy code and in the rare cases where a mutable buffer genuinely is shared.
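A sketch of that rare genuinely-shared case: two threads append to one StringBuffer, and because each append() is synchronized, no characters are lost. The class and method names are hypothetical, and the interleaving order of the appends is unspecified; only the total length is guaranteed. Swapping in a StringBuilder here may lose appends or even throw, so it is not used.

```java
class SharedBufferDemo {
    // Two threads append to the same StringBuffer; returns the final length.
    static int appendFromTwoThreads(int appendsPerThread) throws InterruptedException {
        StringBuffer shared = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < appendsPerThread; i++) {
                shared.append('x'); // synchronized: each append is atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // join() also makes the threads' writes visible here
        t2.join();
        return shared.length(); // always 2 * appendsPerThread
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(appendFromTwoThreads(10_000)); // prints 20000
    }
}
```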

Note: the compiler already handles simple, single-expression concatenation ("a" + b + "c") efficiently: javac compiled it into StringBuilder calls up to Java 8, and since Java 9 (JEP 280) it emits an invokedynamic call to StringConcatFactory instead. The manual StringBuilder concern is specifically about concatenation spread across a loop or built up incrementally across multiple statements.

String.intern() explicitly ties a String instance to the JVM's string pool (the same pool that string literals are automatically placed in):

String a = new String("hello"); // NOT in the pool: a fresh heap object
String b = "hello";              // literal: automatically in the pool

System.out.println(a == b);          // false: different objects
System.out.println(a.intern() == b); // true: intern() finds/returns the pooled "hello"

Behavior: intern() checks whether a string with the same content already exists in the pool. If so, it returns that existing pooled reference; the caller's own instance is left untouched, and the caller simply gets back a different, shared reference. If not, it adds the current string to the pool and returns a reference to it.
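Both branches of that behavior can be observed with strings built at runtime, which are never pooled automatically. In this sketch (class and method names hypothetical), "hello" is already pooled via the literal, so intern() hands back the literal; a value containing System.nanoTime() is assumed not to be in the pool yet, so intern() adds that very instance.

```java
import java.util.Arrays;

class InternDemo {
    // Compares a runtime-built "hello" with the pooled literal.
    static boolean[] compare() {
        String literal = "hello";                                        // pooled automatically
        String built = new StringBuilder("hel").append("lo").toString(); // fresh heap object
        return new boolean[] {
            built.equals(literal),     // true: same content
            built == literal,          // false: different objects
            built.intern() == literal, // true: the already-pooled instance is returned
            built.intern() == built    // false: "hello" was already pooled, so built is not added
        };
    }

    // Content assumed not to be pooled yet: intern() adds this instance and returns it.
    static boolean internAddsNewValue() {
        String fresh = new StringBuilder("intern-demo-").append(System.nanoTime()).toString();
        return fresh.intern() == fresh;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // prints [true, false, true, false]
        System.out.println(internAddsNewValue());       // prints true
    }
}
```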

Why use it: if you know a specific string value recurs very frequently (e.g., parsed tokens, repeated field values from a large dataset), interning lets multiple otherwise-independent String objects collapse into one shared instance — saving memory, and enabling fast == comparisons where content equality is guaranteed by the fact that both references point at the same pooled object.
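A sketch of the parsed-token use case, assuming a hypothetical comma-separated record format where the second field (a country code) repeats across rows. After interning, the repeated "US" fields from different rows are one object, and an identity-based count shows how many distinct instances remain. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;

class TokenInterning {
    // Splits each record on commas and interns every field, so equal values share one instance.
    static List<String[]> parse(List<String> records) {
        List<String[]> rows = new ArrayList<>();
        for (String record : records) {
            String[] fields = record.split(",");
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].intern();
            }
            rows.add(fields);
        }
        return rows;
    }

    // Counts distinct String objects by identity (not by content) across all fields.
    static int distinctInstances(List<String[]> rows) {
        IdentityHashMap<String, Boolean> seen = new IdentityHashMap<>();
        for (String[] row : rows) {
            for (String field : row) {
                seen.put(field, Boolean.TRUE);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(List.of("alice,US", "bob,US", "carol,DE"));
        System.out.println(rows.get(0)[1] == rows.get(1)[1]); // prints true: one shared "US"
        System.out.println(distinctInstances(rows));          // prints 5: alice, bob, carol, US, DE
    }
}
```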

Caveat: interning has a cost (a pool lookup, and potentially growing the pool), so indiscriminately interning every string can increase memory pressure and hurt performance rather than help — it's a targeted optimization for specific, measured cases of highly repeated string values, not a general-purpose habit.

Serialization converts an object graph into a byte stream that can be written to a file, sent over a network, or stored, and later reconstructed into an equivalent object via deserialization:

import java.io.*;

class User implements Serializable {
    String name;
    int age;
}

// Serialize: write the object graph to a file
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("user.dat"))) {
    out.writeObject(new User());
}
// Deserialize: readObject() can throw ClassNotFoundException as well as IOException
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("user.dat"))) {
    User u = (User) in.readObject();
}

A class must implement the Serializable marker interface. It has no methods to implement; it's purely a signal to ObjectOutputStream, which throws NotSerializableException for an object whose class doesn't implement it. The requirement applies recursively: every non-transient field must be a primitive, null, or a reference to a serializable object at the time of writing. An object graph that reaches a non-serializable object through such a field fails with NotSerializableException unless that field is marked transient.
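A self-contained round trip can use in-memory byte arrays instead of a file. This sketch gives User a constructor for convenience, and the Holder class (hypothetical) is Serializable itself but holds a plain Object, which is not, so writing it fails as described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class User implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Serializable, but its payload field refers to a non-serializable Object.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    Object payload = new Object();
}

class SerializationRoundTrip {
    // Writes the object graph to a byte array, then reconstructs a new object from it.
    static Object roundTrip(Object original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // True if writing a Holder fails because its payload is not serializable.
    static boolean failsWithNonSerializableField() throws ClassNotFoundException {
        try {
            roundTrip(new Holder());
            return false;
        } catch (NotSerializableException e) { // subclass of IOException, so caught first
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        User copy = (User) roundTrip(new User("Ada", 36));
        System.out.println(copy.name + " " + copy.age);     // prints Ada 36
        System.out.println(failsWithNonSerializableField()); // prints true
    }
}
```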

serialVersionUID is a long constant embedded in the serialized bytes, used by ObjectInputStream to verify that the sender's and receiver's class versions are compatible during deserialization:

class User implements Serializable {
    private static final long serialVersionUID = 1L; // explicit, stable
    String name;
    int age;
}

If you don't declare it explicitly, the serialization runtime computes one from the class's structure (its name, modifiers, interfaces, and most of its fields, constructors, and methods). An otherwise harmless change, such as adding a field or a public helper method, therefore silently changes the computed UID, and deserializing data written by the older version then fails with InvalidClassException. The computed value can even differ between compilers for the same source, because it is sensitive to compiler-generated details. Declaring an explicit serialVersionUID avoids this fragility and gives you conscious control over exactly when you intend to break compatibility with previously serialized data.
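The UID a class will write into the stream can be inspected with ObjectStreamClass.lookup(...). In this sketch the two classes are hypothetical: one declares the UID, the other relies on the computed value, which is some hash that is not meaningful on its own and changes whenever the class's shape changes.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class WithExplicitUid implements Serializable {
    private static final long serialVersionUID = 1L; // stays 1L across harmless edits
    String name;
}

class WithComputedUid implements Serializable { // no declaration: UID computed from structure
    String name;
}

class UidDemo {
    // Returns the serialVersionUID from the class's serialization descriptor.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(WithExplicitUid.class)); // prints 1
        System.out.println(uidOf(WithComputedUid.class)); // a computed hash
    }
}
```

The JDK's serialver tool reports the same computed value, which is a common way to pin down the UID of a class that shipped without one.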

Given serialization's known security history (deserializing untrusted data has been a major source of RCE vulnerabilities) and its brittleness across versions, many modern systems prefer explicit, versioned formats (JSON, Protocol Buffers) over Java's built-in serialization for anything beyond simple, trusted, same-version use cases like short-lived caching.

transient marks a field so the default Java serialization mechanism skips it — its value is simply not written to the serialized byte stream, and on deserialization it's restored to its type's default (null for objects, 0/false for primitives), regardless of what it held before serialization.

class Session implements Serializable {
    String userId;              // serialized normally
    transient Connection dbConn; // skipped — Connection isn't serializable anyway
    transient String cachedToken; // skipped — recomputed after deserialization, not persisted
}

Typical reasons to mark a field transient:

  • The field's type isn't serializable and can't reasonably be made so (a Thread, a Socket, a database Connection) — including it would throw NotSerializableException at serialization time.
  • The value is derived/cacheable and cheaper to recompute than to persist and restore.
  • The value is sensitive and shouldn't be persisted to disk or sent over the wire as part of the serialized form (e.g., a decrypted secret held only in memory).
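A round trip makes the reset visible. This simplified Session (a hypothetical variant without the Connection field, so it stays self-contained) sets its transient fields in the constructor; deserialization does not run that constructor, so they come back as null and 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    String userId;                // serialized normally
    transient String cachedToken; // skipped: restored as null
    transient int accessCount;    // skipped: restored as 0

    Session(String userId) {
        this.userId = userId;
        this.cachedToken = "token-for-" + userId; // hypothetical derived value
        this.accessCount = 3;
    }
}

class TransientDemo {
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session restored = roundTrip(new Session("alice"));
        // prints alice null 0
        System.out.println(restored.userId + " " + restored.cachedToken + " " + restored.accessCount);
    }
}
```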

If a class needs custom logic to restore a transient field's value after deserialization (rather than leaving it at its default), it can implement a private readObject(ObjectInputStream) method that calls defaultReadObject() for the normal fields and then manually recomputes/reinitializes the transient ones.
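A minimal sketch of that pattern, assuming the transient value can be rebuilt from the persisted fields. The class name and the computeToken helper are hypothetical stand-ins for a real, more expensive derivation; the readObject signature is the one the serialization runtime looks for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CachedSession implements Serializable {
    private static final long serialVersionUID = 1L;
    String userId;
    transient String cachedToken; // derived, never persisted

    CachedSession(String userId) {
        this.userId = userId;
        this.cachedToken = computeToken(userId);
    }

    // Hypothetical derivation standing in for an expensive computation.
    private static String computeToken(String userId) {
        return "token-for-" + userId;
    }

    // Invoked by ObjectInputStream during deserialization of this class.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                  // restores the non-transient fields (userId)
        this.cachedToken = computeToken(userId); // rebuild the transient field
    }
}

class RestoreDemo {
    static CachedSession roundTrip(CachedSession s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CachedSession) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new CachedSession("alice")).cachedToken); // prints token-for-alice
    }
}
```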