Streams & Buffers

Answer: A stream is an abstraction for reading or writing data incrementally — piece by piece — rather than holding the whole payload in memory. Streams are EventEmitters.

The four types:

Type      | Direction                            | Examples
Readable  | source you read from                 | fs.createReadStream, HTTP request, process.stdin
Writable  | sink you write to                    | fs.createWriteStream, HTTP response, process.stdout
Duplex    | both, independent                    | TCP socket (net.Socket)
Transform | Duplex that transforms input→output  | zlib.createGzip, crypto cipher streams
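
To make the four types concrete, here is a minimal sketch that wires a Readable, a Transform, and a Writable together. The names upperCase, collect, and demoTypes are hypothetical and exist only for illustration; Readable.from, Transform, Writable, and pipeline are the real building blocks from Node's stream module.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical Transform: upper-cases every chunk that passes through.
function upperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Hypothetical Writable sink: appends everything it receives to `parts`.
function collect(parts) {
  return new Writable({
    write(chunk, encoding, callback) {
      parts.push(chunk.toString());
      callback();
    },
  });
}

// Readable source -> Transform -> Writable sink.
async function demoTypes() {
  const parts = [];
  await pipeline(Readable.from(['ab', 'cd']), upperCase(), collect(parts));
  return parts.join('');
}
```

A Duplex such as net.Socket would slot into the same chain, except that its readable and writable sides carry unrelated data.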

Reading and writing:

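const fs = require('fs');
const rs = fs.createReadStream('input.txt');
const ws = fs.createWriteStream('output.txt');

// Note: this ignores the return value of write(), so it does not handle backpressure.
rs.on('data', chunk => ws.write(chunk));   // forward each chunk as it arrives
rs.on('end', () => ws.end());              // close the destination once the source is done
rs.on('error', err => console.error('read failed:', err));
ws.on('error', err => console.error('write failed:', err));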

The idiomatic version — pipe:

const fs = require('fs');
const zlib = require('zlib');

fs.createReadStream('input.txt')
  .pipe(zlib.createGzip())          // Transform
  .pipe(fs.createWriteStream('input.txt.gz'));

Why streams matter: they keep memory usage constant and low regardless of file/response size and start producing output before all input has arrived (lower latency). They're everywhere in Node — HTTP bodies, file I/O, compression, crypto.

Answer: Backpressure occurs when a data source outpaces the destination — e.g., reading a file from a fast SSD and writing to a slow network socket. Without handling it, data piles up in memory and can exhaust it.

How Writable streams signal it:

  • writable.write(chunk) returns false once the buffered data reaches or exceeds the stream's highWaterMark (historically 16 KiB by default for byte streams; Node 22 and later default to 64 KiB).
  • When you get false, you should stop writing and wait for the 'drain' event before resuming.

Manual handling (illustrative):

readable.on('data', (chunk) => {
  const ok = writable.write(chunk);
  if (!ok) {
    readable.pause();                 // stop reading
    writable.once('drain', () => readable.resume()); // resume when drained
  }
});
readable.on('end', () => writable.end()); // finish the destination when the source ends
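
The same write()/'drain' contract applies when the data does not come from a stream at all, for example when writing items from an array or generator. The sketch below is one way to do that with events.once; writeAll and createSlowSink are hypothetical helper names, and the tiny highWaterMark is chosen only so that backpressure shows up immediately.

```javascript
const { once } = require('node:events');
const { Writable } = require('node:stream');
const { finished } = require('node:stream/promises');

// Writes every chunk from an iterable, waiting for 'drain' whenever
// write() reports that the internal buffer is full.
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1;
      await once(writable, 'drain'); // also rejects if the stream emits 'error'
    }
  }
  writable.end();
  await finished(writable);
  return waits; // how many times we had to wait for 'drain'
}

// Hypothetical slow sink with a 4-byte highWaterMark.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // pretend the destination is slow
    },
  });
}
```

Calling writeAll(createSlowSink(received), ['aaaa', 'bbbb', 'cccc']) should pause after each 4-byte chunk, because every write fills the buffer up to the highWaterMark.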

The right way — let Node manage it:

const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream/promises');

async function gzipLog() {
  await pipeline(
    fs.createReadStream('huge.log'),
    zlib.createGzip(),
    fs.createWriteStream('huge.log.gz')
  );
  // pipeline handles backpressure AND propagates errors + cleans up
}

gzipLog().catch(err => console.error('pipeline failed:', err));

  • pipe() and pipeline() automatically pause/resume the source based on the destination's readiness.
  • pipeline() additionally forwards errors and destroys all streams on failure (avoiding leaks), so prefer it over pipe() for anything beyond trivial cases.

Interview point: backpressure is the reason to use pipe/pipeline instead of hand-rolling on('data') + write(); manual loops without drain handling leak memory under load.
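
To see the error-forwarding and cleanup behaviour of pipeline() in isolation, the following sketch runs a three-stage pipeline whose middle stage fails on purpose. The function name failingPipeline and the 'boom' error are made up for the demonstration; the point is only that a single try/catch observes the failure and that the other streams end up destroyed.

```javascript
const { Readable, Transform, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Runs a pipeline whose Transform always fails, then reports what
// pipeline() did with the streams on either side of it.
async function failingPipeline() {
  const source = Readable.from(['one', 'two', 'three']);
  const failing = new Transform({
    transform(chunk, encoding, callback) {
      callback(new Error('boom')); // simulate a mid-stream failure
    },
  });
  const sink = new Writable({
    write(chunk, encoding, callback) {
      callback();
    },
  });

  try {
    await pipeline(source, failing, sink);
    return { message: null };
  } catch (err) {
    return {
      message: err.message,
      sourceDestroyed: source.destroyed,
      sinkDestroyed: sink.destroyed,
    };
  }
}
```

With a chain of plain .pipe() calls, the same failure would need an 'error' listener on each stream, and the source and sink would have to be destroyed by hand.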

Answer: A Buffer represents a fixed-length sequence of raw bytes. Because JavaScript strings are UTF-16 and not suited to arbitrary binary data, Node uses Buffers for anything binary: file I/O, TCP packets, cryptography, image/video bytes, protocol parsing.

Creating Buffers:

Buffer.from('hello', 'utf8');      // from a string
Buffer.from([0x68, 0x69]);         // from bytes
Buffer.alloc(10);                  // 10 zero-filled bytes (safe)
Buffer.allocUnsafe(10);            // faster, but may contain old memory — overwrite before use

Encoding conversions:

const buf = Buffer.from('hello');
buf.toString('utf8');   // 'hello'
buf.toString('hex');    // '68656c6c6f'
buf.toString('base64'); // 'aGVsbG8='
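
One consequence of these conversions is that string length and byte length diverge as soon as the text contains non-ASCII characters. A small sketch (the variable names are illustrative only):

```javascript
// 'h\u00e9llo' is "héllo": five characters, but é takes two bytes in UTF-8.
const text = 'h\u00e9llo';
const charCount = text.length;                      // 5 UTF-16 code units
const byteCount = Buffer.byteLength(text, 'utf8');  // 6 bytes

// Decoding goes the other way: pass the source encoding to Buffer.from().
const fromHex = Buffer.from('68656c6c6f', 'hex').toString('utf8');     // 'hello'
const fromBase64 = Buffer.from('aGVsbG8=', 'base64').toString('utf8'); // 'hello'
```

This matters for headers such as Content-Length, which count bytes rather than characters.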

Key facts:

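  • A Buffer is a subclass of Uint8Array, so TypedArray methods work on it. One caveat: buf.slice() returns a view that shares memory with the original, unlike Uint8Array.prototype.slice, which copies; prefer buf.subarray() to make the sharing explicit.
  • Its backing memory is allocated outside the V8 heap (in C++), so large Buffers don't count against V8's heap limit or pressure its garbage collector the same way. The memory is still released only once the Buffer object itself is collected.
  • Fixed size: you can't grow a Buffer; you allocate a new one or use a stream.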
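
The facts above can be checked directly. This is a small sketch with illustrative variable names, not a recommended pattern for production code:

```javascript
const first = Buffer.from('stream');
const second = Buffer.from('s & buffers');

// A Buffer is a Uint8Array, so TypedArray checks and methods apply.
const isTypedArray = first instanceof Uint8Array; // true

// subarray() returns a view over the same memory, not a copy.
const view = first.subarray(0, 3);
view[0] = 0x53; // writes 'S' into the shared memory
const afterWrite = first.toString(); // 'Stream'

// Buffers cannot grow; "appending" means allocating a new, larger Buffer.
const joined = Buffer.concat([first, second]);
const joinedText = joined.toString(); // 'Streams & buffers'
```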

Security note: prefer Buffer.alloc over Buffer.allocUnsafe. allocUnsafe skips zero-filling for speed and can expose leftover memory contents if you read before fully writing it.

When you use Buffers directly: implementing binary protocols, hashing/encrypting bytes, manipulating image data, or reading a file's raw bytes. For text you usually just specify an encoding and work with strings.
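
As an illustration of the binary-protocol case, here is a sketch that encodes and decodes a length-prefixed frame. The frame layout (one type byte, a 4-byte big-endian length, then the payload) is an assumption invented for this example, as are the names encodeFrame and decodeFrame; the Buffer read/write methods are the standard ones.

```javascript
// Hypothetical frame layout: [type: 1 byte][length: 4 bytes, big-endian][payload]
const HEADER_SIZE = 5;

// Builds one frame from a numeric type and a UTF-8 string payload.
function encodeFrame(type, payload) {
  const body = Buffer.from(payload, 'utf8');
  const frame = Buffer.alloc(HEADER_SIZE + body.length);
  frame.writeUInt8(type, 0);
  frame.writeUInt32BE(body.length, 1);
  body.copy(frame, HEADER_SIZE);
  return frame;
}

// Reads the header back and slices out the payload without copying it first.
function decodeFrame(frame) {
  const type = frame.readUInt8(0);
  const length = frame.readUInt32BE(1);
  const payload = frame
    .subarray(HEADER_SIZE, HEADER_SIZE + length)
    .toString('utf8');
  return { type, length, payload };
}
```

For example, encodeFrame(1, 'hi').toString('hex') gives '01000000026869': the type byte, the length 2, then the bytes of "hi".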

Answer: A Readable stream operates in one of two modes governing how data moves.

Paused mode (default): you explicitly pull data:

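// handleChunk is your own function; avoid naming it `process`,
// which would shadow Node's global process object.
readable.on('readable', () => {
  let chunk;
  while ((chunk = readable.read()) !== null) {
    handleChunk(chunk);
  }
});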

Flowing mode: data is pushed to you as fast as it arrives. You enter it by:

  • attaching a 'data' listener,
  • calling .pipe(), or
  • calling .resume().

readable.on('data', chunk => handleChunk(chunk)); // now flowing (handleChunk is your own function)

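Switching: adding a 'data' handler, calling .pipe(), or calling .resume() puts the stream in flowing mode. Calling .pause() returns it to paused mode when nothing is piped; if destinations are piped, remove them with .unpipe(). Removing a 'data' handler does not pause the stream by itself.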
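
The current mode is visible through the readable.readableFlowing property: null before any consumer is attached, true while flowing, false after an explicit pause. The sketch below records it at each step; observeModes is a hypothetical helper, and the source deliberately never pushes any data.

```javascript
const { Readable } = require('node:stream');

// Records readable.readableFlowing at each step of a mode switch.
function observeModes() {
  const readable = new Readable({ read() {} }); // a source that never pushes data
  const states = [];

  states.push(readable.readableFlowing); // no consumer attached yet
  readable.on('data', () => {});
  states.push(readable.readableFlowing); // after attaching a 'data' listener
  readable.pause();
  states.push(readable.readableFlowing); // after pause()
  readable.resume();
  states.push(readable.readableFlowing); // after resume()

  readable.destroy();
  return states; // [null, true, false, true]
}
```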

Modern, preferred approach — async iteration:

async function readAll(readable) {
  let total = 0;
  for await (const chunk of readable) {   // handles flow + backpressure
    total += chunk.length;
  }
  return total;
}

for await...of is the cleanest way to consume a stream: it respects backpressure, propagates errors as exceptions (usable with try/catch), and reads until the stream ends.
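
The error-propagation part can be sketched as follows. readText and demoReadText are hypothetical names, and the "broken" source is a stand-in that fails on its first read; collecting everything into one string is only sensible for small streams.

```javascript
const { Readable } = require('node:stream');

// Concatenates a stream's chunks into one string. A stream error surfaces
// as a rejected promise, so callers can use try/catch.
async function readText(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf8');
}

async function demoReadText() {
  const ok = await readText(Readable.from(['he', 'llo']));

  // Hypothetical source that fails as soon as it is read.
  const broken = new Readable({
    read() {
      this.destroy(new Error('disk failure'));
    },
  });

  let caught = null;
  try {
    await readText(broken);
  } catch (err) {
    caught = err.message; // the stream's error arrives as a normal exception
  }
  return { ok, caught };
}
```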

Gotcha: In flowing mode, if you attach a 'data' listener but the consumer is slow and you don't manage backpressure, memory can grow. pipe/pipeline/for await avoid this; a bare 'data' loop does not.

Answer:

Buffering (read it all at once):

const data = await fs.promises.readFile('report.csv'); // whole file in memory

Simple and fine for small, bounded data. But memory usage = file size × concurrent requests, so a 1 GB file (or many medium ones) can OOM the process.

Streaming (process chunk by chunk):

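// csvParser() and transformRows() stand in for Transform streams you supply;
// res is the HTTP response.
fs.createReadStream('report.csv')
  .pipe(csvParser())
  .pipe(transformRows())
  .pipe(res); // send to the client as you go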

Use streams when:

  • Large or unknown-size data — big files, uploads/downloads, DB exports, log processing.
  • Constant memory matters — memory stays around the buffer size regardless of total volume.
  • Lower latency / TTFB — you can start sending/processing before all input is read (e.g., piping a file straight to an HTTP response).
  • Composable pipelines — chain read → decompress → parse → transform → write.
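
As one example of the log-processing case, the sketch below counts matching lines in a file of any size while holding only one chunk and one line in memory at a time. The function name countMatchingLines and its parameters are assumptions for illustration; readline.createInterface and fs.createReadStream are standard Node APIs.

```javascript
const fs = require('node:fs');
const readline = require('node:readline');

// Counts the lines containing `needle` without loading the whole file.
async function countMatchingLines(filePath, needle) {
  const lines = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  let matches = 0;
  for await (const line of lines) {
    if (line.includes(needle)) matches += 1;
  }
  return matches;
}
```

A buffered version would call readFile and split on newlines, which is simpler but makes memory use proportional to the file size.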

Stick with buffering when:

  • Data is small and you need the whole thing at once (e.g., parse a small JSON config).
  • The processing genuinely requires random access to all the data.

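Real-world example: a handler serving a large file download should stream it with createReadStream(...).pipe(res), or with pipeline() so the file stream is destroyed if the client disconnects, rather than calling readFile and then res.send. That way a big download doesn't load the whole file into memory once per concurrent client.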
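
A possible shape for such a download handler, using pipeline() rather than a bare pipe(), is sketched below. streamFileTo and createDownloadServer are hypothetical names, the Content-Type is a placeholder, and a real handler would also validate the path and set a status code for missing files.

```javascript
const fs = require('node:fs');
const http = require('node:http');
const { pipeline } = require('node:stream/promises');

// Streams a file into any Writable (for example an http.ServerResponse).
// pipeline() destroys the file stream if the destination closes early.
async function streamFileTo(destination, filePath) {
  await pipeline(fs.createReadStream(filePath), destination);
}

// Hypothetical server factory: every request receives the same file.
function createDownloadServer(filePath) {
  return http.createServer((req, res) => {
    res.setHeader('Content-Type', 'application/octet-stream');
    streamFileTo(res, filePath).catch((err) => {
      // Headers may already be sent, so just log; pipeline() has
      // already destroyed the response and the file stream.
      console.error('download failed:', err.message);
    });
  });
}
```

Because streamFileTo accepts any Writable, it can be exercised in a test with a plain in-memory sink instead of a live HTTP connection.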