Answer:
A stream is an abstraction for reading or writing data incrementally — piece by piece — rather than holding the whole payload in memory. Streams are EventEmitters.
The four types:
| Type | Direction | Examples |
|---|---|---|
| Readable | source you read from | fs.createReadStream, HTTP request, process.stdin |
| Writable | sink you write to | fs.createWriteStream, HTTP response, process.stdout |
| Duplex | both, independent | TCP socket (net.Socket) |
| Transform | Duplex that transforms input→output | zlib.createGzip, crypto cipher streams |
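To make the table concrete, here is a minimal sketch that wires a Readable, a Transform, and a Writable together. The sample data and the names `source`, `upperCase`, `sink`, and `done` are assumptions chosen for illustration; only the classes from Node's `stream` module are real API.

```javascript
const { Readable, Writable, Transform, pipeline } = require('stream');

// Readable: a source built from an in-memory array (hypothetical sample data).
const source = Readable.from(['alpha', 'beta', 'gamma']);

// Transform: a Duplex whose output is computed from its input.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

// Writable: a sink that collects everything written to it.
const collected = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    collected.push(chunk.toString());
    callback(); // signal that this chunk has been handled
  },
});

// Resolves with the collected chunks once the whole chain has finished.
const done = new Promise((resolve, reject) => {
  pipeline(source, upperCase, sink, (err) => (err ? reject(err) : resolve(collected)));
});
```

Awaiting `done` and joining the collected chunks should give the upper-cased input. A Duplex such as net.Socket is left out because it needs a real connection to demonstrate.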
Reading and writing:
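const fs = require('fs');
const rs = fs.createReadStream('input.txt');
const ws = fs.createWriteStream('output.txt');
// Simplest form: the return value of write() is ignored, so backpressure is not handled.
rs.on('data', chunk => ws.write(chunk));
rs.on('end', () => ws.end());
rs.on('error', err => console.error(err));
ws.on('error', err => console.error(err));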
The idiomatic version — pipe:
const zlib = require('zlib');
fs.createReadStream('input.txt')
  .pipe(zlib.createGzip()) // Transform: compresses chunks as they pass through
  .pipe(fs.createWriteStream('input.txt.gz'));
Why streams matter: they keep memory usage constant and low regardless of file/response size and start producing output before all input has arrived (lower latency). They're everywhere in Node — HTTP bodies, file I/O, compression, crypto.
Answer: Backpressure occurs when a data source outpaces the destination — e.g., reading a file from a fast SSD and writing to a slow network socket. Without handling it, data piles up in memory and can exhaust it.
How Writable streams signal it:
- writable.write(chunk) returns false when the internal buffer has reached or exceeded its highWaterMark (default 16 KB for byte streams; 64 KB since Node.js 22).
- When you get false, you should stop writing and wait for the 'drain' event before resuming.
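The signal can be observed in isolation with a deliberately slow sink. This is a sketch under stated assumptions: `createSlowSink` and `writeAll` are hypothetical helper names, and the 4-byte highWaterMark and 5 ms delay are arbitrary values picked so that write() returns false almost immediately.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// A slow sink with a tiny buffer (4 bytes) so backpressure shows up quickly.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setTimeout(callback, 5); // simulate a slow destination
    },
  });
}

// Write every chunk, waiting for 'drain' whenever write() returns false.
// Returns how many times the writer had to wait.
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1;
      await once(writable, 'drain');
    }
  }
  writable.end();
  await once(writable, 'finish');
  return waits;
}
```

Calling `writeAll(createSlowSink([]), ['aaaa', 'bbbb'])` should report at least one wait, because each 4-byte chunk fills the buffer up to its limit.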
Manual handling (illustrative):
readable.on('data', (chunk) => {
  const ok = writable.write(chunk);
  if (!ok) {
    readable.pause(); // stop reading
    writable.once('drain', () => readable.resume()); // resume when drained
  }
});
The right way — let Node manage it:
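const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream/promises');

// await is not allowed at the top level of a CommonJS module, so wrap it in a function
async function gzipLog() {
  // pipeline handles backpressure AND propagates errors + cleans up
  await pipeline(
    fs.createReadStream('huge.log'),
    zlib.createGzip(),
    fs.createWriteStream('huge.log.gz')
  );
}

gzipLog().catch(console.error);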
pipe() and pipeline() automatically pause/resume the source based on the destination's readiness. pipeline() additionally forwards errors and destroys all streams on failure (avoiding leaks), so prefer it over pipe() for anything beyond trivial cases.
Interview point: backpressure is the reason to use pipe/pipeline instead of hand-rolling on('data') + write(); manual loops without drain handling leak memory under load.
Answer: A Buffer represents a fixed-length sequence of raw bytes. Because JavaScript strings are UTF-16 and not suited to arbitrary binary data, Node uses Buffers for anything binary: file I/O, TCP packets, cryptography, image/video bytes, protocol parsing.
Creating Buffers:
Buffer.from('hello', 'utf8'); // from a string
Buffer.from([0x68, 0x69]); // from bytes
Buffer.alloc(10); // 10 zero-filled bytes (safe)
Buffer.allocUnsafe(10); // faster, but may contain old memory — overwrite before use
Encoding conversions:
const buf = Buffer.from('hello');
buf.toString('utf8'); // 'hello'
buf.toString('hex'); // '68656c6c6f'
buf.toString('base64'); // 'aGVsbG8='
Key facts:
- A Buffer is a subclass of Uint8Array, so TypedArray methods work on it.
- It's allocated outside the V8 heap (in C++), so large Buffers don't pressure V8's garbage collector the same way.
- Fixed size — you can't grow a Buffer; you allocate a new one or use a stream.
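The key facts above can be checked with a few lines. This is a small sketch; the variable names are arbitrary and only standard Buffer methods are used.

```javascript
const buf = Buffer.from('hi'); // bytes 0x68 0x69

// Buffer extends Uint8Array, so TypedArray checks and iteration apply.
const isUint8Array = buf instanceof Uint8Array;       // true
const doubled = Array.from(buf, (byte) => byte * 2);  // [208, 210]

// Fixed size: an out-of-range index write is ignored and the length stays the same.
const fixed = Buffer.alloc(2);
fixed[5] = 0xff;
const lengthAfterWrite = fixed.length;                // 2

// "Growing" really means allocating a new, larger Buffer.
const grown = Buffer.concat([buf, Buffer.from('!')]);

console.log(isUint8Array, doubled, lengthAfterWrite, grown.toString());
```

Buffer.concat copies its inputs into a freshly allocated Buffer, which is why repeated concatenation of large data is usually better replaced by a stream.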
Security note: prefer Buffer.alloc over Buffer.allocUnsafe. allocUnsafe skips zero-filling for speed and can expose leftover memory contents if you read before fully writing it.
When you use Buffers directly: implementing binary protocols, hashing/encrypting bytes, manipulating image data, or reading a file's raw bytes. For text you usually just specify an encoding and work with strings.
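As an illustration of the binary-protocol case, here is a sketch that encodes and decodes a made-up frame. The frame layout and the function names `encodeFrame` and `decodeFrame` are assumptions invented for this example, not part of any real protocol; the Buffer read/write methods are standard.

```javascript
// Hypothetical frame layout (an assumption for illustration):
//   byte 0      message type
//   bytes 1-2   payload length, unsigned 16-bit big-endian
//   bytes 3..   payload (UTF-8 text)
function encodeFrame(type, text) {
  const payload = Buffer.from(text, 'utf8');
  const header = Buffer.alloc(3);
  header.writeUInt8(type, 0);
  header.writeUInt16BE(payload.length, 1);
  return Buffer.concat([header, payload]);
}

function decodeFrame(frame) {
  const type = frame.readUInt8(0);
  const length = frame.readUInt16BE(1);
  // subarray returns a view on the same memory, no copy is made
  const payload = frame.subarray(3, 3 + length);
  return { type, text: payload.toString('utf8') };
}
```

For example, `encodeFrame(1, 'hi').toString('hex')` gives `'0100026869'`, and passing that Buffer to `decodeFrame` returns the original type and text. The length field counts bytes, not characters, which matters as soon as the text contains non-ASCII characters.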
Answer: A Readable stream operates in one of two modes governing how data moves.
Paused mode (default): you explicitly pull data:
readable.on('readable', () => {
  let chunk;
  while ((chunk = readable.read()) !== null) {
    handleChunk(chunk); // handleChunk is a placeholder for your own function (the global `process` is not callable)
  }
});
Flowing mode: data is pushed to you as fast as it arrives. You enter it by:
- attaching a 'data' listener,
- calling .pipe(), or
- calling .resume().
readable.on('data', chunk => handleChunk(chunk)); // now flowing; handleChunk is a placeholder for your own function
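Switching: adding a 'data' handler, calling .pipe(), or calling .resume() → flowing; .pause() (when nothing is piped) or .unpipe() for every destination → paused. Removing 'data' handlers alone does not pause the stream.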
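The current mode can be inspected through the readableFlowing property, which is null before any consumer is attached, true in flowing mode, and false after an explicit pause. A minimal sketch, with arbitrary sample data and an illustrative `states` array:

```javascript
const { Readable } = require('stream');

const readable = Readable.from(['a', 'b', 'c']);
const states = [];

states.push(readable.readableFlowing); // null: no consumer attached yet

readable.on('data', () => {});
states.push(readable.readableFlowing); // true: a 'data' listener starts the flow

readable.pause();
states.push(readable.readableFlowing); // false: explicitly paused

readable.resume();
states.push(readable.readableFlowing); // true: flowing again

console.log(states);
```

The distinction between null and false is useful when debugging: null means nothing has asked for data yet, while false means something paused the stream on purpose.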
Modern, preferred approach — async iteration:
async function readAll(readable) {
  let total = 0;
  for await (const chunk of readable) { // handles flow + backpressure
    total += chunk.length;
  }
  return total;
}
for await...of is the cleanest way to consume a stream: it respects backpressure, propagates errors as exceptions (usable with try/catch), and reads until the stream ends.
Gotcha: In flowing mode, if you attach a 'data' listener but the consumer is slow and you don't manage backpressure, memory can grow. pipe/pipeline/for await avoid this; a bare 'data' loop does not.
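The claim that for await...of surfaces stream errors as exceptions can be sketched as follows. `countBytes` and `failingSource` are hypothetical names, and the failure is simulated by destroying the stream with an error on the second read.

```javascript
const { Readable } = require('stream');

// Consume a stream and report either the byte total or the error that stopped it.
async function countBytes(readable) {
  let total = 0;
  try {
    for await (const chunk of readable) {
      total += chunk.length;
    }
    return { ok: true, total };
  } catch (err) {
    // A stream 'error' arrives here as a normal rejected iteration.
    return { ok: false, total, message: err.message };
  }
}

// A source that delivers one chunk and then fails (simulated failure).
function failingSource() {
  let sent = false;
  return new Readable({
    read() {
      if (!sent) {
        sent = true;
        this.push(Buffer.from('abc'));
      } else {
        this.destroy(new Error('simulated read failure'));
      }
    },
  });
}
```

With a healthy source such as `Readable.from([Buffer.from('ab'), Buffer.from('cde')])` the result should be `{ ok: true, total: 5 }`; with `failingSource()` the catch branch runs instead of an unhandled 'error' event crashing the process.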
Answer:
Buffering (read it all at once):
const data = await fs.promises.readFile('report.csv'); // whole file in memory
Simple and fine for small, bounded data. But memory usage = file size × concurrent requests, so a 1 GB file (or many medium ones) can OOM the process.
Streaming (process chunk by chunk):
fs.createReadStream('report.csv')
  .pipe(csvParser())      // placeholder Transform: parse CSV rows
  .pipe(transformRows())  // placeholder Transform: reshape each row
  .pipe(res);             // send to the client as you go
Use streams when:
- Large or unknown-size data — big files, uploads/downloads, DB exports, log processing.
- Constant memory matters — memory stays around the buffer size regardless of total volume.
- Lower latency / TTFB — you can start sending/processing before all input is read (e.g., piping a file straight to an HTTP response).
- Composable pipelines — chain read → decompress → parse → transform → write.
Stick with buffering when:
- Data is small and you need the whole thing at once (e.g., parse a small JSON config).
- The processing genuinely requires random access to all the data.
Real-world example: a large file download should be served with createReadStream(...).pipe(res) (or pipeline, so that errors are handled), not with readFile followed by res.send. Otherwise every concurrent download holds a full copy of the file in memory.
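A sketch of the download case using only Node's built-in modules. `sendFile` and `createDownloadServer` are hypothetical helpers, and the file path is whatever the caller supplies; pipeline is used instead of pipe so that a read error or a client disconnect tears down both streams.

```javascript
const fs = require('fs');
const http = require('http');
const { pipeline } = require('stream');

// Stream a file into any Writable (for example an HTTP response).
// onDone receives an error if the read fails or the destination closes early.
function sendFile(filePath, destination, onDone) {
  pipeline(fs.createReadStream(filePath), destination, onDone);
}

// Build (but do not start) a server that streams the same file to every client.
function createDownloadServer(filePath) {
  return http.createServer((req, res) => {
    res.setHeader('Content-Type', 'application/octet-stream');
    sendFile(filePath, res, (err) => {
      if (err) console.error('download failed:', err.message);
    });
  });
}
```

A caller would start it with something like `createDownloadServer('./big-file.bin').listen(3000)`, where both the path and the port are placeholders. Memory per client stays near the stream's buffer size instead of the file size.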