What is the difference between a byte stream and a character stream in Java I/O?
Quick Answer
Byte streams (InputStream/OutputStream and their subclasses) read and write raw 8-bit bytes. That makes them suitable for binary data (images, serialized objects, arbitrary files), and they have no notion of text encoding. Character streams (Reader/Writer and their subclasses) read and write 16-bit char values (UTF-16 code units). They convert between raw bytes and characters according to a specified or platform-default character encoding, which makes them the right choice for text data.
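A small sketch makes the difference concrete. It reads the same UTF-8 bytes once through an InputStream and once through a Reader, and counts how many units each returns. The class and method names (ByteVsCharDemo, countBytes, countChars) are hypothetical, and in-memory streams stand in for files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class ByteVsCharDemo {
    // Byte stream: each read() returns one raw byte (0-255) until -1 signals end of stream.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Character stream: the same bytes decoded as UTF-8; each read() returns one char.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (r.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // U+00E9 (e with acute accent) is two bytes in UTF-8 but a single char.
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(accented)); // 6
        System.out.println(countChars(accented)); // 5

        // U+1F600 (an emoji) is four bytes in UTF-8 and two chars (a surrogate pair).
        byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(emoji)); // 4
        System.out.println(countChars(emoji)); // 2
    }
}
```

The emoji case shows why "16-bit character" really means "16-bit UTF-16 code unit". A character outside the Basic Multilingual Plane arrives from a Reader as two char values.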
Detailed Answer
Java's I/O classes are split into two parallel hierarchies, distinguished by what unit of data they work with:
- Byte streams: InputStream/OutputStream and their subclasses (FileInputStream, BufferedInputStream, ObjectOutputStream, ...) read and write raw 8-bit bytes with no interpretation of text encoding at all. They are the correct choice for binary data: images, audio, serialized objects, ZIP files, or any file whose content isn't meant to be interpreted as text.
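```java
import java.io.FileInputStream;
import java.io.InputStream;

// Inside a method declared to throw IOException:
try (InputStream in = new FileInputStream("image.png")) {
    int b = in.read(); // one raw byte as 0-255, or -1 at end of stream (or read into a byte[] buffer)
}
```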
- Character streams: Reader/Writer and their subclasses (FileReader, BufferedReader, InputStreamReader, OutputStreamWriter, ...) read and write 16-bit char values (UTF-16 code units), converting between raw bytes and characters according to a specified or platform-default character encoding (UTF-8, ISO-8859-1, etc.).
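```java
import java.io.FileReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
try (Reader r = new FileReader("text.txt", StandardCharsets.UTF_8)) { // FileReader(String, Charset) needs Java 11+
    int c = r.read(); // one char decoded from the underlying bytes, or -1 at end of stream
}
```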
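The one-unit-at-a-time reads above are fine for illustration, but real code usually moves data in chunks. The sketch below copies binary data through a byte[] buffer without any decoding, and reads text line by line through a BufferedReader. The class and method names (BufferedStreamsDemo, copyBytes, readLines) are hypothetical, and in-memory streams stand in for files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class BufferedStreamsDemo {
    // Byte stream: move raw bytes through an 8 KiB buffer; nothing is decoded.
    static long copyBytes(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // n = bytes actually read this time
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Character stream: BufferedReader reads decoded text and splits it into lines.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // readLine() strips the line terminator
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = {(byte) 0x89, 0x50, 0x4E, 0x47}; // e.g. the first bytes of a PNG header
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        System.out.println(copyBytes(new ByteArrayInputStream(binary), copy)); // 4
        System.out.println(readLines(new StringReader("first\nsecond\r\nthird"))); // [first, second, third]
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) performs the same buffered copy in a single call.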
The bridge classes InputStreamReader and OutputStreamWriter connect the two hierarchies. InputStreamReader wraps a byte stream and exposes it as a character stream, and OutputStreamWriter does the reverse. Each uses the charset you pass in, or the platform default if you omit it. FileReader and FileWriter are simply subclasses of these bridges that open the underlying FileInputStream or FileOutputStream for you.
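Here is a minimal sketch of both bridge directions. OutputStreamWriter turns chars into bytes, and InputStreamReader turns bytes back into chars, each with an explicit Charset. The helper names encode and decode (and the class BridgeDemo) are hypothetical, and in-memory streams are assumed. The printed byte counts show that the same two-character string has a different byte length under each encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class BridgeDemo {
    // Character stream in, byte stream out: the writer encodes chars with the given charset.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } // closing the writer flushes its internal encoder buffer into 'bytes'
        return bytes.toByteArray();
    }

    // Byte stream in, character stream out: the reader decodes bytes with the given charset.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9"; // 'h' followed by U+00E9
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 3
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 2
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // 4
        System.out.println(decode(encode(text, StandardCharsets.UTF_8), StandardCharsets.UTF_8).equals(text)); // true
    }
}
```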
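Rule of thumb: use byte streams for anything binary, and use character streams for anything that is genuinely text, always specifying an explicit charset such as StandardCharsets.UTF_8 rather than relying on the platform default. Before Java 18 the default charset depended on the operating system and locale, so code that relied on it could read or write different bytes on different machines. Java 18 (JEP 400) made UTF-8 the default, but an explicit charset still documents intent and keeps code correct on older runtimes.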
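To see the kind of bug this rule guards against, the sketch below writes UTF-8 bytes to a temporary file and reads them back twice. The first read uses the matching charset. The second uses ISO-8859-1, standing in for a machine whose default charset differs. The class and method names (CharsetMismatchDemo, firstLine) are hypothetical, and the FileReader(File, Charset) constructor requires Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class CharsetMismatchDemo {
    // Reads the first line of a file, decoding its bytes with the given charset.
    static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(file.toFile(), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("charset-demo", ".txt");
        try {
            // Store "h" + U+00E9 + "llo" as UTF-8: U+00E9 becomes the two bytes 0xC3 0xA9.
            Files.write(file, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));

            String right = firstLine(file, StandardCharsets.UTF_8);
            String wrong = firstLine(file, StandardCharsets.ISO_8859_1);

            System.out.println(right.equals("h\u00e9llo"));       // true
            // ISO-8859-1 maps each byte to one char, so 0xC3 0xA9 become U+00C3 and U+00A9.
            System.out.println(wrong.equals("h\u00c3\u00a9llo")); // true
            System.out.println(wrong.length());                   // 6
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Nothing fails loudly here: the wrong charset silently produces extra, incorrect characters. That is why an explicit charset is worth the few extra keystrokes.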