What is the difference between a byte stream and a character stream in Java I/O?

7 min · beginner · byte-stream · character-stream · io

Quick Answer

Byte streams (InputStream/OutputStream and subclasses) read and write raw 8-bit bytes with no notion of text encoding, which makes them suitable for binary data (images, serialized objects, arbitrary files). Character streams (Reader/Writer and subclasses) read and write 16-bit char values (UTF-16 code units), converting between raw bytes and characters according to a specified (or platform-default) character encoding, which makes them the right choice for text data.

Detailed Answer

Java's I/O classes are split into two parallel hierarchies, distinguished by what unit of data they work with:

  • Byte streams: InputStream/OutputStream and their subclasses (FileInputStream, BufferedInputStream, ObjectOutputStream, ...) read and write raw 8-bit bytes, with no interpretation of text encoding at all. They are the correct choice for binary data: images, audio, serialized objects, ZIP files, or any file whose content isn't meant to be interpreted as text.
try (InputStream in = new FileInputStream("image.png")) {
    int b = in.read(); // next raw byte as an int in 0-255, or -1 at end of stream (use a byte[] buffer for bulk reads)
}
  • Character streams: Reader/Writer and their subclasses (FileReader, BufferedReader, InputStreamReader, OutputStreamWriter, ...) read and write 16-bit Unicode characters, automatically handling the conversion between raw bytes and characters according to a specified (or platform-default) character encoding (UTF-8, etc.).
try (Reader r = new FileReader("text.txt", StandardCharsets.UTF_8)) { // this constructor requires Java 11+
    int c = r.read(); // one char decoded from the underlying bytes, or -1 at end of stream
}
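
To make the difference in units concrete, here is a minimal sketch that reads the same in-memory bytes once through a byte stream and once through a character stream. The class name StreamUnitsDemo and its helper methods are made up for illustration, and an in-memory ByteArrayInputStream stands in for a real file so the example runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts read() calls needed to drain the same data
// as bytes and as UTF-8 decoded characters.
class StreamUnitsDemo {

    // Each successful read() on a byte stream consumes exactly one raw byte.
    static int countBytes(byte[] data) {
        int n = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    // Each successful read() on a character stream yields one decoded char,
    // which may have consumed several underlying bytes.
    static int countChars(byte[] data) {
        int n = 0;
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (r.read() != -1) {
                n++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is "hello" with an accented e, which takes two bytes in UTF-8.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes read: " + countBytes(data)); // bytes read: 6
        System.out.println("chars read: " + countChars(data)); // chars read: 5
    }
}
```

The byte stream sees six units and the character stream sees five, because the accented letter occupies two bytes on disk but decodes to a single char.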

The bridge classes InputStreamReader and OutputStreamWriter convert between the two worlds: InputStreamReader wraps a byte stream and exposes it as a character stream, and OutputStreamWriter does the reverse, each using an explicit charset or, if none is given, the platform default. FileReader and FileWriter are thin subclasses of these bridges that open the file's byte stream for you.
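
As a rough illustration of the bridge classes, the sketch below pushes text through an OutputStreamWriter into a byte sink and then pulls it back out through an InputStreamReader. BridgeDemo, writeText, and readText are hypothetical names chosen for this example, and in-memory byte arrays replace real files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a text round trip through the two bridge classes.
class BridgeDemo {

    // Characters in, bytes out: OutputStreamWriter encodes chars as UTF-8.
    static byte[] writeText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before this line runs.
        return sink.toByteArray();
    }

    // Bytes in, characters out: InputStreamReader decodes UTF-8 into chars.
    static String readText(byte[] data) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9"; // two non-ASCII letters
        byte[] bytes = writeText(text);
        System.out.println("chars: " + text.length());   // chars: 10
        System.out.println("bytes: " + bytes.length);    // bytes: 12
        System.out.println("round trip ok: " + readText(bytes).equals(text)); // round trip ok: true
    }
}
```

Swapping the ByteArray streams for FileOutputStream and FileInputStream gives the same behavior against a file, which is roughly what FileWriter and FileReader package up.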

Rule of thumb: use byte streams for anything binary; use character streams (always specifying an explicit charset like StandardCharsets.UTF_8, rather than relying on the platform default) for anything that's genuinely text, to avoid subtle encoding bugs when code runs on machines with different default charsets.
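
The kind of encoding bug this rule guards against can be reproduced in a few lines. The sketch below, using a hypothetical CharsetMismatchDemo class, decodes the same UTF-8 bytes twice: once with the matching charset and once with ISO-8859-1, standing in for a machine whose default charset differs from the one the file was written in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes decoded with two different charsets.
class CharsetMismatchDemo {

    // Decodes all of the given bytes through a character stream using the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // ends with an accented e
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("matching charset ok: " + right.equals(original)); // matching charset ok: true
        System.out.println("mismatched charset ok: " + wrong.equals(original)); // mismatched charset ok: false
        System.out.println("mismatched length: " + wrong.length()); // mismatched length: 5
    }
}
```

With the wrong charset nothing fails loudly: the two bytes of the accented letter are each decoded as a separate character, so the text silently comes back one char longer and garbled. Passing the charset explicitly keeps the result the same on every machine.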