Maybe it’s the search terms I used, I couldn’t find straightforward and simple examples to read binary streams with JavaIO. It’s not I don’t know how to, but I think it’s useful for other developers who know less IO. So, here it is. The first example shows reading a byte at a time, the second shows reading in chunks.
// Example 1
InputStream in = somewhere.getInputStream();
int c;
while ((c = in.read()) != -1) {
// process read byte
System.out.print((char)c);
}
The input stream is where you want to read from, such as a new FileInputStream() or socket.getInputStream(). An integer c is declared to hold each byte as it is read. The in.read() method will return an integer in the range 0-255, which is the value of the byte read. Why not use a Java byte? This is because the Java byte is signed and has the range -128 to 127. Java gives you the “benefit” of getting the actual value so you can process it directly.
The while loop line is the most confusing part of the loop. If we resolve the inner bracket first, it reads a byte from the inputstream, and stores it in c. The bracket now resolves to the value of c, which is compared against -1. This comparison is done because read() will return a -1 if the end of stream is reached. This will break the processing loop and allow the program to continue.
If a valid value is read, it goes into the loop, and you can process the integer c. In this example it just prints the read byte. The cast to char is necessary, or else it will print the number code of the byte.
// Example 2
InputStream in = somewhere.getInputStream();
OutputStream out = new ByteArrayOutputStream(); // for example
byte[] buf = new byte[1024];
int len;
while ((len = in.read(buf)) != -1) {
// process byte buffer
out.write(buf, 0, len);
}
In this example we also have an inputstream to read from, additionally we prepare an outputstream where we will store the read bytes. You should change it to whatever your purpose was for reading the stream. For this method we’ll need two variables — a byte array buffer (buf) for storing the read bytes and an integer (len) which represent the number of bytes actually read. There’s no fixed number for the buffer size, it’ll work whether you put 10 or 100000 for now. I’ll explain the effect later.
Next we reach the confusing while loop again, this time it’s even more complicated. Let’s resolve the inner bracket again, what happens here is in.read() will modify the byte array to store the content of the read bytes. This means when you execute in.read(buf) by itself, the contents of buf before and after this statement might be different. The number returned will tell you how many bytes were read and stored in that byte array. The inner bracket now resolves to the value of len, which is matched against -1. This is because in.read() will return -1 if the end of the stream is reached, so we can terminate the while loop.
If some bytes were read, we go into the while loop, and write the read bytes to the outputstream. We will need to specify that we want to write the bytes 0 to len, because it may be possible that the inputstream read less than the size of the buffer. This may be due to a network latency, or it could be the last chunk of a file that is not a multiple of the buffer size. If len was not specified, we might be writing rubbish that contains data previously written into the byte array, thus corrupting the data.
Now that you understand the loop (hopefully), I’ll explain the effect of the buffer size. If you have a small buffer, you’ll need to run the loop more times, to read an amount of data. If you have a big buffer, you’ll loop less times for the same amount of data. Then why not assign a VERY BIG buffer? This depends on the amount of memory your application can spare. Allocating a big buffer means the byte buffer will take up that much memory, even if only a small part of it is used. So the decision on the size of the buffer depends on whether you have constraints on processing power or memory size, or even the typical size of data read. You don’t need a 1MB buffer to read 1KB streams.
The 2nd buffer method is more efficient than the one used in Example 1, which reads byte by byte. Therefore it is preferred the 2nd method is used.
Thanks very much. I have been trying to find something that reads in chunks and there are many complicated algorithms out there, but this was simple and worked perfectly. 🙂