Read Byte Streams with Java IO

Maybe it was the search terms I used, but I couldn’t find straightforward, simple examples of reading binary streams with Java IO. It’s not that I don’t know how; I just think it would be useful to other developers who are less familiar with IO. So, here it is. The first example reads a byte at a time; the second reads in chunks.


// Example 1
InputStream in = somewhere.getInputStream();

int c;
while ((c = in.read()) != -1) {
	// process read byte
	System.out.print((char)c);
}

The input stream is whatever you want to read from, such as a new FileInputStream() or socket.getInputStream(). An int, c, is declared to hold each byte as it is read. The in.read() method returns an int in the range 0-255, which is the value of the byte read, or -1 when the end of the stream is reached. Why not use a Java byte? Because the Java byte is signed, with the range -128 to 127: the byte 0xFF would come back as -1 and be indistinguishable from the end-of-stream marker. Returning an int gives you the “benefit” of getting the actual unsigned value, so you can process it directly.

The while line is the most confusing part of the example. If we resolve the inner parentheses first, it reads a byte from the input stream and stores it in c. The parenthesized expression then evaluates to the value of c, which is compared against -1. This comparison is needed because read() returns -1 once the end of the stream is reached. That ends the processing loop and lets the program continue.

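If a valid value is read, execution enters the loop body, where you can process the int c. In this example it just prints the byte that was read. The cast to char is necessary, or else the numeric value of the byte is printed instead of the character. Note that the cast only gives the right character for single-byte encodings such as ASCII or ISO-8859-1.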
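
To make the signed-versus-unsigned point concrete, here is a minimal, self-contained sketch of the same loop. It assumes a ByteArrayInputStream as a stand-in for the real source, and the class and method names (ReadByteByByte, maxByteValue) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadByteByByte {

    // Reads the stream one byte at a time and returns the largest value seen,
    // or -1 if the stream was empty.
    static int maxByteValue(InputStream in) throws IOException {
        int max = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (c > max) {
                max = c;
            }
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; the last byte has all bits set.
        byte[] data = { 0x48, 0x69, (byte) 0xFF };

        // read() hands back the unsigned value of each byte.
        System.out.println(maxByteValue(new ByteArrayInputStream(data))); // 255

        // The same byte seen as a signed Java byte, and masked back to 0-255.
        System.out.println(data[2]);        // -1
        System.out.println(data[2] & 0xFF); // 255
    }
}
```

If read() returned a byte, the value 0xFF would look exactly like the -1 end-of-stream marker, and the loop would stop early.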


// Example 2
InputStream in = somewhere.getInputStream();
OutputStream out = new ByteArrayOutputStream(); // for example

byte[] buf = new byte[1024];
int len;

while ((len = in.read(buf)) != -1) {
	// process byte buffer
	out.write(buf, 0, len);
}

In this example we again have an input stream to read from; in addition, we prepare an output stream in which to store the bytes read. Change it to whatever suits your purpose for reading the stream. This method needs two variables: a byte array buffer (buf) to hold the bytes read, and an int (len) that holds the number of bytes actually read. There’s no fixed number for the buffer size; it’ll work whether you put 10 or 100000 for now. I’ll explain the effect later.

Next we reach the confusing while loop again, and this time it’s even more complicated. Let’s resolve the inner parentheses again: in.read(buf) fills the byte array with the bytes it reads. This means that when you execute in.read(buf) by itself, the contents of buf before and after the statement may differ. The number returned tells you how many bytes were read and stored in the array. The parenthesized expression then evaluates to the value of len, which is compared against -1. As before, in.read(buf) returns -1 once the end of the stream is reached, so we can terminate the while loop.

If some bytes were read, we enter the while loop and write them to the output stream. We need to specify that we want to write only len bytes starting at offset 0, because the input stream may have read fewer bytes than the size of the buffer. This can happen because of network latency, or because the last chunk of a file is not a multiple of the buffer size. If len were not specified, we might write rubbish left over in the byte array from an earlier read, thus corrupting the data.
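
The corruption is easy to reproduce. The following sketch assumes a 10-byte in-memory source and a deliberately tiny 4-byte buffer; ChunkedCopy, copy and copyIgnoringLen are hypothetical names, and the second method is intentionally wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ChunkedCopy {

    // Copies everything from in to out, writing only the bytes actually read.
    static void copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }

    // Deliberately wrong: ignores len and writes the whole buffer every time.
    static void copyIgnoringLen(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        while (in.read(buf) != -1) {
            out.write(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes("US-ASCII");

        ByteArrayOutputStream good = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), good, 4);
        System.out.println(good.toString("US-ASCII")); // 0123456789

        // The last read only fills 2 of the 4 slots, so "67" from the
        // previous chunk is still in the buffer and gets written again.
        ByteArrayOutputStream bad = new ByteArrayOutputStream();
        copyIgnoringLen(new ByteArrayInputStream(data), bad, 4);
        System.out.println(bad.toString("US-ASCII")); // 012345678967
    }
}
```

With a real file or socket the chunk boundaries are less predictable, but the failure is the same: stale bytes from an earlier read end up in the output.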

Now that you understand the loop (hopefully), I’ll explain the effect of the buffer size. With a small buffer, the loop has to run more times to read a given amount of data. With a big buffer, it runs fewer times for the same amount of data. Then why not allocate a VERY BIG buffer? That depends on how much memory your application can spare. Allocating a big buffer means the byte array takes up that much memory, even if only a small part of it is used. So the choice of buffer size depends on whether you are constrained by processing power or by memory, and on the typical size of the data read. You don’t need a 1MB buffer to read 1KB streams.

The buffer method in Example 2 is more efficient than the byte-by-byte method in Example 1, so it is the preferred one.

Stripes

Stripes is an “easy-to-use” web framework meant to overthrow Struts, as described in [1]. I have not tried it myself, but I quite agree with the disadvantages of Struts it lists, especially the steep learning curve. The tight integration between the components and the cryptic errors have also made incremental development difficult. Stripes makes it easy for a new developer to get going in less time, and it should be especially easy for an existing Struts developer to switch over because of the many similarities between the two.

You can read the post about Struts at [2].

[1] Stripes vs. Struts
[2] Apache Jakarta Struts (Action 1)

Ant

I’ve been avoiding Ant for many, many years now… probably because I’ve not got into any serious deployment, or anything that simple compilation cannot do. However, I finally took the plunge (JUST to try it), and it wasn’t so bad after all.

Essentially, Ant is a cross-platform build tool (think of a Windows batch file or Unix make) that runs commands in batches. Core (“built-in”) commands include copy, delete, javac, java, jar, etc. In Ant’s own terminology these commands are ‘tasks’, and you group them into ‘targets’ within a ‘project’. Targets can be executed separately, or you can create dependent targets that execute one after another.

Common targets include init (which sets up variables for use in the other targets, and is usually a dependency of all of them), build/compile (runs javac on the main code and moves the class files to the appropriate directory), deploy (copies the finished jar/war and other files to the deployment locations) and clean (deletes everything the build produced).

I’ve used the core copy and jar commands to build Wildfire plugins that are deployed directly, eliminating the need to manually move class files, build the jar and copy it to the deploy folder. Compilation was done automatically by Eclipse, and the Ant build was run directly from Eclipse too.

For more complicated projects, Ant also supports defining your own Ant tasks, though I’ve yet to need that. Being so extensible, you could say it can do anything that you can code.

Effective Java (Review)

4. Avoid creating duplicate objects
5. Eliminate obsolete object references

12. Minimize accessibility of classes and members

21. Replace enum constructs with classes

23. Check parameters for validity
27. Return zero-length arrays, not nulls

29. Minimize the scope of local variables
30. Know and use the libraries
31. Avoid float and double if exact answers are required
32. Avoid strings where other types are more appropriate
33. Beware the performance of string concatenation
36. Use native methods judiciously
37. Optimize judiciously

39. Use exceptions only for exceptional conditions
47. Don’t ignore exceptions
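
A few of these items are easy to show in code. The sketch below is my own illustration of items 27, 31 and 33, not code from the book; the class and method names (EffectiveJavaSketch, findTags, exactTotal, join) are made up.

```java
import java.math.BigDecimal;

class EffectiveJavaSketch {

    // Item 27: return a zero-length array rather than null,
    // so callers can loop over the result without a null check.
    static String[] findTags(String csv) {
        if (csv == null || csv.isEmpty()) {
            return new String[0];
        }
        return csv.split(",");
    }

    // Item 31: use BigDecimal (or int/long) when exact answers are required.
    static BigDecimal exactTotal() {
        return new BigDecimal("0.10").add(new BigDecimal("0.20"));
    }

    // Item 33: build strings in a loop with StringBuilder instead of +=.
    static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(findTags("").length);             // 0
        System.out.println(0.1 + 0.2);                       // 0.30000000000000004
        System.out.println(exactTotal());                    // 0.30
        System.out.println(join(new String[] { "a", "b" })); // ab
    }
}
```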

JMF

JMF (the Java Media Framework) allows you to work with media: audio and video. “Working with” here means input (reading from a file or device), processing (encoding/decoding/multiplexing) and output (to a file, the network or the screen).

The nitty-gritty details are mostly shielded from the programmer, yielding a clean interface. In fact, once we had completed our audio streaming, converting it to video streaming was almost a one-line change to the Format class. However, since we needed to separate the components clearly (content vs transport), we had to try to tear JMF apart: JMF handles so much of RTP that we just needed to give it an IP address and port and off it went.

The flexibility of the framework theoretically allows all forms of media to be processed; however, it is currently limited to certain codecs and formats, maybe because of the need to interact with native hardware or the OS, or because of licensing issues. We had some problems working with webcams. The likely cause is that the webcam drivers are application-specific, so only the vendor’s own apps know how to access the webcam. Hopefully more webcam vendors will make their cams OS-generic, so that they can be recognized by Windows. That would make it more likely that JMF can detect them.

Another issue we bumped into is that JMF is bound to AWT. Displayable components such as the video visual and the video controls are returned as AWT components. Since we use SWT, coding the bridges could be a chore.

For performance reasons, JMF also comes in performance packs, which probably implement certain algorithms in native code so that they run faster/better by making use of the OS/hardware. (accelerator? processor opcodes?)

Annotations in J2SE 5.0

Among the new features in J2SE 5.0 (or 1.5, or Tiger), alongside the enhanced for loop and generics, is annotations. I had always hoped to come across a simple introduction to them, and one came in this issue’s Tech Tips.

Annotations allow us to associate metadata with program elements. Metadata is information that describes the program elements, and the elements may be, for example, classes, methods or member variables. An annotation is applied by writing an @xxx tag before the element.

The most common and obvious use is the @Deprecated tag. It declares the method as deprecated: it is outdated and kept only for backward compatibility. When the method is called from another class, the compiler generates a warning.

Other annotations defined in Tiger are @Override and @SuppressWarnings. @Override asserts that you are overriding a superclass method, so the compiler reports an error if a typo or a wrong method signature means you are not; it is similar to C#’s override modifier. @SuppressWarnings tells the compiler not to report certain warnings, such as deprecation or unchecked casts, within the annotated element. It takes the names of the warnings to suppress, as in @SuppressWarnings("fallthrough"). This looks a lot like .NET’s attributes.
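
Here is a small sketch of the three built-in annotations in use. The classes (Greeter, LoudGreeter) and the asStrings helper are hypothetical; only the annotations themselves come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class BuiltInAnnotations {

    static class Greeter {
        // Callers of this method get a deprecation warning from the compiler.
        @Deprecated
        String oldGreet() {
            return "hi";
        }

        String greet() {
            return "hello";
        }
    }

    static class LoudGreeter extends Greeter {
        // If this were misspelled as "great()", @Override would turn the
        // silent mistake into a compile error.
        @Override
        String greet() {
            return "HELLO";
        }
    }

    // Without the annotation, the raw List parameter and the cast below
    // would each produce a compiler warning.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    static List<String> asStrings(List raw) {
        return (List<String>) raw;
    }

    public static void main(String[] args) {
        Greeter g = new LoudGreeter();
        System.out.println(g.greet()); // HELLO

        List<Object> legacy = new ArrayList<Object>();
        legacy.add("a");
        System.out.println(asStrings(legacy).get(0)); // a
    }
}
```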

In addition, the tip mentioned that custom annotations may be defined, probably for uses like XDoclet’s. That could be something else for me to learn in the future. The tip also links to the book “Effective Java Programming Language Guide” by Joshua Bloch, which contains 57 Java tips. That may well be the next review here.
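
For the curious, a custom annotation might look roughly like the sketch below. The @Todo annotation, the Service class and the describe method are all invented for illustration; the sketch assumes runtime retention so the annotation can be read back through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class CustomAnnotationSketch {

    // A hypothetical annotation: kept at runtime, allowed on methods only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Todo {
        String value();
        int priority() default 1;
    }

    static class Service {
        @Todo(value = "add caching", priority = 2)
        public void lookup() {
        }

        public void ping() {
        }
    }

    // Describes every method of the given class that carries a @Todo.
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (Method m : type.getDeclaredMethods()) {
            Todo todo = m.getAnnotation(Todo.class);
            if (todo != null) {
                sb.append(m.getName()).append(": ").append(todo.value())
                  .append(" (priority ").append(todo.priority()).append(")");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Service.class)); // lookup: add caching (priority 2)
    }
}
```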

JavaMail / Activation

JavaMail is the standard answer for mail access from Java. The currently supported protocols are SMTP, POP3 and IMAP. In short, you can send outgoing mail using SMTP and receive mail using POP3. I’m not familiar with IMAP, though I remember covering it in my Networking course. I think IMAP mail is stored and manipulated on the server, as opposed to POP3 mail, which is (generally) downloaded.

Using the classes was rather easy for me, partly because I am familiar with the SMTP and POP3 protocols (from using telnet to send and receive mail manually). It might be a good experience for you to try the telnet exercise too, to understand protocols that run on top of TCP. Online examples also helped me cut and paste code quickly.

One peculiar thing is that JavaMail depends on the JavaBeans Activation Framework (JAF). I’d seen it around for some time, but never really knew what it was or what it does. I finally learnt that the activation framework handles MIME types (but how?) and so is useful to the JavaMail package. JavaMail was the only library that really used activation for a few years, until web services came along: web services that use SOAP calls have MIME types as their encoding.

JAXP – Java API for XML Processing

XML has grown in importance over the years, and it is natural that Java has gradually added support for it. JAXP started as an external package for JDK 1.3; JDK 1.4 included JAXP 1.2, and core JDK 1.5 now includes JAXP 1.3. java.net now also offers JAXP 1.3 as a standalone package so that JDK 1.3 can make use of it as well. However, the bundle has no technical support (use at your own risk).

JAXP contains APIs to CRUD XML, using both DOM and SAX. That’s a lot of acronyms in one sentence… With DOM you create or obtain a Document, then manipulate the elements and attributes within it. DOM manipulation works generally the same way in Java and JavaScript, since both adhere to the DOM specifications. With SAX you hook up handlers that perform tasks when start and end tags are encountered.
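
As a rough illustration of the two styles, the sketch below reads the same small document once with DOM and once with SAX. The XML content and the names JaxpSketch, firstBookTitle and countBooks are made up; the parser classes are the standard JAXP ones.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class JaxpSketch {

    // Hypothetical sample document.
    static final String XML =
        "<library><book id=\"1\">Effective Java</book><book id=\"2\">Java IO</book></library>";

    // DOM: parse the whole document into a tree, then navigate it.
    static String firstBookTitle(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element book = (Element) doc.getElementsByTagName("book").item(0);
        return book.getTextContent();
    }

    // SAX: no tree is built; the handler is called back for each start tag.
    static int countBooks(String xml) throws Exception {
        final int[] count = { 0 };
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("book".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstBookTitle(XML)); // Effective Java
        System.out.println(countBooks(XML));     // 2
    }
}
```

DOM is convenient when you need to move around the document or change it; SAX suits large inputs that you only need to scan once.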

Transformers in the JAXP package allow you to easily transform XML from one form to another, though I’ve only tried transforming a document to a file. Handy output properties can be set on the transformation to produce formatted, more readable output.
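
A minimal sketch of that kind of transformation follows. It builds a tiny made-up document and writes it to a String so the example is self-contained; for a file, a StreamResult wrapping a File would be used instead. TransformSketch and serialize are hypothetical names, and the exact indentation of the output depends on the JDK in use.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TransformSketch {

    // Builds a small DOM document and serializes it with indentation enabled.
    static String serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("library");
        doc.appendChild(root);

        Element book = doc.createElement("book");
        book.setAttribute("id", "1");
        book.setTextContent("Effective Java");
        root.appendChild(book);

        // With no stylesheet, the transformer simply copies source to result.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```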

This post is purposely very abstract, commenting only generally on JAXP. However, it should contain enough keywords to let you search for how to perform specific tasks. Delving into XML, DOM and SAX could drag on for very long, with little benefit since there’s so much information out there already.

Best Practices – General

Other topics in the book did not have significant points (at this moment in time) for me to bring up. However certain bits and pieces keep recurring throughout the topics and are worth mentioning:

Consider Internationalization From the Start

If you’re doing an “Enterprise” application, always I18N your Strings, especially in the JSPs. For database design, locale-specific fields should be normalized into another table with an additional locale column.

Design First, Optimize Later

Make sure you have a good design before you think about how to make your code run better. Don’t keep optimizing your code; let it go once it works (i.e. passes its test cases). Instead of worrying about how well your code runs, think of the other business functions you have yet to fulfil.

In the end the code may not need to be optimized (maybe thanks to good design). Even if it does, good design makes such optimizations easy to perform. If optimization is considered first, its benefit may turn out to be unnecessary, or further optimization later may be difficult because it is hindered by poor design.