2009-11-10

How to read a whole file to String in Java

Reading a whole file to a String in Java is tricky. One has to pay attention to several aspects:

  • Read with the proper character set (encoding).
  • Don't ignore the newline at the end of the file.
  • Don't waste CPU and memory by concatenating String objects in a loop (use a StringBuffer or an ArrayList<String> instead).
  • Don't waste memory (by line-buffering or double-buffering).
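To illustrate the newline point, here is a minimal sketch of the common line-by-line approach, with the file content passed in as a String to keep it self-contained. The class and method names are hypothetical and exist only for this illustration. BufferedReader.readLine() strips the line terminator, so the loop cannot tell whether the last line ended with a newline:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of the solution in this post.
final class TrailingNewlineDemo {
  // Reads the content line by line and re-adds '\n' after every line.
  // readLine() drops the terminator, so the information about a missing
  // (or present) final newline is lost.
  static String readByLines(String content) throws IOException {
    BufferedReader br = new BufferedReader(new StringReader(content));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = br.readLine()) != null) {
      sb.append(line).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // Two different inputs produce the same result.
    System.out.println(readByLines("a\nb").equals(readByLines("a\nb\n")));  // prints "true"
  }
}
```

Reading the raw bytes and decoding them in one step avoids this ambiguity, because nothing is split on line boundaries.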

See my solution at http://stackoverflow.com/questions/1656797/how-to-read-a-file-into-string-in-java/1708115#1708115

For your convenience, here is my code:

// Reads the whole file into a String.
// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
    throws java.io.IOException {
  java.io.InputStream is = new java.io.FileInputStream(fileName);
  try {
    final int bufsize = 4096;
    // available() is only a hint for the initial buffer size.
    int available = is.available();
    byte[] data = new byte[available < bufsize ? bufsize : available];
    int used = 0;
    while (true) {
      // Keep at least bufsize free bytes; double the buffer when needed.
      if (data.length - used < bufsize) {
        byte[] newData = new byte[data.length << 1];
        System.arraycopy(data, 0, newData, 0, used);
        data = newData;
      }
      int got = is.read(data, used, data.length - used);
      if (got <= 0) break;  // End of file.
      used += got;
    }
    // Decode all bytes at once, so no multibyte sequence gets split.
    return charsetName != null ? new String(data, 0, used, charsetName)
                               : new String(data, 0, used);
  } finally {
    is.close();
  }
}

2 comments:

müzso said...

You don't really trust JVM implementations too much. :-) Otherwise you'd have used an InputStreamReader + a StringBuilder and appended each chunk of data read in to the end of the temporary buffer. Instead you decided to use a custom managed buffer (data[]) and handle the buffer allocation yourself.

It'd be interesting to see which performs better.

Btw. I'd choose a buffer size larger than 4K. If your filesystem's block (or sector ... whatever you call it) size is larger than 4K, then a larger buffer size will perform better. If it's 4K or less, then having a buffer size that is a multiple of the block size performs the same as if you chose the block size for the buffer size. At least in theory. :-)
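For reference, the InputStreamReader + StringBuilder variant described in this comment could look roughly like the sketch below. It is not code from the post or from the commenter. The class name, the readAll helper and the 8192-char chunk size are assumptions made for this illustration, and no claim is made here about which variant is faster:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical class name, used only for this illustration.
final class ReaderBasedRead {
  // charsetName can be null to use the default charset.
  static String readFileAsString(String fileName, String charsetName)
      throws IOException {
    InputStream is = new FileInputStream(fileName);
    try {
      return readAll(is, charsetName);
    } finally {
      is.close();
    }
  }

  // Decodes the stream chunk by chunk and appends each chunk to a
  // StringBuilder, which manages the growing buffer itself.
  static String readAll(InputStream is, String charsetName)
      throws IOException {
    Charset cs = charsetName != null ? Charset.forName(charsetName)
                                     : Charset.defaultCharset();
    Reader reader = new InputStreamReader(is, cs);
    StringBuilder sb = new StringBuilder();
    char[] chunk = new char[8192];  // Assumed chunk size, larger than 4K.
    int got;
    while ((got = reader.read(chunk, 0, chunk.length)) != -1) {
      sb.append(chunk, 0, got);
    }
    return sb.toString();
  }
}
```

A call such as ReaderBasedRead.readFileAsString("input.txt", "UTF-8") would return the file contents, where input.txt is a placeholder file name. Measuring both variants on the same files would be needed to settle the performance question.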

pts said...

@müzso: Thanks for your comments. I agree that a larger block size might be faster.

Feel free to write your implementation, and to measure the speed difference.

I guess you mean FileReader instead of InputStreamReader -- never mind, it doesn't make much difference.

I'd never use a *Reader for reading the entire file, because there might be a UTF-8 multibyte sequence straddling the buffer boundary. Detecting and cutting that sequence would make buffering inefficient.

I'd never use a StringBuilder for reading the entire file, because it involves copying the data unnecessarily.

I doubt that there is code for reading an entire file that is faster than the code in my blog post. If you can contradict that, please give an example.