hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: FSDataInputStream.read(byte[]) only reads to a block boundary?
Date Sun, 28 Jun 2009 18:16:05 GMT

This seems to be the case. I don't think there is any specific reason 
not to read across the block boundary...

Even if HDFS does read across the blocks, it is still not a good idea to 
ignore the JavaDoc for read(). If you want all the bytes read, then you 
should have a while loop or use one of the readFully() variants. For 
example, if you later change your code by wrapping a BufferedInputStream 
around 'in', you would still get partial reads even if HDFS reads all 
the data.
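
As a rough sketch (assuming 'fin' and 'buffer' as in your code below), 
the loop would look something like:

   int off = 0;
   while (off < buffer.length) {
     int n = fin.read(buffer, off, buffer.length - off);
     if (n == -1)
       break;            // hit EOF before the buffer filled
     off += n;           // 'off' now holds the total bytes read
   }

   // Or, if running out of data should be treated as an error, let
   // DataInputStream do the looping for you:
   fin.readFully(buffer, 0, buffer.length);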

Raghu.

forbbs forbbs wrote:
> The hadoop version is 0.19.0.
> My file is larger than 64MB, and the block size is 64MB.
> 
> The output of the code below is '10'. May I read across the block
> boundary?  Or should I use 'while (left..){}' style code?
> 
> import java.io.IOException;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> public class BlockBoundaryRead
> {
>   public static void main(String[] args) throws IOException
>   {
>     Configuration conf = new Configuration();
>     FileSystem fs = FileSystem.get(conf);
>     FSDataInputStream fin = fs.open(new Path(args[0]));
> 
>     // Seek to 10 bytes before the 64MB block boundary.
>     fin.seek(64*1024*1024 - 10);
>     byte[] buffer = new byte[32*1024];
>     int len = fin.read(buffer);    // prints 10: read stops at the boundary
>     //int len = fin.read(buffer, 0, 128);
>     System.out.println(len);
> 
>     fin.close();
>   }
> }

