hadoop-common-user mailing list archives

From Yang Xiaoliang <yangxiaoliang2...@gmail.com>
Subject Re: hadoop input buffer size
Date Wed, 05 Oct 2011 10:56:15 GMT

Hadoop neither reads one line at a time nor fetches dfs.block.size worth of
lines into a buffer.
For TextInputFormat, it actually reads io.file.buffer.size bytes of text
into a buffer each time.
This can be seen in the Hadoop source file LineReader.java.
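
To make the buffering concrete, here is a minimal sketch of the idea. It is not the code from LineReader.java: the class BufferedLineSketch and its helper countLinesAndFills are hypothetical names made up for illustration, and it only handles '\n' line endings. The reader refills a fixed-size byte buffer from the stream whenever the buffer is exhausted, and lines are cut out of whatever is currently in that buffer, so the number of lines per refill depends on the buffer size and the line length, not on dfs.block.size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Simplified illustration only; this is NOT Hadoop's LineReader.
public class BufferedLineSketch {
    private final InputStream in;
    private final byte[] buffer;  // stands in for the io.file.buffer.size buffer
    private int bufferLength = 0; // valid bytes currently held in the buffer
    private int bufferPos = 0;    // next unread byte in the buffer
    private int fillCount = 0;    // number of stream reads that returned data

    public BufferedLineSketch(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    public int getFillCount() {
        return fillCount;
    }

    /** Returns the next line without its trailing '\n', or null at end of stream. */
    public String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawData = false;
        while (true) {
            if (bufferPos >= bufferLength) {
                // Buffer exhausted: fetch up to buffer.length bytes in one read.
                int n = in.read(buffer, 0, buffer.length);
                bufferPos = 0;
                if (n < 0) {
                    bufferLength = 0;
                    // A final line without a newline is still returned.
                    return sawData ? decode(line) : null;
                }
                bufferLength = n;
                if (n > 0) {
                    fillCount++;
                }
            }
            // Scan the buffered bytes for the end of the current line.
            int start = bufferPos;
            while (bufferPos < bufferLength && buffer[bufferPos] != '\n') {
                bufferPos++;
            }
            line.write(buffer, start, bufferPos - start);
            sawData |= bufferPos > start;
            if (bufferPos < bufferLength) {
                bufferPos++; // skip the newline itself
                return decode(line);
            }
            // No newline yet: the line continues in the next buffer fill.
        }
    }

    private static String decode(ByteArrayOutputStream bytes) {
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Hypothetical helper: returns {number of lines, number of buffer fills}. */
    public static int[] countLinesAndFills(String text, int bufferSize) {
        BufferedLineSketch reader = new BufferedLineSketch(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)), bufferSize);
        try {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return new int[] {lines, reader.getFillCount()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "aaaa\nbbbb\ncccc\n";
        for (int size : new int[] {8, 64}) {
            int[] r = countLinesAndFills(text, size);
            System.out.println("bufferSize=" + size + " lines=" + r[0] + " fills=" + r[1]);
        }
    }
}
```

Applied to the numbers in the question: a 3 MB block holds about 4,800 lines of 650 bytes, but if io.file.buffer.size is at the commonly shipped default of 4096 bytes (an assumption about your configuration), each buffer fill covers only about six such lines.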

2011/10/5 Mark question <markq2011@gmail.com>

> Hello,
>  Correct me if I'm wrong, but when a program opens n files at the same time
> and reads from each file one line at a time, isn't Hadoop actually fetching
> dfs.block.size worth of lines into a buffer, and not just one line?
>  If this is correct, and I set dfs.block.size = 3MB with each line taking
> only about 650 bytes, then I would assume the performance for reading
> 1-4000 lines would be the same, but it isn't! Do you know a way to find the
> number of lines read at once?
> Thank you,
> Mark
