hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Reading Records from a Sequence File
Date Sun, 03 Apr 2011 06:02:19 GMT

On Sun, Apr 3, 2011 at 6:49 AM, maha <maha@umail.ucsb.edu> wrote:
> Hi Harsh,
>   My job is for a Similarity Search application. But, my aim for now is to measure the
> IO overhead if my mapper.map() opened a sequence file and started to read it record by record
>  SequenceFile.Reader.next(key,value);
>   I want to make sure that "next" here is IO efficient. Otherwise, I will need to write
> it myself to be block read then parsed in my program using the "sync" hints.

You could have a look at the source code of the SequenceFile.Reader
class; it should clear up any doubts you have.

> what parameter is used for the buffer size?

Records are not loaded into memory. They are read off the buffered
input stream using the key/value size information.
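
To illustrate the idea, here is a rough sketch using only the JDK. It
is not Hadoop code and the record layout is a made-up toy one (two int
lengths followed by the key and value bytes), not the actual
SequenceFile on-disk format. The class name, method names and the
bufferSize parameter are all hypothetical. It only shows how a reader
can pull one record at a time off a buffered stream by reading the
sizes first:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT the real SequenceFile format or API.
class LengthPrefixedRecords {

    // Assumed toy layout per record: [int keyLen][int valLen][key][value].
    static byte[] encode(String[][] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String[] kv : records) {
                byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
                out.writeInt(k.length);
                out.writeInt(v.length);
                out.write(k);
                out.write(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads records one by one. Only the stream buffer (bufferSize bytes)
    // and the current record are held while a record is being decoded.
    static List<String> readAll(InputStream raw, int bufferSize) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new BufferedInputStream(raw, bufferSize))) {
            while (true) {
                int keyLen;
                try {
                    keyLen = in.readInt();   // size information comes first
                } catch (EOFException e) {
                    break;                   // no more records
                }
                int valLen = in.readInt();
                byte[] k = new byte[keyLen];
                byte[] v = new byte[valLen];
                in.readFully(k);             // served from the buffer when possible
                in.readFully(v);
                result.add(new String(k, StandardCharsets.UTF_8) + "="
                         + new String(v, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = encode(new String[][] {{"doc1", "hello"}, {"doc2", "world"}});
        for (String record : readAll(new ByteArrayInputStream(data), 4096)) {
            System.out.println(record);
        }
    }
}
```

With a layout like this, the underlying stream is hit in buffer-sized
chunks while the caller still sees one record per read, so a larger
buffer means fewer underlying reads for the same number of records.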

You can specify a buffer size while constructing a Reader object for
SequenceFiles; otherwise the "io.file.buffer.size" value is used as a
default.

Harsh J
