hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Reading Records from a Sequence File
Date Sun, 03 Apr 2011 01:19:15 GMT
Hi Harsh,

   My job is for a similarity-search application, but my aim for now is to measure the I/O
overhead when my mapper.map() opens a sequence file and reads it record by record
with:

 SequenceFile.Reader.next(key,value);

   I want to make sure that "next" here is I/O-efficient. Otherwise, I will need to write the
reading myself: fetch a block at a time and then parse the records in my program using the
"sync" hints. (A sketch of what I am timing now is below.)
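
   A minimal sketch of that loop, using the old-style SequenceFile.Reader(fs, path, conf)
constructor; the class name and the path argument are just placeholders for what my job
really uses:

 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.SequenceFile;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.util.ReflectionUtils;

 public class SeqScan {
     public static void main(String[] args) throws IOException {
         Configuration conf = new Configuration();
         FileSystem fs = FileSystem.get(conf);
         Path path = new Path(args[0]);          // the sequence file on HDFS

         SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
         try {
             // instantiate key/value objects of the classes recorded in the file header
             Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
             Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

             long records = 0;
             // next() deserializes one record at a time from the reader's buffered stream
             while (reader.next(key, value)) {
                 records++;
             }
             System.out.println("read " + records + " records");
         } finally {
             reader.close();
         }
     }
 }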


  So, what you meant, in other words, is that the reader will buffer a couple of records (the
ones between two syncs) into memory and then "next" reads them from memory .. right? If yes,
which parameter sets the buffer size?
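
   (For context, I construct the reader with (fs, path, conf) as in the sketch above, so my
guess is that the relevant knob is the generic stream-buffer setting io.file.buffer.size,
rather than anything SequenceFile-specific, e.g.:

 Configuration conf = new Configuration();
 // guessing: enlarge the underlying input-stream buffer from its small default
 conf.setInt("io.file.buffer.size", 1024 * 1024);
 SequenceFile.Reader reader =
     new SequenceFile.Reader(FileSystem.get(conf), new Path("/user/maha/input.seq"), conf);  // placeholder path

but please correct me if that is the wrong parameter.)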

Thank you,
Maha



On Mar 31, 2011, at 11:59 PM, Harsh J wrote:

> On Fri, Apr 1, 2011 at 9:00 AM, maha <maha@umail.ucsb.edu> wrote:
>> Hello Everyone,
>> 
>>        As far as I know, when my Java program opens a sequence file from HDFS for map
>> calculations, using SequenceFile.Reader(key,value) will actually read the file in
>> dfs.block.size chunks and then grab it record by record from memory.
>> 
>>  Is that right?
> 
> The dfs.block.size part is partially right when applied in MapReduce
> (actually, it would look for sync points for read start and read end).
> And no, the reader does not load all of the data into memory in
> one go. It buffers and reads off the stream just like any other
> reader.
> 
> Could we have some more information on what your java program does,
> and what exactly you are measuring? :)
> 
> -- 
> Harsh J
> http://harshj.com

