hadoop-mapreduce-user mailing list archives

From David Larsen <dave.lar...@connexity.com>
Subject Is it valid to call SequenceFile.Reader `sync` after calling `next`?
Date Wed, 11 Dec 2013 18:25:57 GMT
Reading Tom White's excellent book I see that you can find a record 
boundary in a SequenceFile with the `sync` method.

What I'd really like to do is read the first record of the file and 
then sync forward into another part of the file.  Going even further, 
I'd like to sync multiple times in a large file, reading along the way.
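
To make the pattern concrete, here is a minimal sketch of what I mean. `RecordSource` and `FakeSource` are hypothetical stand-ins, not Hadoop classes: they only mirror the `next` and `sync` calls so the snippet runs on its own, and the fake assumes fixed-width records with a boundary at every multiple of 10 bytes. Against a real file, the same loop would presumably call `next` and `sync` on a `SequenceFile.Reader` instead.

```java
import java.util.ArrayList;
import java.util.List;

class SyncThenRead {
    /** Hypothetical stand-in for the two reader calls in question. */
    interface RecordSource {
        /** Returns the next record, or null at end of input. */
        String next();
        /** Moves to the first record boundary past the given byte position. */
        void sync(long position);
    }

    /** Reads the first record, then syncs to each offset and reads one record there. */
    static List<String> sample(RecordSource reader, long[] offsets) {
        List<String> out = new ArrayList<>();
        String first = reader.next();
        if (first == null) {
            return out; // empty input
        }
        out.add(first);
        for (long offset : offsets) {
            reader.sync(offset);        // `sync` after `next`: the case being asked about
            String record = reader.next();
            if (record == null) {
                break;                  // synced past the last record
            }
            out.add(record);
        }
        return out;
    }

    /** In-memory fake (assumption): every record occupies WIDTH bytes. */
    static class FakeSource implements RecordSource {
        static final int WIDTH = 10;
        private final String[] records;
        private long pos = 0;

        FakeSource(String... records) {
            this.records = records;
        }

        public String next() {
            int index = (int) (pos / WIDTH);
            if (index >= records.length) {
                return null;
            }
            pos += WIDTH;
            return records[index];
        }

        public void sync(long position) {
            // Jump to the next multiple of WIDTH strictly past `position`.
            pos = (position / WIDTH + 1) * WIDTH;
        }
    }
}
```

With five records and offsets 15 and 32, the sketch reads the first record, then the record starting at byte 20, then the one starting at byte 40.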

Depending on how the SequenceFile is written and its size, this 
sometimes works.  If anyone's interested, I can describe what I've found 
so far, but my initial question is high level.  What I want to understand is:

A) Is `next` then `sync` a valid use case?
B) When working with a block-compressed SequenceFile, will `sync` be much 
more efficient than just paging through results on the client?

Here's the link to SO in case anyone wants fake internet points:

Kind Regards,
David Larsen
