hadoop-common-dev mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: bug in SequenceFile.sync()?
Date Mon, 24 Jun 2013 09:25:53 GMT
Hi Christopher,

Indeed, I think that noBufferedKeys and valuesDecompressed should be
reset.

Regards
JB

On 06/24/2013 11:20 AM, Christopher Ng wrote:
> cross-posting this from cdh-users group where it received little interest:
>
> is there a bug in SequenceFile.sync()?  This is from cdh4.3.0:
>
>      /** Seek to the next sync mark past a given position.*/
>      public synchronized void sync(long position) throws IOException {
>        if (position+SYNC_SIZE >= end) {
>          seek(end);
>          return;
>        }
>
>        if (position < headerEnd) {
>          // seek directly to first record
>          in.seek(headerEnd);   // <==== should this not call seek (i.e. this.seek) instead?
>          // note the sync marker "seen" in the header
>          syncSeen = true;
>          return;
>        }
>
> The problem is that when you sync to the start of a compressed file,
> noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
> triggered.  When you subsequently call next(), you potentially get keys
> from the buffer, which still contains keys from the previous position
> in the file.
>
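
To make the failure mode described above concrete, here is a minimal sketch of the stale-buffer effect. It is a toy model, not the Hadoop SequenceFile.Reader: ToyReader, rawSeek() and the in-memory block list are hypothetical names invented for this illustration. rawSeek() stands in for calling in.seek() directly, and seek() stands in for a reader-level seek that also discards the buffered keys.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of a block-buffered reader. NOT the Hadoop implementation. */
class StaleBufferSketch {

    /** Hypothetical reader: keys are read one block at a time into a buffer. */
    static class ToyReader {
        private final List<List<String>> blocks;  // each inner list is one "compressed block"
        private int pos = 0;                      // next block to load (stands in for the file offset)
        private final Deque<String> buffered = new ArrayDeque<>(); // keys already decoded

        ToyReader(List<List<String>> blocks) {
            this.blocks = blocks;
        }

        /** Moves only the underlying position, like calling in.seek() directly. */
        void rawSeek(int block) {
            pos = block;
        }

        /** Moves the position and drops buffered keys, forcing a fresh block read. */
        void seek(int block) {
            pos = block;
            buffered.clear();
        }

        /** Returns the next key, loading a new block only when the buffer is empty. */
        String next() {
            if (buffered.isEmpty()) {
                if (pos >= blocks.size()) {
                    return null;                  // end of data
                }
                buffered.addAll(blocks.get(pos++));
            }
            return buffered.poll();
        }
    }

    /** Builds a reader over two blocks and consumes three keys, leaving "d" buffered. */
    static ToyReader readerWithLeftoverKey() {
        ToyReader r = new ToyReader(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d")));
        r.next();
        r.next();
        r.next();
        return r;
    }

    /** Repositions without clearing the buffer, then reads one key. */
    static String firstKeyAfterRawSeek() {
        ToyReader r = readerWithLeftoverKey();
        r.rawSeek(0);
        return r.next();
    }

    /** Repositions through seek(), which clears the buffer, then reads one key. */
    static String firstKeyAfterSeek() {
        ToyReader r = readerWithLeftoverKey();
        r.seek(0);
        return r.next();
    }

    public static void main(String[] args) {
        System.out.println("after rawSeek(0): " + firstKeyAfterRawSeek()); // stale key "d"
        System.out.println("after seek(0):    " + firstKeyAfterSeek());    // expected key "a"
    }
}
```

In this toy, repositioning through rawSeek(0) hands back the leftover key from the old position, while seek(0) returns the first key of the first block. Whether the real reader should call this.seek(headerEnd) or reset the two fields inline is a choice for the actual patch; the sketch only illustrates why some reset is needed.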

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
