hadoop-hdfs-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: splittable vs seekable compressed formats
Date Fri, 24 May 2013 19:50:07 GMT
More specifically, seeking to a known location in the uncompressed data. So not just seeking
to “the nearest record boundary”, but seeking to “position 100000000 in the uncompressed
data”. I can see that if the writer kept track of this information on the side, it would
be available; my question is more about whether the standard formats (e.g. LZO compression in
SequenceFile) support this without additional work.
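A rough illustration of the "writer kept track of this information on the side" case: the Java sketch below compresses a byte array in independent fixed-size blocks with java.util.zip (Deflate standing in for LZO, purely an assumption for the example) and records each block's compressed offset in an in-memory index. Seeking to an absolute uncompressed position then costs one division to find the block plus inflating only that block. The class name and block size are hypothetical; this is not how SequenceFile or any Hadoop codec lays out data.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch (not the SequenceFile or LZO on-disk format): compress the
 * data in independent fixed-size blocks and keep a side index of where each block
 * starts in the compressed output. Every block holds BLOCK_SIZE uncompressed bytes,
 * so block i starts at uncompressed offset i * BLOCK_SIZE.
 */
class IndexedBlockCompression {
    static final int BLOCK_SIZE = 64 * 1024; // uncompressed bytes per block (assumed)

    final byte[] compressed; // concatenation of independently deflated blocks
    final int[] blockStarts; // side index: compressed offset of each block, plus end
    final int totalLength;   // uncompressed length

    IndexedBlockCompression(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int blocks = (data.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blockStarts = new int[blocks + 1];
        byte[] buf = new byte[8192];
        for (int i = 0; i < blocks; i++) {
            blockStarts[i] = out.size();
            int off = i * BLOCK_SIZE;
            Deflater deflater = new Deflater(); // fresh stream, so the block decodes on its own
            deflater.setInput(data, off, Math.min(BLOCK_SIZE, data.length - off));
            deflater.finish();
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
        }
        blockStarts[blocks] = out.size();
        compressed = out.toByteArray();
        totalLength = data.length;
    }

    /** Returns up to len bytes starting at absolute uncompressed position pos. */
    byte[] read(int pos, int len) {
        if (pos < 0 || pos > totalLength) throw new IllegalArgumentException("pos out of range");
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] plain = new byte[BLOCK_SIZE];
        int block = pos / BLOCK_SIZE; // index lookup: which block holds pos
        int skip = pos % BLOCK_SIZE;  // offset of pos inside that block
        while (result.size() < len && block < blockStarts.length - 1) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed, blockStarts[block], blockStarts[block + 1] - blockStarts[block]);
            int n = 0;
            try {
                while (n < plain.length && !inflater.finished()) {
                    int got = inflater.inflate(plain, n, plain.length - n);
                    if (got == 0 && inflater.needsInput()) break; // truncated block
                    n += got;
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt block " + block, e);
            } finally {
                inflater.end();
            }
            int take = Math.max(0, Math.min(n - skip, len - result.size()));
            result.write(plain, skip, take);
            skip = 0; // any following block is read from its start
            block++;
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[300_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        IndexedBlockCompression f = new IndexedBlockCompression(data);
        // Seek to uncompressed offset 131,070: only blocks 1 and 2 are inflated.
        byte[] got = f.read(131_070, 10);
        System.out.println(Arrays.equals(got, Arrays.copyOfRange(data, 131_070, 131_080)));
    }
}
```

Fixed-size uncompressed blocks keep the lookup to a division; with variable-size blocks the index would also need each block's uncompressed start, searched with a binary search.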

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: Friday, May 24, 2013 1:00 AM
To: user@hadoop.apache.org
Subject: Re: splittable vs seekable compressed formats

Yeah, I think John meant seeking to record boundaries.

On Fri, May 24, 2013 at 12:22 PM, Harsh J <harsh@cloudera.com> wrote:
SequenceFiles should be seekable, I think, provided you know/manage their sync
points during writes. With LZO this may be non-trivial.

On Thu, May 23, 2013 at 11:01 PM, John Lilley <john.lilley@redpoint.net> wrote:
> I’ve read about splittable compressed formats in Hadoop. Are any of these
> formats also “seekable” (in other words, able to seek to an absolute
> location in the uncompressed data)?
> John

Harsh J
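Harsh's suggestion, managing sync points at write time, can be sketched the same way. Real SequenceFiles periodically insert a 16-byte sync marker, and SequenceFile.Reader offers seek(long) for positions previously returned by SequenceFile.Writer.getLength(), plus sync(long) to move to the next marker past an arbitrary offset. The self-contained Java below does not use Hadoop; it mimics the idea with a hypothetical uncompressed layout (length-prefixed records, a -1 escape before each marker, a marker every 100 records) and a writer-side index mapping record numbers to marker offsets. The class, method names, and interval are all illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical, simplified record file (NOT the real SequenceFile layout):
 * length-prefixed UTF-8 records, with an escape int (-1) followed by a 16-byte
 * random sync marker written before every SYNC_INTERVAL-th record. The writer
 * keeps a side index of {first record number after the marker, marker offset}.
 */
class SyncedRecordFile {
    static final int SYNC_INTERVAL = 100; // records between sync markers (assumed)
    static final int ESCAPE = -1;         // length value that announces a marker

    final byte[] marker = new byte[16];
    final List<long[]> syncIndex = new ArrayList<>();
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    long recordCount = 0;

    SyncedRecordFile(long seed) {
        new Random(seed).nextBytes(marker);
    }

    void append(String value) {
        if (recordCount % SYNC_INTERVAL == 0) {
            // Writer-side bookkeeping: remember where this sync point starts.
            syncIndex.add(new long[] {recordCount, out.size()});
            writeInt(ESCAPE);
            out.write(marker, 0, marker.length);
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeInt(bytes.length);
        out.write(bytes, 0, bytes.length);
        recordCount++;
    }

    private void writeInt(int v) {
        out.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);
    }

    /** Jumps to the last sync point at or before recordNo, then reads forward. */
    String readRecord(long recordNo) {
        if (recordNo < 0 || recordNo >= recordCount) {
            throw new IndexOutOfBoundsException("no record " + recordNo);
        }
        // Markers are written at a fixed record interval, so the lookup is a division.
        long[] entry = syncIndex.get((int) (recordNo / SYNC_INTERVAL));
        ByteBuffer buf = ByteBuffer.wrap(out.toByteArray());
        buf.position((int) entry[1]);
        long current = entry[0];
        while (true) {
            int len = buf.getInt();
            if (len == ESCAPE) { // skip over the sync marker itself
                buf.position(buf.position() + marker.length);
                continue;
            }
            byte[] bytes = new byte[len];
            buf.get(bytes);
            if (current++ == recordNo) return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Offset of the first marker byte at or after pos, or -1 (similar in spirit to Reader.sync(long)). */
    int nextSync(int pos) {
        byte[] data = out.toByteArray();
        outer:
        for (int i = Math.max(pos, 0); i + marker.length <= data.length; i++) {
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        SyncedRecordFile f = new SyncedRecordFile(42L);
        for (int i = 0; i < 1000; i++) f.append("record-" + i);
        System.out.println(f.readRecord(537)); // prints record-537, decoding only records 500..537
    }
}
```

As in Rahul's reading of the question, this reaches record boundaries only; reaching an absolute uncompressed byte offset would additionally need the writer to store cumulative uncompressed sizes per sync point.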
