hadoop-common-user mailing list archives

From Johan Oskarsson <jo...@oskarsson.nu>
Subject Re: Splittable lzo files
Date Tue, 03 Mar 2009 08:40:21 GMT
We use it with Python (Dumbo) and Hadoop Streaming, so it should certainly be
possible. I haven't tried it myself, though, so I can't give any pointers.


Miles Osborne wrote:
> That's very interesting. For us poor souls using Streaming, would we
> be able to use it?
> (Right now I'm looking at a 100+ GB gzipped file ...)
> Miles
> 2009/3/3 Johan Oskarsson <johan@oskarsson.nu>:
>> Hi,
>> thought I'd pass on this blog post I just wrote about how we compress our
>> raw log data in Hadoop using LZO at Last.fm.
>> The essence of the post is that we're able to make the compressed files
>> splittable by indexing where each compressed chunk starts in the file,
>> similar to the gzip input format being worked on.
>> This actually gives us a performance boost in certain jobs that read a lot
>> of data, while saving us disk space at the same time.
>> http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html
>> /Johan
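The quoted post makes compressed files splittable by recording where each compressed chunk starts. Below is a minimal sketch of that general idea, not the Last.fm or hadoop-lzo implementation. It assumes Python's built-in zlib as a stand-in for LZO, since the standard library has no LZO codec. The function names (compress_in_blocks, blocks_for_split, read_split) and the split-ownership rule are hypothetical and chosen only for illustration.

```python
import zlib
from bisect import bisect_left
from typing import List, Tuple

# Assumption: zlib stands in for LZO here. The indexing idea only needs each
# block to be decompressible on its own, whatever the codec.


def compress_in_blocks(data: bytes, block_size: int) -> Tuple[bytes, List[int]]:
    """Compress data as independent blocks.

    Returns the concatenated compressed bytes and an index holding the byte
    offset at which each compressed block starts.
    """
    out = bytearray()
    index: List[int] = []
    for i in range(0, len(data), block_size):
        index.append(len(out))  # record where this block begins in the file
        out += zlib.compress(data[i:i + block_size])
    return bytes(out), index


def blocks_for_split(index: List[int], file_size: int,
                     split_start: int, split_end: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges of the blocks a split should decode.

    Hypothetical ownership rule: a split owns every block whose start offset
    lies in [split_start, split_end) and reads that block to its end, even if
    it runs past split_end. If the splits tile the file without gaps or
    overlaps, each block is decoded by exactly one split.
    """
    first = bisect_left(index, split_start)
    last = bisect_left(index, split_end)
    ranges = []
    for i in range(first, last):
        end = index[i + 1] if i + 1 < len(index) else file_size
        ranges.append((index[i], end))
    return ranges


def read_split(blob: bytes, index: List[int],
               split_start: int, split_end: int) -> bytes:
    """Decompress only the blocks owned by one split, skipping the rest."""
    return b"".join(
        zlib.decompress(blob[start:end])
        for start, end in blocks_for_split(index, len(blob), split_start, split_end)
    )
```

Because every split can jump straight to its first block via the index, no reader has to decompress from the start of the file. Reading the byte ranges in order and concatenating the results reproduces the original data. This sketch keeps the index in memory and leaves open how a real system would store it.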
