hadoop-hdfs-user mailing list archives

From Jim Neofotistos <jim.neofotis...@oracle.com>
Subject RE: Doubts on compressed file
Date Wed, 07 Nov 2012 14:00:06 GMT
Gzip is decently fast, but it cannot take advantage of Hadoop's natural map splits because it is
impossible to start decompressing a gzip stream at an arbitrary offset in the file. Note that HDFS
will still break a large gzip file into blocks for storage like any other file; the limitation is
on the processing side, where a single mapper has to read the whole stream from the beginning.
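
You can see how Hadoop makes this decision from the codec APIs. Below is a minimal sketch (the
file names are made up) that mirrors the check recent TextInputFormat.isSplitable versions
perform: a file is splittable only if it is uncompressed or its codec implements
SplittableCompressionCodec, which GzipCodec does not:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class SplittabilityCheck {
        public static void main(String[] args) {
            CompressionCodecFactory factory =
                new CompressionCodecFactory(new Configuration());
            // Illustrative file names only.
            for (String name : new String[] {"logs.gz", "logs.bz2", "logs.txt"}) {
                CompressionCodec codec = factory.getCodec(new Path(name));
                if (codec == null) {
                    // No codec matched the extension: plain text, splittable.
                    System.out.println(name + " -> uncompressed, splittable");
                } else {
                    // Only codecs implementing SplittableCompressionCodec
                    // (e.g. bzip2) can start mid-stream; gzip cannot.
                    System.out.println(name + " -> "
                        + codec.getClass().getSimpleName() + ", splittable="
                        + (codec instanceof SplittableCompressionCodec));
                }
            }
        }
    }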


LZO is a wonderful compression scheme to use with Hadoop because it's incredibly fast, and
(with a bit of work) it's splittable. LZO's block format makes it possible to start decompressing
at certain specific offsets of the file -- those that start new LZO block boundaries.
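
The "bit of work" is building an index of those block boundaries with the third-party hadoop-lzo
library. A minimal sketch, assuming hadoop-lzo (com.hadoop.compression.lzo) is on the classpath;
the input path is made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import com.hadoop.compression.lzo.LzoIndexer;

    public class IndexLzoFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Writes big_file.lzo.index next to the data file, recording
            // the offsets of the LZO block boundaries.
            new LzoIndexer(conf).index(new Path("/data/big_file.lzo"));
        }
    }

The same indexer is typically run from the command line (hadoop jar /path/to/hadoop-lzo.jar
com.hadoop.compression.lzo.LzoIndexer /data/big_file.lzo). Once the .index file exists,
LzoTextInputFormat can hand each mapper a range of whole LZO blocks instead of the entire file.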




James Neofotistos 

Senior Sales Consultant

Emerging Markets East

Phone: 1-781-565-1890 | Mobile: 1-603-759-7889




Software, Hardware, Complete.






From: Ramasubramanian Narayanan [mailto:ramasubramanian.narayanan@gmail.com] 
Sent: Wednesday, November 07, 2012 7:23 AM
To: user@hadoop.apache.org
Subject: Doubts on compressed file




If a gzip file is loaded into HDFS, will it get split into blocks and stored in HDFS?


I understand that a single mapper has to work with a gzip file, as it reads the entire file from
beginning to end... In that case, if the gzip file size is larger than 128 MB, will it get split
into blocks and stored in HDFS?
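
For reference, the storage-level question can be checked directly against the NameNode. A minimal
sketch (made-up path) that prints the block layout of a stored file; a gzip file larger than the
block size shows up as multiple HDFS blocks even though MapReduce still assigns it to a single
mapper:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus stat = fs.getFileStatus(new Path("/data/big_file.gz"));
            // HDFS chunks every file into blocks regardless of codec;
            // compression only limits how processing can be split.
            for (BlockLocation loc :
                    fs.getFileBlockLocations(stat, 0, stat.getLen())) {
                System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength());
            }
        }
    }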



