hadoop-mapreduce-user mailing list archives

From Chase Bradford <chase.bradf...@gmail.com>
Subject Re: hadoop fs -put vs writing text files to hadoop as sequence files
Date Thu, 17 Feb 2011 02:33:13 GMT
We use sequence files for storing text data, and you definitely notice the cost of compressing
client side while streaming to HDFS. If I remember correctly, it took about 10x as long. That drove
us to using writer threads that fed off a single input stream a few thousand lines at a time,
and wrote to an HDFS directory with the desired name.
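
For illustration only, here is a minimal sketch of that pattern, not the actual code described above. It uses only the JDK so it runs without a cluster: each worker writes a gzip-compressed part file into a local directory as a stand-in for opening its own sequence file writer on HDFS. The class name ParallelBatchWriter, the part-NNNNN.gz naming, and the thread and batch-size parameters are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: several writer threads share one input stream and
// each writes its own compressed part file into a single output directory.
public class ParallelBatchWriter {

    // Pull up to batchLines lines from the shared reader. Only the read is
    // done under the lock; compression and writing happen outside it.
    private static List<String> nextBatch(BufferedReader in, int batchLines) throws IOException {
        List<String> batch = new ArrayList<>(batchLines);
        synchronized (in) {
            String line;
            while (batch.size() < batchLines && (line = in.readLine()) != null) {
                batch.add(line);
            }
        }
        return batch;
    }

    // Returns the total number of lines written across all part files.
    public static long write(BufferedReader in, Path outDir, int threads, int batchLines)
            throws Exception {
        Files.createDirectories(outDir);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            // Assumption: one part file per worker. On a real cluster this is
            // where each worker would open its own writer on HDFS instead.
            Path part = outDir.resolve(String.format("part-%05d.gz", i));
            results.add(pool.submit(() -> {
                long written = 0;
                try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                        new GZIPOutputStream(Files.newOutputStream(part)),
                        StandardCharsets.UTF_8))) {
                    List<String> batch;
                    while (!(batch = nextBatch(in, batchLines)).isEmpty()) {
                        for (String line : batch) {
                            out.write(line);
                            out.newLine();
                            written++;
                        }
                    }
                }
                return written;
            }));
        }
        long total = 0;
        try {
            for (Future<Long> f : results) {
                total += f.get(); // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return total;
    }
}
```

Something like ParallelBatchWriter.write(reader, dir, 4, 2000) would give four workers batches of 2000 lines each. Line order across part files is not preserved, which is usually fine when the output is only going to be read by map tasks.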

On Feb 16, 2011, at 4:24 PM, Mapred Learn <mapred.learn@gmail.com> wrote:

> Hi,
> I have to upload some terabytes of data that is text files.
> What would be good option to do so:
> i) using hadoop fs -put to copy text files directly on hdfs.
> ii) copying text files as sequence files on hdfs ? What would be extra time in this case
> as opposed to (i).
> Thanks,
> Jimmy
