hadoop-common-user mailing list archives

From prashant ullegaddi <prashullega...@gmail.com>
Subject Re: :!
Date Mon, 03 Aug 2009 06:37:25 GMT
I don't think you can use Hadoop to compress the data unless it's already on HDFS.
What you can do is either:
1. Manually compress the data on the machine where it resides, then copy
   the compressed file to HDFS; or
2. Copy the uncompressed data to HDFS, then run a job that simply emits
   each key/value pair as it reads it. You can set
   FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class) so
   that the output gets gzipped.
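
To make option 1 concrete, here is a minimal sketch that uses only the JDK's
java.util.zip classes, so it can run on the machine outside the cluster. The
class name LocalGzip and its method names are made up for illustration; they
are not part of Hadoop. The decompress method covers the reverse direction
once the file has been copied back out of HDFS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a Hadoop class): gzip and gunzip plain streams.
final class LocalGzip {
    private LocalGzip() {}

    // Reads raw bytes from `in` and writes them gzip-compressed to `out`.
    // Closing the GZIPOutputStream writes the gzip trailer and closes `out`.
    static void compress(InputStream in, OutputStream out) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            copy(in, gz);
        }
    }

    // Reads gzip-compressed bytes from `in` and writes the original bytes
    // to `out`. The caller is responsible for flushing and closing `out`.
    static void decompress(InputStream in, OutputStream out) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            copy(gz, out);
        }
    }

    // Copies everything from `in` to `out` in 64 KB chunks.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }
}
```

You would open the local file with java.nio.file.Files.newInputStream and the
target .gz file with Files.newOutputStream, call LocalGzip.compress on the
pair, and then copy the resulting .gz file to HDFS (for example with
"hadoop fs -put").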

Does that solve your problem?

By the way, you didn't specify the exact size of your data (how many TB).

On Mon, Aug 3, 2009 at 11:02 AM, Sugandha Naolekar
<sugandha.n87@gmail.com> wrote:

> Yes, you are right. Here are the details:
>
> -> I have a Hadoop cluster of 7 nodes. There is also an 8th machine, which
> is not part of the Hadoop cluster.
> -> I want to place the data from that machine into HDFS. Before placing it
> in HDFS, I want to compress it, and then load it into HDFS.
> -> I have 4 datanodes in my cluster. Also, the data might grow to
> terabytes.
> -> Also, I have set the replication factor to 2.
> -> I guess I will have to run MapReduce for the compression, right? Please
> tell me the complete approach that needs to be followed.
>
> On Mon, Aug 3, 2009 at 10:48 AM, prashant ullegaddi <
> prashullegaddi@gmail.com> wrote:
>
> > By "I want to compress the data first and then place it in HDFS", do you
> > mean you want to compress the data locally and then copy it to DFS?
> >
> > What's the size of your data? What's the capacity of HDFS?
> >
> > On Mon, Aug 3, 2009 at 10:45 AM, Sugandha Naolekar
> > <sugandha.n87@gmail.com> wrote:
> >
> > > I want to compress the data first and then place it in HDFS. Then, when
> > > retrieving it, I want to uncompress it and place it at the desired
> > > destination. Is this possible? How do I get started? Also, I want to
> > > get started with the actual coding for compression and MapReduce. Please
> > > advise.
> > >
> > >
> > >
> > > --
> > > Regards!
> > > Sugandha
> > >
> >
>
>
>
> --
> Regards!
> Sugandha
>
