hadoop-common-user mailing list archives

From Sugandha Naolekar <sugandha....@gmail.com>
Subject Re: :!
Date Mon, 03 Aug 2009 07:01:53 GMT
That's fine. But if I place the data in HDFS and then run MapReduce code to
compress it, the data will be written out compressed in sequence files,
but the original uncompressed data will still reside in HDFS, leaving a
redundant copy of the data.

Can you please suggest a way out?
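
One way to avoid keeping both copies is option 1 in the quoted reply below: compress on the source machine first, so that only the compressed file ever reaches HDFS. The following is a minimal sketch of that idea using only the JDK's gzip streams, not Hadoop's codec classes. The class name `LocalGzip` and its method names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a local file before copying it to HDFS,
// and gunzip it again after copying it back out.
class LocalGzip {

    // Stream bytes in fixed-size chunks so large files never sit fully in memory.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Compress src into dst in gzip format.
    static void gzip(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(dst))) {
            copy(in, out);
        }
    }

    // Decompress a gzip file src into dst.
    static void gunzip(Path src, Path dst) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(src));
             OutputStream out = Files.newOutputStream(dst)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: LocalGzip <gzip|gunzip> <src> <dst>");
            return;
        }
        Path src = Paths.get(args[1]);
        Path dst = Paths.get(args[2]);
        if (args[0].equals("gzip")) {
            gzip(src, dst);
        } else {
            gunzip(src, dst);
        }
    }
}
```

The resulting `.gz` file can then be copied with `hadoop fs -put`, and the uncompressed original never needs to be stored in HDFS. One trade-off to keep in mind: gzip files are not splittable, so a MapReduce job reads each such file with a single mapper.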

On Mon, Aug 3, 2009 at 12:07 PM, prashant ullegaddi <
prashullegaddi@gmail.com> wrote:

> I don't think you will be able to compress the data unless it's on HDFS.
> What you can do is:
> 1. Manually compress the data on the machine where it resides, then copy
>    the compressed file to HDFS; or
> 2. Copy the data to HDFS without compressing it, then run a job which just
>    emits the data as it reads it, in key/value pairs. You can set
>    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class) so
>    that the output gets gzipped.
>
> Does that solve your problem?
>
> btw you didn't exactly specify your data size (how many TBs).
>
> On Mon, Aug 3, 2009 at 11:02 AM, Sugandha Naolekar
> <sugandha.n87@gmail.com> wrote:
>
> > Yes, you are right. Here are the details:
> >
> > -> I have a Hadoop cluster of 7 nodes. There is also an 8th machine, which
> > is not part of the Hadoop cluster.
> > -> I want to place the data from that machine into HDFS. Before placing
> > it in HDFS, I want to compress it, and then copy it into HDFS.
> > -> I have 4 datanodes in my cluster. Also, the data might grow to
> > terabytes.
> > -> Also, I have set the replication factor to 2.
> > -> I guess that for compression I will have to run MapReduce, right?
> > Please tell me the complete approach that needs to be followed.
> >
> > On Mon, Aug 3, 2009 at 10:48 AM, prashant ullegaddi <
> > prashullegaddi@gmail.com> wrote:
> >
> > > By "I want to compress the data first and then place it in HDFS", do you
> > > mean you want to compress the data locally and then copy it to DFS?
> > >
> > > What's the size of your data? What's the capacity of HDFS?
> > >
> > > On Mon, Aug 3, 2009 at 10:45 AM, Sugandha Naolekar
> > > <sugandha.n87@gmail.com> wrote:
> > >
> > > > I want to compress the data first and then place it in HDFS. Then,
> > > > while retrieving it, I want to uncompress it and place it at the
> > > > desired destination. Is this possible? How do I get started? Also, I
> > > > want to get started with the actual coding of compression and
> > > > MapReduce. Please advise.
> > > >
> > > >
> > > >
> > > > --
> > > > Regards!
> > > > Sugandha
> > > >
> > >
> >
> >
> >
> > --
> > Regards!
> > Sugandha
> >
>



-- 
Regards!
Sugandha
