hadoop-common-user mailing list archives

From Adam Brown <a...@hortonworks.com>
Subject Re: Can I write to a compressed file which is located in hdfs?
Date Tue, 07 Feb 2012 17:54:05 GMT
Hi Xiaobin,

What build of Hadoop are you using, and what type of compression is being
used?

thanks,

2012/2/7 Xiaobin She <xiaobinshe@gmail.com>

> thank you Bejoy, I will look at that book.
>
> Thanks again!
>
>
>
> 2012/2/7 <bejoy.hadoop@gmail.com>
>
> > Hi
> > AFAIK it is not possible to append to a compressed file.
> >
> > If you have files in a directory on HDFS and you need to compress them
> > (say, the files for an hour), you can use MapReduce to do that by setting
> > mapred.output.compress=true and
> > mapred.output.compression.codec='theCodecYouPrefer'.
> > You'd get the blocks compressed in the output dir.
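The settings Bejoy mentions would look roughly like this as a job-configuration fragment; the property names are the old mapred.* ones used by Hadoop 0.20/1.x, and GzipCodec is only an example choice of codec:

```xml
<!-- Illustrative only: enable compressed job output -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```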
> >
> > You can also use the API to read from standard input and write compressed:
> > - get the Hadoop conf
> > - obtain the required compression codec
> > - write to a CompressionOutputStream.
> >
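The three steps above can be sketched as follows. This is a minimal local illustration that uses the JDK's java.util.zip.GZIPOutputStream in place of a Hadoop codec so it runs without a cluster; on HDFS you would instead wrap the stream returned by FileSystem.create() in a codec obtained from Hadoop's CompressionCodecFactory. The file name and log lines here are made up for illustration.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedWrite {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("logs", ".gz");

        // Write log lines through a compressing stream; with Hadoop you would
        // wrap fs.create(path) in codec.createOutputStream(...) the same way.
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), "UTF-8")) {
            w.write("2012-02-06 16:41:50 INFO first log line\n");
            w.write("2012-02-06 16:41:51 INFO second log line\n");
        }

        // Read the file back through a decompressing stream to confirm
        // the data round-trips.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), "UTF-8"))) {
            System.out.println(r.readLine());
        }
        Files.delete(file);
    }
}
```

Note the stream must be closed (here via try-with-resources) before reading; closing is what flushes and finalizes the compressed trailer, which is also why appending to an already-closed compressed file is problematic.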
> > You'll find a well-detailed explanation of this in the book
> > 'Hadoop: The Definitive Guide' by Tom White.
> > Regards
> > Bejoy K S
> >
> > From handheld, Please excuse typos.
> > ------------------------------
> > *From: * Xiaobin She <xiaobinshe@gmail.com>
> > *Date: *Tue, 7 Feb 2012 14:24:01 +0800
> > *To: *<common-user@hadoop.apache.org>; <bejoy.hadoop@gmail.com>; David
> > Sinclair<dsinclair@chariotsolutions.com>
> > *Subject: *Re: Can I write to a compressed file which is located in
> > hdfs?
> >
> > hi Bejoy and David,
> >
> > thank you for your help.
> >
> > So I can't directly write or append logs to a compressed file in
> > hdfs, right?
> >
> > Can I compress a file which is already in hdfs and has not been
> > compressed?
> >
> > If I can, how can I do that?
> >
> > Thanks!
> >
> >
> >
> > 2012/2/6 <bejoy.hadoop@gmail.com>
> >
> >> Hi
> >>       I agree with David on this point; you can achieve step 1 of my
> >> previous response with Flume, i.e. load the real-time inflow of data in
> >> compressed format into hdfs. You can specify a time interval or data
> >> size in the Flume collector that determines when to flush data on to
> >> hdfs.
> >>
> >> Regards
> >> Bejoy K S
> >>
> >> From handheld, Please excuse typos.
> >>
> >> -----Original Message-----
> >> From: David Sinclair <dsinclair@chariotsolutions.com>
> >> Date: Mon, 6 Feb 2012 09:06:00
> >> To: <common-user@hadoop.apache.org>
> >> Cc: <bejoy.hadoop@gmail.com>
> >> Subject: Re: Can I write to a compressed file which is located in hdfs?
> >>
> >> Hi,
> >>
> >> You may want to have a look at the Flume project from Cloudera. I use it
> >> for writing data into HDFS.
> >>
> >> https://ccp.cloudera.com/display/SUPPORT/Downloads
> >>
> >> dave
> >>
> >> 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
> >>
> >> > hi Bejoy ,
> >> >
> >> > thank you for your reply.
> >> >
> >> > actually I have set up a test cluster which has one namenode/jobtracker
> >> > and two datanode/tasktrackers, and I have run a test on this cluster.
> >> >
> >> > I fetch the log file of one of our modules from the log collector
> >> > machines by rsync, and then I use the hive command line tool to load
> >> > this log file into the hive warehouse, which simply copies the file
> >> > from the local filesystem to hdfs.
> >> >
> >> > And I have run some analyses on this data with hive, and it all ran
> >> > well.
> >> >
> >> > But now I want to avoid the fetch step which uses rsync, and write the
> >> > logs into hdfs files directly from the servers which generate these
> >> > logs.
> >> >
> >> > And it seems easy to do this job if the file located in hdfs is not
> >> > compressed.
> >> >
> >> > But how do I write or append logs to a file that is compressed and
> >> > located in hdfs?
> >> >
> >> > Is this possible?
> >> >
> >> > Or is this a bad practice?
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > 2012/2/6 <bejoy.hadoop@gmail.com>
> >> >
> >> > > Hi
> >> > >     If you have log files enough to become at least one block size in
> >> > > an hour, you can go ahead as follows:
> >> > > - run a scheduled job every hour that compresses the log files for
> >> > > that hour and stores them on to hdfs (can use LZO or even Snappy to
> >> > > compress)
> >> > > - if your hive does more frequent analysis on this data, store it as
> >> > > PARTITIONED BY (Date,Hour). While loading into hdfs, also follow a
> >> > > directory / sub-dir structure. Once data is in hdfs, issue an ALTER
> >> > > TABLE ... ADD PARTITION statement on the corresponding hive table.
> >> > > - in the Hive DDL use the appropriate input format (Hive already has
> >> > > an Apache log input format)
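The partition layout described above might look roughly like the following Hive DDL; the table name, column, and paths are made-up placeholders for illustration, not anything from the thread:

```sql
-- Illustrative table, partitioned by date and hour
CREATE TABLE raw_logs (line STRING)
PARTITIONED BY (dt STRING, hr STRING);

-- After an hour's compressed files land in the matching sub-directory:
ALTER TABLE raw_logs ADD PARTITION (dt='2012-02-06', hr='16')
LOCATION '/warehouse/raw_logs/2012-02-06/16';
```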
> >> > >
> >> > >
> >> > > Regards
> >> > > Bejoy K S
> >> > >
> >> > > From handheld, Please excuse typos.
> >> > >
> >> > > -----Original Message-----
> >> > > From: Xiaobin She <xiaobinshe@gmail.com>
> >> > > Date: Mon, 6 Feb 2012 16:41:50
> >> > > To: <common-user@hadoop.apache.org>; 佘晓彬<xiaobinshe@gmail.com>
> >> > > Reply-To: common-user@hadoop.apache.org
> >> > > Subject: Re: Can I write to a compressed file which is located in
> >> > > hdfs?
> >> > >
> >> > > sorry, this sentence is wrong:
> >> > >
> >> > > I can't compress these logs every hour and then put them into hdfs.
> >> > >
> >> > > it should be
> >> > >
> >> > > I can compress these logs every hour and then put them into hdfs.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
> >> > >
> >> > > >
> >> > > > hi all,
> >> > > >
> >> > > > I'm testing hadoop and hive, and I want to use them in log
> analysis.
> >> > > >
> >> > > > Here I have a question: can I write/append logs to a compressed
> >> > > > file which is located in hdfs?
> >> > > >
> >> > > > Our system generates lots of log files every day; I can't compress
> >> > > > these logs every hour and then put them into hdfs.
> >> > > >
> >> > > > But what if I want to write logs into files that are already in
> >> > > > the hdfs and are compressed?
> >> > > >
> >> > > > If these files were not compressed, then this job seems easy, but
> >> > > > how do I write or append logs into a compressed log?
> >> > > >
> >> > > > Can I do that?
> >> > > >
> >> > > > Can anyone give me some advices or give me some examples?
> >> > > >
> >> > > > Thank you very much!
> >> > > >
> >> > > > xiaobin
> >> > > >
> >> > >
> >> > >
> >> >
> >>
> >>
> >
>



-- 
Adam Brown
Enablement Engineer
Hortonworks
<http://www.hadoopsummit.org/>
