hadoop-common-user mailing list archives

From Xiaobin She <xiaobin...@gmail.com>
Subject Re: Can I write to a compressed file which is located in hdfs?
Date Tue, 07 Feb 2012 09:11:17 GMT
Thank you, Bejoy. I will look at that book.

Thanks again!



2012/2/7 <bejoy.hadoop@gmail.com>

> Hi
> AFAIK, it is not possible to append to a compressed file.
>
> If you have files in a directory on HDFS and you need to compress them (like
> the files for an hour), you can use MapReduce to do that by setting
> mapred.output.compress = true and
> mapred.output.compression.codec = 'theCodecYouPrefer'.
> You'd get the compressed files in the output dir.
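>
> (A hedged sketch of such a job using the old mapred API; the class name is
> made up, and the input/output paths come from the command line:)
>
> import java.io.IOException;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.compress.GzipCodec;
> import org.apache.hadoop.mapred.*;
>
> public class CompressLogs {
>   // Pass each line through unchanged; emit a NullWritable key so the
>   // byte-offset key from TextInputFormat doesn't end up in the output.
>   public static class LineMapper extends MapReduceBase
>       implements Mapper<LongWritable, Text, NullWritable, Text> {
>     public void map(LongWritable offset, Text line,
>         OutputCollector<NullWritable, Text> out, Reporter reporter)
>         throws IOException {
>       out.collect(NullWritable.get(), line);
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     JobConf conf = new JobConf(CompressLogs.class);
>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>     conf.setMapperClass(LineMapper.class);
>     conf.setNumReduceTasks(0);  // map-only; no shuffle needed just to compress
>     conf.setOutputKeyClass(NullWritable.class);
>     conf.setOutputValueClass(Text.class);
>     // the two properties mentioned above, set through the API:
>     FileOutputFormat.setCompressOutput(conf, true);
>     FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
>     JobClient.runJob(conf);
>   }
> }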
>
> You can use the API to read from standard input and write compressed, i.e.:
> - get the Hadoop conf
> - obtain the required compression codec
> - write to a CompressionOutputStream
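>
> A minimal sketch of those three steps (the gzip codec and the target path
> are my own placeholder choices, not from the thread):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IOUtils;
> import org.apache.hadoop.io.compress.CompressionCodec;
> import org.apache.hadoop.io.compress.CompressionOutputStream;
> import org.apache.hadoop.util.ReflectionUtils;
>
> public class StdinToCompressedHdfsFile {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();            // step 1: get the Hadoop conf
>     FileSystem fs = FileSystem.get(conf);
>     CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
>         Class.forName("org.apache.hadoop.io.compress.GzipCodec"), conf); // step 2
>     Path out = new Path("/logs/stream" + codec.getDefaultExtension());   // ".gz"
>     CompressionOutputStream cos = codec.createOutputStream(fs.create(out)); // step 3
>     IOUtils.copyBytes(System.in, cos, 4096, true);       // copy stdin, then close
>   }
> }
>
> Note that each run creates a new compressed file; as said above, it won't
> append to an existing one.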
>
> You'll find a detailed explanation of this in the book
> 'Hadoop: The Definitive Guide' by Tom White.
> Regards
> Bejoy K S
>
> From handheld. Please excuse typos.
> ------------------------------
> From: Xiaobin She <xiaobinshe@gmail.com>
> Date: Tue, 7 Feb 2012 14:24:01 +0800
> To: <common-user@hadoop.apache.org>; <bejoy.hadoop@gmail.com>; David Sinclair <dsinclair@chariotsolutions.com>
> Subject: Re: Can I write to a compressed file which is located in hdfs?
>
> hi Bejoy and David,
>
> thank you for your help.
>
> So I can't directly write or append logs to a compressed file in HDFS,
> right?
>
> Can I compress a file which is already in HDFS and has not been
> compressed?
>
> If so, how can I do that?
>
> Thanks!
>
>
>
> 2012/2/6 <bejoy.hadoop@gmail.com>
>
>> Hi
>>       I agree with David on that point; you can achieve step 1 of my
>> previous response with Flume, i.e. load the real-time inflow of data into
>> HDFS in compressed format. You can specify a time interval or data size in
>> the Flume collector that determines when to flush data onto HDFS.
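>>
>> (Rough illustration only: this uses the newer flume-ng configuration
>> style, which postdates the Cloudera download above, and the agent/sink
>> names, path, and roll values are made up.)
>>
>> # HDFS sink writing a compressed stream, rolled by time or size
>> agent1.sinks.hdfs1.type = hdfs
>> agent1.sinks.hdfs1.hdfs.path = hdfs://namenode/logs/%Y-%m-%d/%H
>> agent1.sinks.hdfs1.hdfs.fileType = CompressedStream
>> agent1.sinks.hdfs1.hdfs.codeC = gzip
>> # flush a file every hour ...
>> agent1.sinks.hdfs1.hdfs.rollInterval = 3600
>> # ... or once it reaches 128 MB, whichever comes first
>> agent1.sinks.hdfs1.hdfs.rollSize = 134217728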
>>
>> Regards
>> Bejoy K S
>>
>> From handheld. Please excuse typos.
>>
>> -----Original Message-----
>> From: David Sinclair <dsinclair@chariotsolutions.com>
>> Date: Mon, 6 Feb 2012 09:06:00
>> To: <common-user@hadoop.apache.org>
>> Cc: <bejoy.hadoop@gmail.com>
>> Subject: Re: Can I write to a compressed file which is located in hdfs?
>>
>> Hi,
>>
>> You may want to have a look at the Flume project from Cloudera. I use it
>> for writing data into HDFS.
>>
>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>
>> dave
>>
>> 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
>>
>> > hi Bejoy ,
>> >
>> > thank you for your reply.
>> >
>> > Actually, I have set up a test cluster which has one namenode/jobtracker
>> > and two datanode/tasktrackers, and I have run a test on this cluster.
>> >
>> > I fetch the log file of one of our modules from the log collector
>> > machines by rsync, and then I use the Hive command-line tool to load
>> > this log file into the Hive warehouse, which simply copies the file
>> > from the local filesystem to HDFS.
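>> >
>> > (For reference, that load step is a single HiveQL statement like the
>> > following; the table name and path are made up:)
>> >
>> > -- LOAD DATA LOCAL copies the file from the local filesystem into the
>> > -- table's warehouse directory on HDFS
>> > LOAD DATA LOCAL INPATH '/data/logs/mymodule/2012-02-06.log'
>> > INTO TABLE mymodule_logs;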
>> >
>> > And I have run some analysis on this data with Hive; it all works well.
>> >
>> > But now I want to avoid the rsync fetch step and write the logs into
>> > HDFS files directly from the servers that generate them.
>> >
>> > And it seems easy to do this if the file located in HDFS is not
>> > compressed.
>> >
>> > But how do I write or append logs to a file that is compressed and
>> > located in HDFS?
>> >
>> > Is this possible?
>> >
>> > Or is this bad practice?
>> >
>> > Thanks!
>> >
>> >
>> >
>> > 2012/2/6 <bejoy.hadoop@gmail.com>
>> >
>> > > Hi
>> > >     If you have enough log files to fill at least one block per hour,
>> > > you can go ahead as follows:
>> > > - run a scheduled job every hour that compresses the log files for that
>> > > hour and stores them in HDFS (you can use LZO or even Snappy to compress)
>> > > - if Hive does frequent analysis on this data, store it PARTITIONED BY
>> > > (Date, Hour). While loading into HDFS, follow a matching directory /
>> > > sub-directory structure. Once the data is in HDFS, issue an ALTER TABLE
>> > > ... ADD PARTITION statement on the corresponding Hive table (see the
>> > > sketch after this list)
>> > > - in the Hive DDL, use the appropriate input format (Hive already has an
>> > > Apache log input format)
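>> > >
>> > > A minimal sketch of that Hive side (table, column, and path names are
>> > > made up, not from this thread):
>> > >
>> > > -- raw log lines, partitioned by date and hour
>> > > CREATE TABLE logs (line STRING)
>> > > PARTITIONED BY (dt STRING, hr STRING);
>> > >
>> > > -- after the hourly job writes compressed files under the matching sub-dir
>> > > ALTER TABLE logs ADD PARTITION (dt = '2012-02-06', hr = '09')
>> > > LOCATION '/warehouse/logs/2012-02-06/09';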
>> > >
>> > >
>> > > Regards
>> > > Bejoy K S
>> > >
>> > > From handheld. Please excuse typos.
>> > >
>> > > -----Original Message-----
>> > > From: Xiaobin She <xiaobinshe@gmail.com>
>> > > Date: Mon, 6 Feb 2012 16:41:50
>> > > To: <common-user@hadoop.apache.org>; Xiaobin She <xiaobinshe@gmail.com>
>> > > Reply-To: common-user@hadoop.apache.org
>> > > Subject: Re: Can I write to a compressed file which is located in hdfs?
>> > >
>> > > Sorry, this sentence is wrong:
>> > >
>> > > "I can't compress these logs every hour and then put them into HDFS."
>> > >
>> > > It should be:
>> > >
>> > > "I can compress these logs every hour and then put them into HDFS."
>> > >
>> > >
>> > >
>> > >
>> > > 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
>> > >
>> > > >
>> > > > hi all,
>> > > >
>> > > > I'm testing Hadoop and Hive, and I want to use them for log analysis.
>> > > >
>> > > > Here I have a question: can I write/append logs to a compressed
>> > > > file which is located in HDFS?
>> > > >
>> > > > Our system generates lots of log files every day; I can't compress
>> > > > these logs every hour and then put them into HDFS.
>> > > >
>> > > > But what if I want to write logs into files that are already in
>> > > > HDFS and are compressed?
>> > > >
>> > > > If these files were not compressed, this job would seem easy; but
>> > > > how do I write or append logs into a compressed file?
>> > > >
>> > > > Can I do that?
>> > > >
>> > > > Can anyone give me some advice or some examples?
>> > > >
>> > > > Thank you very much!
>> > > >
>> > > > xiaobin
>> > > >
>> > >
>> > >
>> >
>>
>>
>
