hadoop-common-user mailing list archives

From Raj Vishwanathan <rajv...@yahoo.com>
Subject Re: Can I write to a compressed file which is located in hdfs?
Date Tue, 07 Feb 2012 18:06:31 GMT
Hi

Here is a piece of code that does the reverse of what you want; it takes a bunch of compressed
files (gzip, in this case) and converts them to text.

You can tweak the code to do the reverse:

http://pastebin.com/mBHVHtrm 
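(For reference, a minimal sketch of the same idea using Hadoop's CompressionCodecFactory — the class name and paths here are made up, and the pastebin above remains the authoritative version:)

    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    // Decompresses one file in hdfs (e.g. /logs/app.log.gz) to its
    // uncompressed twin (/logs/app.log); swap the two streams around
    // to go the other way and compress instead.
    public class DecompressFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path in = new Path(args[0]);
        // Infer the codec from the file extension (.gz -> GzipCodec)
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(in);
        Path out = new Path(CompressionCodecFactory.removeSuffix(
            in.toString(), codec.getDefaultExtension()));
        InputStream is = codec.createInputStream(fs.open(in));
        OutputStream os = fs.create(out);
        IOUtils.copyBytes(is, os, conf, true); // true = close both streams
      }
    }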



Raj





>________________________________
> From: Xiaobin She <xiaobinshe@gmail.com>
>To: bejoy.hadoop@gmail.com 
>Cc: common-user@hadoop.apache.org; David Sinclair <dsinclair@chariotsolutions.com>

>Sent: Tuesday, February 7, 2012 1:11 AM
>Subject: Re: Can I write to a compressed file which is located in hdfs?
> 
>thank you Bejoy, I will look at that book.
>
>Thanks again!
>
>
>
>2012/2/7 <bejoy.hadoop@gmail.com>
>
>> Hi
>> AFAIK it is not possible to append to a compressed file.
>>
>> If you have files in an hdfs dir and you need to compress them (like the
>> files for an hour), you can use MapReduce to do that by setting
>> mapred.output.compress = true and
>> mapred.output.compression.codec='theCodecYouPrefer'.
>> You'd get the output compressed in the output dir.
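(A hedged sketch of such a job with the old mapred API — an identity, map-only job whose only effect is rewriting the input compressed; the class name, codec choice and paths are illustrative:)

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.*;

    public class CompressDir {
      // Emit only the value, so TextOutputFormat does not prepend the
      // byte-offset key that TextInputFormat supplies.
      public static class ValueOnlyMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<NullWritable, Text> out, Reporter rep)
            throws IOException {
          out.collect(NullWritable.get(), value);
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CompressDir.class);
        conf.setJobName("compress-dir");
        conf.setMapperClass(ValueOnlyMapper.class);
        conf.setNumReduceTasks(0);                  // map-only
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);
        // The two helpers below set mapred.output.compress and
        // mapred.output.compression.codec under the hood.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }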
>>
>> You can use the API to read from standard input like (sketched below):
>> - get the hadoop conf
>> - register the required compression codec
>> - write to a CompressionOutputStream.
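(The same three steps in a minimal sketch — the codec and output path are assumptions:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.util.ReflectionUtils;

    public class StdinToCompressedHdfs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // 1. get hadoop conf
        FileSystem fs = FileSystem.get(conf);
        CompressionCodec codec = (CompressionCodec)   // 2. the codec you prefer
            ReflectionUtils.newInstance(
                conf.getClassByName("org.apache.hadoop.io.compress.GzipCodec"),
                conf);
        Path out = new Path(args[0]);                 // e.g. /logs/2012-02-07.gz
        CompressionOutputStream cos =                 // 3. compressed stream
            codec.createOutputStream(fs.create(out));
        IOUtils.copyBytes(System.in, cos, 4096, false);
        cos.finish();                                 // flush the compressed trailer
        cos.close();
      }
    }

(Run it as, say, hadoop StdinToCompressedHdfs /logs/out.gz < app.log.)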
>>
>> You should find a well-detailed explanation of the same in the book
>> 'Hadoop: The Definitive Guide' by Tom White.
>> Regards
>> Bejoy K S
>>
>> From handheld, Please excuse typos.
>> ------------------------------
>> From: Xiaobin She <xiaobinshe@gmail.com>
>> Date: Tue, 7 Feb 2012 14:24:01 +0800
>> To: <common-user@hadoop.apache.org>; <bejoy.hadoop@gmail.com>; David
>> Sinclair <dsinclair@chariotsolutions.com>
>> Subject: Re: Can I write to a compressed file which is located in hdfs?
>>
>> hi Bejoy and David,
>>
>> thank you for your help.
>>
>> So I can't directly write logs or append logs to a compressed file in
>> hdfs, right?
>>
>> Can I compress a file which is already in hdfs and has not been
>> compressed?
>>
>> If I can, how can I do that?
>>
>> Thanks!
>>
>>
>>
>> 2012/2/6 <bejoy.hadoop@gmail.com>
>>
>>> Hi
>>> I agree with David on that point; you can achieve step 1 of my
>>> previous response with Flume, i.e. load a real-time inflow of data in
>>> compressed format into hdfs. You can specify a time interval or data size
>>> in the Flume collector that determines when to flush data onto hdfs.
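(Purely illustrative — a Flume NG-style HDFS sink definition; agent/sink names and values are made up, and the roll settings are what control the flush interval and size mentioned above:)

    # Hypothetical HDFS sink: roll a new file every hour or every ~120 MB,
    # whichever comes first, writing through a compression codec.
    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/logs/%Y-%m-%d/%H
    agent.sinks.hdfsSink.hdfs.rollInterval = 3600
    agent.sinks.hdfsSink.hdfs.rollSize = 125829120
    agent.sinks.hdfsSink.hdfs.rollCount = 0
    agent.sinks.hdfsSink.hdfs.fileType = CompressedStream
    agent.sinks.hdfsSink.hdfs.codeC = gzip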
>>>
>>> Regards
>>> Bejoy K S
>>>
>>> From handheld, Please excuse typos.
>>>
>>> -----Original Message-----
>>> From: David Sinclair <dsinclair@chariotsolutions.com>
>>> Date: Mon, 6 Feb 2012 09:06:00
>>> To: <common-user@hadoop.apache.org>
>>> Cc: <bejoy.hadoop@gmail.com>
>>> Subject: Re: Can I write to a compressed file which is located in hdfs?
>>>
>>> Hi,
>>>
>>> You may want to have a look at the Flume project from Cloudera. I use it
>>> for writing data into HDFS.
>>>
>>> https://ccp.cloudera.com/display/SUPPORT/Downloads
>>>
>>> dave
>>>
>>> 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
>>>
>>> > hi Bejoy,
>>> >
>>> > thank you for your reply.
>>> >
>>> > actually I have set up a test cluster which has one namenode/jobtracker
>>> > and two datanodes/tasktrackers, and I have run a test on this cluster.
>>> >
>>> > I fetch the log file of one of our modules from the log collector
>>> > machines by rsync, and then I use the hive command line tool to load
>>> > this log file into the hive warehouse, which simply copies the file
>>> > from the local filesystem to hdfs.
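(That load step is typically a one-liner; a sketch with made-up table and path names:)

    -- Copies the local file into the table's warehouse directory in hdfs
    LOAD DATA LOCAL INPATH '/data/logs/module-a/2012-02-06.log'
    INTO TABLE raw_logs;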
>>> >
>>> > And I have run some analysis on this data with hive; all of it ran well.
>>> >
>>> > But now I want to avoid the fetch step which uses rsync, and write the
>>> > logs into hdfs files directly from the servers which generate them.
>>> >
>>> > And it seems easy to do this if the file located in hdfs is not
>>> > compressed.
>>> >
>>> > But how to write or append logs to a file that is compressed and
>>> > located in hdfs?
>>> >
>>> > Is this possible?
>>> >
>>> > Or is this a bad practice?
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2012/2/6 <bejoy.hadoop@gmail.com>
>>> >
>>> > > Hi
>>> > > If you have log files enough to become at least one block size in an
>>> > > hour, you can go ahead as:
>>> > > - run a scheduled job every hour that compresses the log files for that
>>> > > hour and stores them on to hdfs (can use LZO or even Snappy to compress)
>>> > > - if your hive does more frequent analysis on this data, store it as
>>> > > PARTITIONED BY (Date,Hour). While loading into hdfs also follow a
>>> > > directory/sub-dir structure. Once data is in hdfs, issue an Alter Table
>>> > > Add Partition statement on the corresponding hive table (sketched below)
>>> > > - in Hive DDL use the appropriate Input Format (Hive has an ApacheLog
>>> > > Input Format already)
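(A hedged HiveQL sketch of the layout described above; table, column and path names are made up:)

    -- Hypothetical hourly-partitioned log table
    CREATE TABLE app_logs (line STRING)
    PARTITIONED BY (dt STRING, hr STRING)
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE;

    -- After the hourly job drops compressed files under the matching dir:
    ALTER TABLE app_logs ADD PARTITION (dt='2012-02-06', hr='16')
    LOCATION '/warehouse/app_logs/dt=2012-02-06/hr=16';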
>>> > >
>>> > >
>>> > > Regards
>>> > > Bejoy K S
>>> > >
>>> > > From handheld, Please excuse typos.
>>> > >
>>> > > -----Original Message-----
>>> > > From: Xiaobin She <xiaobinshe@gmail.com>
>>> > > Date: Mon, 6 Feb 2012 16:41:50
>>> > > To: <common-user@hadoop.apache.org>; 佘晓彬<xiaobinshe@gmail.com>
>>> > > Reply-To: common-user@hadoop.apache.org
>>> > > Subject: Re: Can I write to a compressed file which is located in hdfs?
>>> > >
>>> > > sorry, this sentence is wrong,
>>> > >
>>> > > I can't compress these logs every hour and then put them into hdfs.
>>> > >
>>> > > it should be
>>> > >
>>> > > I can compress these logs every hour and then put them into hdfs.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > 2012/2/6 Xiaobin She <xiaobinshe@gmail.com>
>>> > >
>>> > > >
>>> > > > hi all,
>>> > > >
>>> > > > I'm testing hadoop and hive, and I want to use them in log analysis.
>>> > > >
>>> > > > Here I have a question: can I write/append logs to a compressed
>>> > > > file which is located in hdfs?
>>> > > >
>>> > > > Our system generates lots of log files every day. I can't compress
>>> > > > these logs every hour and then put them into hdfs.
>>> > > >
>>> > > > But what if I want to write logs into files that are already in
>>> > > > hdfs and are compressed?
>>> > > >
>>> > > > If these files were not compressed, then this job would be easy,
>>> > > > but how to write or append logs into a compressed log?
>>> > > >
>>> > > > Can I do that?
>>> > > >
>>> > > > Can anyone give me some advice or some examples?
>>> > > >
>>> > > > Thank you very much!
>>> > > >
>>> > > > xiaobin
>>> > > >
>>> > >
>>> > >
>>> >
>>>
>>>
>>
>
>
>