flume-user mailing list archives

From "Bhaskar V. Karambelkar" <bhaska...@gmail.com>
Subject Re: Question about gzip compression when using Flume Ng
Date Tue, 15 Jan 2013 01:25:11 GMT
Sagar,
You're better off downloading and unzipping CDH3u5 or CDH4 somewhere, and
pointing the HADOOP_HOME env. variable to the base directory.
That way you won't have to worry about which JAR files are needed and
which are not.
Flume will auto-add all the JARs it needs from the Hadoop installation.
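
For example - a minimal sketch, assuming Flume lives under /opt/flume (as
in your earlier mail) and taking the agent name collector102 from your
config lines; the CDH path is just a placeholder for wherever you unpack
the tarball:

# flume-ng's launcher script adds the Hadoop JARs it finds via HADOOP_HOME
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5
/opt/flume/bin/flume-ng agent -n collector102 -c /opt/flume/conf \
    -f /opt/flume/conf/flume.conf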

regards
Bhaskar

On Mon, Jan 14, 2013 at 7:43 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:

> OK, so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
> errors about the guava dependencies, so I put in that jar too]
>
> smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e
> "guava"
> -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
> -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50
> hadoop-core-0.20.2-cdh3u5.jar
>
> Now I don't even see the file being created in HDFS, and the flume log is
> happily talking about housekeeping for some file channel checkpoints,
> updating pointers, and so on.
>
> Below is a tail of the flume log:
>
> hadoop@collector102:/data/flume_log$ tail -10 flume.log
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
>  org.apache.flume.channel.file.Log - Updated checkpoint for file:
> /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID:
> 1358209947324
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO
>  org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel2/data/log-34
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
>  org.apache.flume.channel.file.Log - Updated checkpoint for file:
> /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID:
> 1358209947323
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO
>  org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel1/data/log-34
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO
>  org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
> currentPosition = 18577138, logWriteOrderID = 1358209947324
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO
>  org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta
> currentPosition = 18577138, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO
>  org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel1/data/log-35
> 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO
>  org.apache.flume.channel.file.LogFile - Closing RandomReader
> /data/flume_data/channel2/data/log-35
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO
>  org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
> currentPosition = 217919486, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO
>  org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta
> currentPosition = 217919486, logWriteOrderID = 1358209947324
>
> Sagar
>
>
> On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <brock@cloudera.com> wrote:
>
>> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
>> I would upgrade to CDH3u5 or CDH 4.1.2.
>>
>> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <sagarmehta@gmail.com>
>> wrote:
>> > About the bz2 suggestion, we have a ton of downstream jobs that assume
>> > gzip-compressed files - so it is better to stick to gzip.
>> >
>> > The plan B for us is to have an Oozie step to gzip-compress the logs
>> > before proceeding with downstream Hadoop jobs - but that looks like a
>> > hack to me!!
>> >
>> > Sagar
>> >
>> >
>> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <sagarmehta@gmail.com>
>> wrote:
>> >>
>> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat
>> >> collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
>> >>
>> >> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz:
>> >> decompression OK, trailing garbage ignored
>> >> 100
>> >>
>> >> This should be about 50,000 events for the 5 min window!!
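>> >>
>> >> A quick way to see whether the file really holds several concatenated
>> >> gzip members (assumption on my part: the member header bytes 1f 8b 08
>> >> rarely occur by chance inside the compressed data):
>> >>
>> >> # count gzip member headers in the raw file; each member starts with
>> >> # the magic bytes 1f 8b followed by 08 (the deflate method byte)
>> >> LC_ALL=C grep -ao $'\x1f\x8b\x08' \
>> >>     collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l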
>> >>
>> >> Sagar
>> >>
>> >> On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <brock@cloudera.com>
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Can you try:  zcat file > output
>> >>>
>> >>> I think what is occurring is that, because of the flush, the output
>> >>> file is actually several concatenated gz files.
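>> >>>
>> >>> A minimal local repro of that effect (file name made up):
>> >>>
>> >>> # two gzip members appended into one file; this is a valid gzip
>> >>> # stream, but some tools only report on the first member
>> >>> printf 'first\n'  | gzip >  multi.gz
>> >>> printf 'second\n' | gzip >> multi.gz
>> >>> zcat multi.gz    # prints both lines, reading across member boundaries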
>> >>>
>> >>> Brock
>> >>>
>> >>> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <sagarmehta@gmail.com>
>> >>> wrote:
>> >>> > Yeah, I have tried the text write format in vain before, but
>> >>> > nevertheless gave it a try again!! Below is the latest file - still
>> >>> > the same thing.
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
>> >>> > Mon Jan 14 23:02:07 UTC 2013
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls
>> >>> > /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> > Found 1 items
>> >>> > -rw-r--r--   3 hadoop supergroup    4798117 2013-01-14 22:55
>> >>> > /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget
>> >>> > /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip
>> >>> > collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> >
>> >>> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz:
>> >>> > decompression OK, trailing garbage ignored
>> >>> >
>> >>> > Interestingly enough, the gzip page says it is a harmless warning -
>> >>> > http://www.gzip.org/#faq8
>> >>> >
>> >>> > However, I'm losing events on decompression, so I cannot afford to
>> >>> > ignore this warning. The gzip page gives an example about magnetic
>> >>> > tape - there is an analogy to an HDFS block here, since the file is
>> >>> > initially stored in HDFS before I pull it out onto the local
>> >>> > filesystem.
>> >>> >
>> >>> > Sagar
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson
>> >>> > <cwoodson.dev@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
>> >>> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT
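>> >>> >>
>> >>> >> For context, here is roughly how those lines fit into the rest of
>> >>> >> a gzip HDFS sink config - the agent/sink names match your setup,
>> >>> >> but the roll values are only an illustration:
>> >>> >>
>> >>> >> # gzip-compressed text output from the HDFS sink
>> >>> >> collector102.sinks.sink1.type = hdfs
>> >>> >> collector102.sinks.sink1.hdfs.fileType = CompressedStream
>> >>> >> collector102.sinks.sink1.hdfs.codeC = gzip
>> >>> >> collector102.sinks.sink1.hdfs.writeFormat = Text
>> >>> >> # roll on time only (e.g. one file per 5-minute window)
>> >>> >> collector102.sinks.sink1.hdfs.rollInterval = 300
>> >>> >> collector102.sinks.sink1.hdfs.rollSize = 0
>> >>> >> collector102.sinks.sink1.hdfs.rollCount = 0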
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Apache MRUnit - Unit testing MapReduce -
>> >>> http://incubator.apache.org/mrunit/
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
>>
>
>
