flume-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Unable to dump data into the hdfs
Date Tue, 12 Jun 2012 18:58:17 GMT
Thank you so much Eric for pointing out the difference between -F and
-f. I have not tuned the flush/rotation configuration. Also, two files
are getting generated every time I start the agent. Is that normal? I
would also like to ask if there is any link where I can find info on
agent configuration (especially for the hbase-sink). Many thanks.

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 12:13 AM, Eric Sammer <esammer@cloudera.com> wrote:
> Mohammad:
>
> There are a few reasons why this could be.
>
> On Tue, Jun 12, 2012 at 10:51 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>
>> Hello list,
>>
>>    I am trying to collect Apache web server logs and put them into
>> HDFS, but I am not able to do it properly. Only the first few rows
>> from the log file are making it into HDFS. My conf file looks like
>> this:
>>
>> agent1.sources = tail
>> agent1.channels = MemoryChannel-2
>> agent1.sinks = HDFS
>>
>> agent1.sources.tail.type = exec
>> agent1.sources.tail.command = tail -f /var/log/apache2/access.log.1
>
>
> You probably want to use tail -F rather than tail -f. The former will
> follow the file across truncation and rotation, whereas the latter will
> not. Also, I'm not familiar with how your apache logs are being written,
> but access.log.1 is usually a rotated-out (i.e. non-changing) file. Do
> you mean to tail access.log instead?
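>
> For example, here's a minimal sketch of the source definition with both
> of those changes applied (assuming the live log really is access.log):
>
> # follow the log by name so rotation/truncation is handled
> agent1.sources.tail.type = exec
> agent1.sources.tail.command = tail -F /var/log/apache2/access.log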
>
>> agent1.sources.tail.channels = MemoryChannel-2
>>
>> agent1.sinks.HDFS.channel = MemoryChannel-2
>> agent1.sinks.HDFS.type = hdfs
>> agent1.sinks.HDFS.hdfs.path = hdfs://localhost:9000/flume
>> agent1.sinks.HDFS.hdfs.fileType = DataStream
>
>
> The frequency with which you flush the open file handle in HDFS can
> affect the rate at which data "appears" in HDFS. If you never flush or
> rotate, data appears in HDFS in block-sized increments (e.g. with a
> block size of 128MB, data appears in chunks of 128MB as blocks are
> completed). Presumably, either data is arriving in sufficient quantity
> to avoid this problem, or you've tuned the flush / rotation
> configuration appropriately.
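>
> If you want to see data show up quickly while testing, a minimal sketch
> would be to force small, frequent rolls via the standard hdfs sink roll
> properties (the values below are just illustrative):
>
> # roll the file every 30 seconds or every 1000 events,
> # whichever comes first; 0 disables size-based rolling
> agent1.sinks.HDFS.hdfs.rollInterval = 30
> agent1.sinks.HDFS.hdfs.rollCount = 1000
> agent1.sinks.HDFS.hdfs.rollSize = 0
> # number of events written before flushing to HDFS
> agent1.sinks.HDFS.hdfs.batchSize = 100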
>
>>
>> agent1.channels.MemoryChannel-2.type = memory
>>
>> Regards,
>>     Mohammad Tariq
>
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
