From: Brock Noland <brock@cloudera.com>
Date: Wed, 12 Sep 2012 17:01:34 -0500
Subject: Re: splitting functions
To: user@flume.apache.org

Nevermind, it doesn't look like FILE_ROLL supports batching....

On Wed, Sep 12, 2012 at 4:56 PM, Brock Noland wrote:
> It looks like you have a batch size of 1000, which could mean the sink is
> waiting for 1000 entries...
>
> node102.sinks.filesink1.batchSize = 1000
>
> On Wed, Sep 12, 2012 at 3:12 PM, Cochran, David M (Contractor) wrote:
>> Putting a copy of hadoop-core.jar in the lib directory did the trick..
>> at least it made the errors go away..
>>
>> Just trying to sort out why nothing is getting written to the sink's
>> files now... when I add entries to the file being tailed, nothing makes
>> it to the sink log file(s). Guess I need to run tcpdump on that port and
>> see if anything is being sent, or if the problem is on the receive side
>> now.
>>
>> Thanks for the help!
>> Dave
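The check Dave mentions can be a one-liner. A sketch, assuming the
receiving agent listens on port 9432 as in the configs quoted below; the
interface may need adjusting for your host:

    # print matching packets in ASCII, on all interfaces
    tcpdump -A -i any port 9432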
>>
>> -----Original Message-----
>> From: Brock Noland [mailto:brock@cloudera.com]
>> Sent: Wed 9/12/2012 12:41 PM
>> To: user@flume.apache.org
>> Subject: Re: splitting functions
>>
>> Yeah, that is my fault. FileChannel uses a few hadoop classes for
>> serialization. I want to get rid of that but it's just not a priority
>> item. You either need the hadoop command in your path or the
>> hadoop-core.jar in your lib directory.
>>
>> On Wed, Sep 12, 2012 at 1:38 PM, Cochran, David M (Contractor) wrote:
>>> Brock,
>>>
>>> Thanks for the sample! Starting to see a bit more light and making a
>>> little more sense now...
>>>
>>> If you wouldn't mind and have a couple mins to spare... I'm getting
>>> this error and am not sure how to make it go away. I can't use hadoop
>>> for storage, just FILE_ROLL (ultimately the logs will need to be
>>> processed further in plain text). I'm just not sure why...
>>>
>>> The error follows, and my conf is further down.
>>>
>>> 12 Sep 2012 13:18:54,120 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:211) - Starting FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,124 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.start:234) - Failed to start the file channel [channel=fileChannel]
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,126 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:238) - Unable to start FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] } - Exception follows.
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         at org.apache.flume.channel.file.Log$Builder.build(Log.java:144)
>>>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:223)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>         ... 24 more
>>> 12 Sep 2012 13:18:54,127 INFO [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.FileChannel.stop:249) - Stopping FileChannel fileChannel { dataDirs: [/tmp/flume/data1, /tmp/flume/data2, /tmp/flume/data3] }...
>>> 12 Sep 2012 13:18:54,127 ERROR [lifecycleSupervisor-1-0] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:249) - Unsuccessful attempt to shutdown component: {} due to missing dependencies. Please shutdown the agent or disable this component, or the agent will be in an undefined state.
>>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at org.apache.flume.channel.file.FileChannel.getDepth(FileChannel.java:282)
>>>         at org.apache.flume.channel.file.FileChannel.stop(FileChannel.java:250)
>>>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:244)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>> 12 Sep 2012 13:18:54,622 INFO [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:141) - Starting Sink filesink1
>>> 12 Sep 2012 13:18:54,624 INFO [conf-file-poller-0] (org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.startAllComponents:152) - Starting Source avroSource
>>> 12 Sep 2012 13:18:54,626 INFO [lifecycleSupervisor-1-1] (org.apache.flume.source.AvroSource.start:138) - Starting Avro source avroSource: { bindAddress: 0.0.0.0, port: 9432 }...
>>> 12 Sep 2012 13:18:54,641 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160) - Unable to deliver event. Exception follows.
>>> java.lang.IllegalStateException: Channel closed [channel=fileChannel]
>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>>>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:267)
>>>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:118)
>>>         at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:172)
>>>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
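The NoClassDefFoundError for org/apache/hadoop/io/Writable above is the
missing dependency Brock describes: the FileChannel needs Hadoop's
serialization classes on its classpath. A sketch of the workaround, with
hypothetical paths (point them at your actual Hadoop and Flume installs):

    # copy the jar into Flume's lib directory...
    cp /usr/lib/hadoop/hadoop-core-*.jar /root/Desktop/apache-flume-1.3.0-SNAPSHOT/lib/
    # ...or make sure the hadoop command is on PATH so flume-ng can find the classes
    export PATH=/usr/lib/hadoop/bin:$PATH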
>>> Using your config, this is my starting point (trying to get it functioning on a single host first):
>>>
>>> node105.sources = tailsource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.tailsource.type = exec
>>> node105.sources.tailsource.command = tail -F /root/Desktop/apache-flume-1.3.0-SNAPSHOT/test.log
>>> #node105.sources.stressSource.batchSize = 1000
>>> node105.sources.tailsource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = localhost
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/checkpoint
>>> node105.channels.fileChannel.dataDirs = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/tmp/flume/tmp/flume/data
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = filesink1
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.filesink1.type = FILE_ROLL
>>> node102.sinks.filesink1.batchSize = 1000
>>> node102.sinks.filesink1.channel = fileChannel
>>> node102.sinks.filesink1.sink.directory = /root/Desktop/apache-flume-1.3.0-SNAPSHOT/logs/rhel5/
>>>
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
>>>
>>> Thanks!
>>> Dave
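One way to isolate whether events are reaching node102 at all is to swap
the file sink for Flume's logger sink, which writes every event it takes
to the agent's own log. A sketch against the config above:

    node102.sinks = loggersink1
    node102.sinks.loggersink1.type = logger
    node102.sinks.loggersink1.channel = fileChannel

If events show up in node102's log but not in the rolled files, the
problem is on the sink side; if they don't, look at the network or the
sending agent.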
>>>
>>> -----Original Message-----
>>> From: Brock Noland [mailto:brock@cloudera.com]
>>> Sent: Wed 9/12/2012 9:11 AM
>>> To: user@flume.apache.org
>>> Subject: Re: splitting functions
>>>
>>> Hi,
>>>
>>> Below is a config I use to test out the FileChannel. See the comments
>>> "##" for how messages are sent from one host to another.
>>>
>>> node105.sources = stressSource
>>> node105.channels = fileChannel
>>> node105.sinks = avroSink
>>>
>>> node105.sources.stressSource.type = org.apache.flume.source.StressSource
>>> node105.sources.stressSource.batchSize = 1000
>>> node105.sources.stressSource.channels = fileChannel
>>>
>>> ## Sink sends avro messages to node103.bashkew.com port 9432
>>> node105.sinks.avroSink.type = avro
>>> node105.sinks.avroSink.batch-size = 1000
>>> node105.sinks.avroSink.channel = fileChannel
>>> node105.sinks.avroSink.hostname = node102.bashkew.com
>>> node105.sinks.avroSink.port = 9432
>>>
>>> node105.channels.fileChannel.type = file
>>> node105.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node105.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node105.channels.fileChannel.capacity = 10000
>>> node105.channels.fileChannel.checkpointInterval = 3000
>>> node105.channels.fileChannel.maxFileSize = 5242880
>>>
>>> node102.sources = avroSource
>>> node102.channels = fileChannel
>>> node102.sinks = nullSink
>>>
>>> ## Source listens for avro messages on port 9432 on all ips
>>> node102.sources.avroSource.type = avro
>>> node102.sources.avroSource.channels = fileChannel
>>> node102.sources.avroSource.bind = 0.0.0.0
>>> node102.sources.avroSource.port = 9432
>>>
>>> node102.sinks.nullSink.type = null
>>> node102.sinks.nullSink.batchSize = 1000
>>> node102.sinks.nullSink.channel = fileChannel
>>>
>>> node102.channels.fileChannel.type = file
>>> node102.channels.fileChannel.checkpointDir = /tmp/flume/checkpoints
>>> node102.channels.fileChannel.dataDirs = /tmp/flume/data1,/tmp/flume/data2,/tmp/flume/data3
>>> node102.channels.fileChannel.capacity = 5000
>>> node102.channels.fileChannel.checkpointInterval = 45000
>>> node102.channels.fileChannel.maxFileSize = 5242880
>>>
>>> On Wed, Sep 12, 2012 at 10:06 AM, Cochran, David M (Contractor) wrote:
>>>> Okay folks, after spending the better part of a week reading the docs and
>>>> experimenting, I'm lost. I have flume 1.3.x working pretty much as expected
>>>> on a single host. It tails a log file and writes it to another rolling log
>>>> file via flume. No problem there, it seems to work flawlessly. My issue is
>>>> trying to break apart the functions across multiple hosts... a single
>>>> host listening for others to send their logs to. All of my efforts have
>>>> resulted in little more than headaches.
>>>>
>>>> I can't even see the specified port open on what should be the logging host.
>>>> I've tried the basic examples posted in different docs but can't seem to get
>>>> things working across multiple hosts.
>>>>
>>>> Would someone post a working example of the confs needed to get me started?
>>>> Something simple that works, so I can then pick it apart to gain more
>>>> understanding. Apparently, I just don't yet have a firm enough grasp on all
>>>> the pieces, but I want to learn!
>>>>
>>>> Thanks in advance!
>>>> Dave
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
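On the "can't even see the specified port open" symptom: a quick sanity
check on the collector host is to confirm the avro source is actually
listening (a sketch; 9432 matches the configs above):

    # is anything bound to the avro port?
    netstat -tlnp | grep 9432

If nothing is listening, the source never started. One common cause with a
shared config file like the one above is starting the agent under the
wrong name: flume-ng only activates the sections matching its -n argument,
so the collector would be started with something like

    flume-ng agent -n node102 -f flume.conf

while the tailing host uses -n node105.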