From: Israel Ekpo
To: user@flume.apache.org
Date: Mon, 28 Jan 2013 08:27:17 -0500
Subject: Re: Flume-NG 1.3.1 : Spooling dir source : java.io.IOException: Stream closed

Nguyen,

It might be helpful if the original log data is saved in a separate directory first; a separate script/program can then send the *diffs* periodically to the directory being spooled by Flume.

If the log files are rolled, your script might need to be aware of the timestamp of the last event in the last diff it generated.

The diffs can carry the timestamp (in milliseconds) in the file name to prevent name conflicts in the spooling directory.

Using diffs prevents events from the same original file being logged more than once. It also ensures that whatever files you drop into the spooled directory are not modified while Flume is ingesting the events.
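A minimal sketch of that idea is below. Everything in it is illustrative: the paths are made up, it assumes the original log is append-only between runs, that GNU diff and date are available, and that the staging directory is on the same filesystem as the spooling directory so the final mv is an atomic rename.

#!/bin/sh
# Rough sketch of the diff approach described above, not a drop-in script.
LOG=/var/log/app/app.log                  # original, still-growing log file (placeholder)
STAGING=/var/spool/flume-staging          # working area, NOT watched by Flume (placeholder)
SPOOL=/var/log/testhbase                  # directory the spooldir source watches
SNAP="$STAGING/app.log.snapshot"          # copy of what was shipped on the previous run

ts=$(date +%s%3N)                         # millisecond timestamp -> unique file name
new="$STAGING/app.log.snapshot.new"
part="$STAGING/app-$ts.log.part"

touch "$SNAP"
cp "$LOG" "$new"                          # snapshot first; anything appended later waits for the next run
diff "$SNAP" "$new" | sed -n 's/^> //p' > "$part"   # keep only the newly appended lines

if [ -s "$part" ]; then
    mv "$part" "$SPOOL/app-$ts.log"       # atomic rename: Flume never sees a partial or changing file
else
    rm -f "$part"
fi
mv "$new" "$SNAP"                         # remember what has been shipped so far

Run from cron every minute or so, something like this keeps the spooling directory fed with small, uniquely named, immutable files.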
Hope this helps.

On Mon, Jan 28, 2013 at 3:02 AM, NGuyen thi Kim Tuyen wrote:

> Thanks for your reply.
>
> If the spooling source only works on "done", immutable files, it is not
> suitable for my problem. I think I'll use the exec tail command instead.
> But there is a warning at http://flume.apache.org/FlumeUserGuide.html#exec-source:
> "The problem with ExecSource and other asynchronous sources is that the
> source can not guarantee that if there is a failure to put the event into
> the Channel the client knows about it. ... For stronger reliability
> guarantees, consider the Spooling Directory Source or direct integration
> with Flume via the SDK."
>
> I'm still deciding between ExecSource and the Log4jAppender.
> http://www.slideshare.net/sematext/search-analytics-with-flume-and-hbase
>
> Could you share your opinion?
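(For concreteness, the exec-source variant being weighed above would look roughly like the sketch below. It reuses the agent, source, and channel names from the flume.conf quoted further down; the tailed path is a placeholder, and the reliability caveat quoted from the user guide still applies, since lines that fail to reach the channel are simply lost.)

# Sketch only: swap the spooldir source for an exec source tailing a growing log.
t-game-db194.sources.test-hbase.type = exec
t-game-db194.sources.test-hbase.command = tail -F /var/log/app/app.log
t-game-db194.sources.test-hbase.channels = hbase-channel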
>
> On Mon, Jan 28, 2013 at 2:29 PM, Mike Percy wrote:
>
>> Hi Nguyễn,
>> The spooling source only works on "done", immutable files. So they have
>> to be atomically moved, and they cannot be modified after being placed into
>> the spooling directory.
>>
>> Regards,
>> Mike
>>
>>
>> On Sun, Jan 27, 2013 at 11:14 PM, NGuyen thi Kim Tuyen <
>> tuyen03a128@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Please help me.
>>>
>>> I want to use Flume in the following case:
>>> Spooling directory source --> FileChannel --> HBase sink. But I have
>>> some problems with the spooling directory source.
>>>
>>> Here is my test flume.conf:
>>>
>>> t-game-db194.sources = test-hbase
>>> t-game-db194.sinks = sink-hbase
>>> t-game-db194.channels = hbase-channel
>>>
>>> # source: spoolDir
>>> t-game-db194.sources.test-hbase.type = spooldir
>>> t-game-db194.sources.test-hbase.spoolDir = /var/log/testhbase
>>> t-game-db194.sources.test-hbase.fileHeader = true
>>> t-game-db194.sources.test-hbase.channels = hbase-channel
>>>
>>> # file channel
>>> t-game-db194.channels.hbase-channel.type = file
>>> t-game-db194.channels.hbase-channel.checkpointDir = /var/log/flume-ng/checkpoint
>>> t-game-db194.channels.hbase-channel.dataDir = /var/log/flume-ng/filedata
>>>
>>> # sink
>>> t-game-db194.sinks.sink-hbase.type = logger
>>> t-game-db194.sinks.sink-hbase.channel = hbase-channel
>>>
>>> Then I tested with: echo "tuyen ssssssssss " >> "/var/log/testhbase/hbase_1.log".
>>> The first event is delivered, but the following events do not work. Here is flume.log:
>>>
>>> 28 Jan 2013 13:16:47,424 INFO  [lifecycleSupervisor-1-0]
>>> (org.apache.flume.source.SpoolDirectorySource.start:64)  - SpoolDirectorySource
>>> source starting with directory:/var/log/testhbase
>>> 28 Jan 2013 13:16:47,732 INFO  [pool-7-thread-1]
>>> (org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile:229)
>>> - Preparing to move file /var/log/testhbase/hbase_1.log to
>>> /var/log/testhbase/hbase_1.log.COMPLETED
>>> 28 Jan 2013 13:16:48,436 INFO  [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> (org.apache.flume.sink.LoggerSink.process:70)  - Event: {
>>> headers:{file=/var/log/testhbase/hbase_1.log} body: 74 75 79 65 6E 20 73 73
>>> 73 73 73 73 73 73 73 73 tuyen ssssssssss }
>>>
>>> 28 Jan 2013 13:17:08,836 INFO  [pool-7-thread-1]
>>> (org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile:229)
>>> - Preparing to move file /var/log/testhbase/hbase_1.log to
>>> /var/log/testhbase/hbase_1.log.COMPLETED
>>> 28 Jan 2013 13:17:08,837 ERROR [pool-7-thread-1]
>>> (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:148)
>>> - Uncaught exception in Runnable
>>> java.lang.IllegalStateException: File name has been re-used with different files.
>>> Spooling assumption violated for /var/log/testhbase/hbase_1.log.COMPLETED
>>> at org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:272)
>>> at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:185)
>>> at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> at java.lang.Thread.run(Thread.java:662)
>>> 28 Jan 2013 13:17:09,340 ERROR [pool-7-thread-1]
>>> (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:148)
>>> - Uncaught exception in Runnable
>>> java.io.IOException: Stream closed
>>> at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>>> at java.io.BufferedReader.readLine(BufferedReader.java:292)
>>> at java.io.BufferedReader.readLine(BufferedReader.java:362)
>>> at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:180)
>>> at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>
>>>
>>> Are there more documents about the Flume-NG spooling source, besides
>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source ?
>>>
>>> Could you please give me some advice?
>>>
>>> --
>>> Nguyễn Thị Kim Tuyên
>>> Computer Science Engineering
>>> HCMC University Of Technology.
>>
>>
>
>
> --
> Nguyễn Thị Kim Tuyên
> Computer Science Engineering
> HCMC University Of Technology.
>

--
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.

http://www.israelekpo.com/