flume-user mailing list archives

From Israel Ekpo <isr...@aicer.org>
Subject Re: spool directory configuration problem
Date Tue, 23 Apr 2013 14:40:25 GMT
Hello Venkat,

Your question is more appropriate for the users mailing list so I have
changed the list in this reply.

Going forward, you can use the following as a guide when sending emails to
the lists:

Questions about how to use or configure Apache Flume, or issues you are
experiencing while using it, should go to the user mailing list
(user@flume.apache.org).

Questions about API internals, patches, and code reviews are more
appropriate for the developer mailing list (dev@flume.apache.org).

Coming back to the issue you reported, I have had this problem before in my
early days with Flume.

The root cause of your problem can be seen in the log output you included
in your message.

You cannot point the spooling directory at one whose files are constantly
being updated.

If a file is modified after Flume has picked it up from the spooling
directory, you will encounter an exception.

You can check out the user guide for more info:

http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

From the user guide, the SpoolingDirectorySource expects that only
immutable, uniquely named files are dropped in the spooling directory. If
duplicate names are used, or files are modified while being read, the
source will fail with an error message. For some use cases this may require
adding unique identifiers (such as a timestamp) to log file names when they
are copied into the spooling directory.
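One common way to satisfy both requirements (immutable files, unique names) is to finish writing each file in a staging area and only then move it into the spooling directory. A minimal sketch, with hypothetical paths and filenames that are not from the original message:

```shell
# Stage each file OUTSIDE the spooling directory, give it a unique name,
# and only then move it in. A rename on the same filesystem is atomic,
# so Flume never observes a partially written or still-changing file.
SPOOL_DIR=/tmp/flume-spool        # hypothetical spooling directory
STAGING_DIR=/tmp/flume-staging    # hypothetical staging area, same filesystem
mkdir -p "$SPOOL_DIR" "$STAGING_DIR"

SRC=/tmp/app.log                  # hypothetical source log file
echo "example log line" > "$SRC"

# Unique name: a basename plus a nanosecond timestamp (GNU date).
UNIQUE_NAME="app_$(date +%Y%m%d%H%M%S%N).log"

cp "$SRC" "$STAGING_DIR/$UNIQUE_NAME"                      # finish writing here
mv "$STAGING_DIR/$UNIQUE_NAME" "$SPOOL_DIR/$UNIQUE_NAME"   # atomic hand-off
```

The key point is that the `mv` into the spooling directory happens only after the file is complete, so Flume never picks up a file that is still being written.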



On 23 April 2013 01:17, Venkateswarlu Danda <
Venkateswarlu.Danda@lntinfotech.com> wrote:

> Hello
>
> I am generating files continuously in local folder of my base machine. How
> I can now use the flume to stream the generated files from local folder to
> HDFS.
>
> I have written a configuration, but it is giving me some issues. Please
> give me a sample configuration.
>
> This is my configuration file:
>
> agents.sources=spooldir-source
> agents.sinks=hdfs-sink
> agents.channels=ch1
>
> agents.sources.spooldir-source.type=spooldir
>
> agents.sources.spooldir-source.spoolDir=/apache-tomcat-7.0.39/logs/MultiThreadLogs
> agents.sources.spooldir-source.fileSuffix=.SPOOL
> agents.sources.spooldir-source.fileHeader=true
> agents.sources.spooldir-source.bufferMaxLineLength=50000
>
> agents.sinks.hdfs-sink.type=hdfs
> agents.sinks.hdfs-sink.hdfs.path=hdfs://cloudx-740-677:54300/multipleFiles/
> agents.sinks.hdfs-sink.hdfs.rollSize=12553700
> agents.sinks.hdfs-sink.hdfs.rollCount=12553665
> agents.sinks.hdfs-sink.hdfs.rollInterval=3000
> agents.sinks.hdfs-sink.hdfs.fileType=DataStream
> agents.sinks.hdfs-sink.hdfs.writeFormat=Text
>
> agents.channels.ch1.type=file
>
> agents.sources.spooldir-source.channels=ch1
> agents.sinks.hdfs-sink.channel=ch1
>
>
>
> If I add a large file (10 MB), I get this error:
>
>
> 13/04/18 16:11:21 ERROR source.SpoolDirectorySource: Uncaught exception in
> Runnable
> java.lang.IllegalStateException: File has been modified since being read:
> /apache-tomcat-7.0.39/logs/MultiThreadLogs/log_0.txt
>         at
> org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:237)
>         at
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:185)
>         at
> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> 13/04/18 16:11:21 ERROR source.SpoolDirectorySource: Uncaught exception in
> Runnable
> java.io.IOException: Stream closed
>         at java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
>         at java.io.BufferedReader.readLine(BufferedReader.java:310)
>         at java.io.BufferedReader.readLine(BufferedReader.java:382)
>         at
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:180)
>         at
> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
>
>
> If I increase "bufferMaxLineLength", I get an OutOfMemoryError:
>
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.BufferedReader.<init>(BufferedReader.java:98)
>         at
> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>         at
> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>         at
> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>
>
> Thanks & Regards
> Venkat.D
>
>
> -----Original Message-----
> From: Venkatesh Sivasubramanian (JIRA) [mailto:jira@apache.org]
> Sent: Tuesday, April 23, 2013 8:57 AM
> To: dev@flume.apache.org
> Subject: [jira] [Comment Edited] (FLUME-1819) ExecSource don't flush the
> cache if there is no input entries
>
>
>     [
> https://issues.apache.org/jira/browse/FLUME-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638744#comment-13638744]
>
> Venkatesh Sivasubramanian edited comment on FLUME-1819 at 4/23/13 3:27 AM:
> ---------------------------------------------------------------------------
>
> Yes Hari, let me take a stab. Will keep you posted. thanks!
>
>
>       was (Author: venkyz):
>     Yes Hari, let me take a stab. Will keep you posted.
>
>
> > ExecSource don't flush the cache if there is no input entries
> > -------------------------------------------------------------
> >
> >                 Key: FLUME-1819
> >                 URL: https://issues.apache.org/jira/browse/FLUME-1819
> >             Project: Flume
> >          Issue Type: Bug
> >          Components: Sinks+Sources
> >    Affects Versions: v1.3.0
> >            Reporter: Fengdong Yu
> >            Assignee: Venkatesh Sivasubramanian
> >             Fix For: v1.4.0
> >
> >         Attachments: FLUME-1819.patch, FLUME-1819.patch.1
> >
> >
> > ExecSource has a default batchSize of 20: the exec source reads data
> from its command, puts it into a cache, and pushes the batch to the
> channel once the cache is full.
> > But if the cache is not full and there is no input for a long time,
> those entries stay in the cache and never reach the channel until the
> cache fills up.
> > So the patch adds a new config option for ExecSource, batchTimeout
> (default 3 seconds); when batchTimeout is exceeded, all cached data is
> pushed to the channel even if the cache is not full.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators. For more information on JIRA, see:
> http://www.atlassian.com/software/jira
>
>
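As an aside on the forwarded FLUME-1819 thread above: once that patch ships (the issue is marked Fix For v1.4.0), the batchTimeout option it describes should be configurable like any other ExecSource property. A hedged sketch only, with hypothetical agent/source names and a tail command chosen purely for illustration:

```properties
# Hypothetical agent/source names; batchTimeout is the option added by
# FLUME-1819 (value in milliseconds, default 3000 per the patch description).
agent1.sources.exec-source.type = exec
agent1.sources.exec-source.command = tail -F /var/log/app.log
agent1.sources.exec-source.batchSize = 20
agent1.sources.exec-source.batchTimeout = 3000
```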
