flume-user mailing list archives

From Dan Young <danoyo...@gmail.com>
Subject Re: .SpoolingFileLineReader warning....
Date Tue, 20 Nov 2012 15:02:53 GMT
Hey Brock,

I can do some more testing on my side with smaller files, and with mv
instead of cp. I do think a slight delay would help, since people
will be moving and copying large files around.
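
A minimal sketch of the delay idea, for illustration only: treat a file as
ready once its mtime is at least one second old. It is written in Python
rather than Flume's Java, and the names is_settled and settled_files are
hypothetical, not part of Flume. Skipping dot-files and *.COMPLETED files is
an assumption based on the behavior discussed in this thread.

```python
import os
import time


def is_settled(path, min_age_seconds=1.0, now=None):
    """Return True if the file's mtime is at least min_age_seconds in the past."""
    if now is None:
        now = time.time()
    return (now - os.stat(path).st_mtime) >= min_age_seconds


def settled_files(spool_dir, min_age_seconds=1.0, now=None):
    """List regular files in spool_dir that look safe to read, oldest first."""
    candidates = []
    for name in os.listdir(spool_dir):
        # Assumption: hidden files and already-processed files are skipped.
        if name.startswith(".") or name.endswith(".COMPLETED"):
            continue
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path) and is_settled(path, min_age_seconds, now):
            candidates.append(path)
    return sorted(candidates, key=lambda p: os.stat(p).st_mtime)
```

This is only a heuristic: a copy that stalls for longer than the threshold
would still look settled, so it narrows the race rather than removing it.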

Regards,

Dano
On Nov 20, 2012 5:26 AM, "Brock Noland" <brock@cloudera.com> wrote:

> Thinking about this more, I think it's probably going to be quite
> common for people to cp large files into the spooling directory.
> Patrick, what do you think about waiting until the mtime is say 1
> second old?
>
> Brock
>
> On Mon, Nov 19, 2012 at 5:29 PM, Brock Noland <brock@cloudera.com> wrote:
> > My guess is that the file does not have the correct permissions while
> > being copied.
> >
> > [noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test*
> > [1] 18780
> > -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
> > -rw------- 1 noland noland   52334592 Nov 19 17:27 test-1
> >
> >
> > For large files, it probably makes sense to copy the file in as .file
> > and then rename it to file.
> >
> > Brock
> >
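
The copy-then-rename suggestion above can be sketched as follows. This is an
illustration in Python under stated assumptions, not Flume code: the function
name stage_then_rename is hypothetical, and it assumes the reader skips files
whose names start with a dot. The rename is atomic on POSIX only when the
hidden file and the final file are on the same filesystem, which holds here
because both live in the spooling directory.

```python
import os
import shutil


def stage_then_rename(src, spool_dir):
    """Copy src into spool_dir under a hidden name, then rename it into place."""
    name = os.path.basename(src)
    hidden_path = os.path.join(spool_dir, "." + name)
    final_path = os.path.join(spool_dir, name)

    # Slow step: the partially written file is only visible as ".name".
    # copy2 also copies permissions and timestamps, similar to cp -p.
    shutil.copy2(src, hidden_path)

    # Fast step: a single rename makes the complete file appear as "name".
    os.rename(hidden_path, final_path)
    return final_path
```

With this approach the reader never sees the final name until the copy has
finished and the permissions are already in place.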
> > On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >> The spooling source gets a directory listing, then reads each file, then
> >> renames it to X.COMPLETED. Is it possible some other process deleted that
> >> file between when Flume listed the directory and when it tried to open the
> >> file? Otherwise, I'm confused why the file would not be present in the
> >> listing you give here.
> >>
> >>
> >> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >>>
> >>> Hey Dan,
> >>>
> >>> You say that it seems like Flume has already processed the log... why do
> >>> you think that?
> >>>
> >>> When you listed the directory contents I don't see the original or the
> >>> COMPLETED version of the file that Flume is complaining about:
> >>>
> >>> /clickstream.log-2012-11-17-1353163623
> >>>
> >>> doesn't appear in the
> >>>
> >>> /mnt/flume/clickstream/
> >>>
> >>> directory listing anywhere.
> >>>
> >>>
> >>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <danoyoung@gmail.com> wrote:
> >>>>
> >>>> Hello Brock,
> >>>>
> >>>> It seems like we get this message each time that logrotate runs and is in
> >>>> the process of copying the file to the SpoolingDirectory. It seems that
> >>>> Flume starts reading the file as soon as it shows up in the
> >>>> SpoolingDirectory.....  Maybe it's trying to read the file while it's still
> >>>> being written to????
> >>>>
> >>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN - org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)]
> >>>> Could not find file: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
> >>>> java.io.FileNotFoundException: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission denied)
> >>>> at java.io.FileInputStream.open(Native Method)
> >>>> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >>>> at java.io.FileReader.<init>(FileReader.java:72)
> >>>> at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
> >>>> at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
> >>>> at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
> >>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> >>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>> at java.lang.Thread.run(Thread.java:722)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <brock@cloudera.com> wrote:
> >>>>>
> >>>>> Ok, do you mind sharing your log rotate config to see if we can
> >>>>> reproduce?
> >>>>>
> >>>>> --
> >>>>> Brock Noland
> >>>>> Sent with Sparrow
> >>>>>
> >>>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote:
> >>>>>
> >>>>> Hey Brock,
> >>>>>
> >>>>> No I have not modified the conf while the agent was running.
> >>>>>
> >>>>> /mnt/flume is local. Note that this is running on an ec2 instance and
> >>>>> the disk is the ephemeral drive, not EBS.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Dano
> >>>>>
> >>>>> On Nov 17, 2012 8:58 AM, "Brock Noland" <brock@cloudera.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I highly doubt it's related to
> >>>>> (https://issues.apache.org/jira/browse/FLUME-1721) but have you
> >>>>> modified the configuration file since starting the agent?  If so, can
> >>>>> you restart the agent and see if the error continues?
> >>>>>
> >>>>> Also, is /mnt/flume local disk or NAS?
> >>>>>
> >>>>> Brock
> >>>>>
> >>>>> On Sat, Nov 17, 2012 at 9:02 AM, Dan Young <danoyoung@gmail.com> wrote:
> >>>>> > First a bit of context, I'm using logrotate to monitor and copy (cp -p) log
> >>>>> > files to a flume spooling directory source.  So every hour, logrotate checks
> >>>>> > for and copies a file from the source to the flume destination. I see the
> >>>>> > following warning message in the flume logs.
> >>>>> >
> >>>>> >
> >>>>> > 17 Nov 2012 14:47:07,682 WARN  [pool-10-thread-1] (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328) - Could not find file: /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623
> >>>>> > java.io.FileNotFoundException: /mnt/flume/clickstream/clickstream.log-2012-11-17-1353163623 (Permission denied)
> >>>>> > at java.io.FileInputStream.open(Native Method)
> >>>>> > at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >>>>> > at java.io.FileReader.<init>(FileReader.java:72)
> >>>>> > at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
> >>>>> > at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
> >>>>> > at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
> >>>>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>>> > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >>>>> > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >>>>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> >>>>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >>>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >>>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >>>>> > at java.lang.Thread.run(Thread.java:722)
> >>>>> >
> >>>>> >
> >>>>> > Although it appears that Flume processes the log, I'm curious why I'm
> >>>>> > seeing this and whether I have any permissions set incorrectly.
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > Here's the permissions:
> >>>>> >
> >>>>> > source log directory under /var/log:
> >>>>> > drwxrwxr-x 2 ubuntu    ubuntu   4096 Nov 17 14:47 clickstream
> >>>>> >
> >>>>> > source files:
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu   9055750 Nov 17 13:29 clickstream.log-2012-11-17-1353158953.gz
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu  13583565 Nov 17 14:17 clickstream.log-2012-11-17-1353161821.gz
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 clickstream.log-2012-11-17-1353163623
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 17 14:52 clickstream.log
> >>>>> >
> >>>>> > flume source directory under /mnt/flume:
> >>>>> > drwxrwxr-x 2 ubuntu ubuntu 4096 Nov 17 14:48 clickstream
> >>>>> >
> >>>>> > flume source files:
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 13:29 clickstream.log-2012-11-17-1353158953.COMPLETED
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 17 14:17 clickstream.log-2012-11-17-1353161821.COMPLETED
> >>>>> > -rw-rw-r-- 1 ubuntu ubuntu 131296672 Nov 17 14:47 clickstream.log-2012-11-17-1353163623.COMPLETED
> >>>>> >
> >>>>> > Any insight would be appreciated.
> >>>>> >
> >>>>> > Regards,
> >>>>> >
> >>>>> > Dan
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Apache MRUnit - Unit testing MapReduce -
> >>>>> http://incubator.apache.org/mrunit/
