incubator-chukwa-user mailing list archives

From Corbin Hoenes <cor...@tynt.com>
Subject Re: _SUCCESS files appearing in demuxOutput
Date Fri, 25 Feb 2011 00:24:41 GMT
We're using Cloudera's CDH3 beta 4 release. Maybe they've patched the
FileOutputCommitter changes into their release, since it's based on Hadoop
0.20.2. Looking at the source for Chukwa 0.3 (the version we are on), the
MoveToRepository class skips the _log and _temporary directories.

Seems like Chukwa should skip the _SUCCESS file as well? Or could a more
general skip be used, like skipping anything starting with an underscore?
(Maybe too aggressive.)
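
A minimal sketch of what that more general skip could look like, in plain
Java with no Hadoop dependency. The class and method names here
(UnderscoreSkip, isHadoopInternal, clusterCandidates) are hypothetical and
are not taken from the Chukwa source; the entry names in main are made up
for illustration. It only shows the name check, not how MoveToRepository
actually lists the demux output directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Chukwa: filters out Hadoop-internal
// entries by name before they are treated as cluster directories.
class UnderscoreSkip {

    // Assumption: any entry whose name starts with "_" (for example
    // _SUCCESS, _temporary, _log) belongs to Hadoop and should be skipped.
    static boolean isHadoopInternal(String name) {
        return name.startsWith("_");
    }

    // Returns only the entry names that would still be processed.
    static List<String> clusterCandidates(List<String> entryNames) {
        List<String> kept = new ArrayList<>();
        for (String name : entryNames) {
            if (!isHadoopInternal(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up directory listing for illustration only.
        List<String> entries =
            Arrays.asList("_SUCCESS", "_temporary", "_log", "cluster1");
        System.out.println(clusterCandidates(entries));
    }
}
```

The trade-off is the one already mentioned: a prefix check also hides any
legitimate directory that happens to start with an underscore, whereas an
explicit list of names has to be updated whenever Hadoop adds a new marker.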

Does Chukwa 0.4 or 0.5 fix this issue? (I'm probably going to just have to
patch 0.3, but maybe this is just another reason to upgrade.)

On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <jboulon@netflix.com> wrote:

> This filename is coming from here:
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
> In general, for Hadoop you may want to avoid looking at any "_*" file, since
> those are Hadoop-related files (like _temporary, _log, …)
>
> /Jerome.
> From: Eric Yang <eyang@yahoo-inc.com>
> Reply-To: "chukwa-user@incubator.apache.org" <
> chukwa-user@incubator.apache.org>
> Date: Thu, 24 Feb 2011 10:55:57 -0800
> To: "chukwa-user@incubator.apache.org" <chukwa-user@incubator.apache.org>
> Subject: Re: _SUCCESS files appearing in demuxOutput
>
> Hi Corbin,
>
> I have not seen this.  What version of Hadoop are you using? Are you
> using 0.21?  It looks like the _SUCCESS file is written out after the
> demux mapreduce job.  There are two possibilities leading to the creation of
> this file: either Demux was modified and is doing something unexpected,
> or the mapreduce framework in 0.21 put that file there.
> If you are using 0.21, I would recommend avoiding it.
>
> A more stable version of Hadoop is the 0.20.100 branch, and you can download
> it from:
>
> http://people.apache.org/~eyang/
>
> Regards,
> Eric
>
> On 2/24/11 10:12 AM, "Corbin Hoenes" <corbin@tynt.com> wrote:
>
> Anyone seen this?
>
> /chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS
>
> I clean them out, but the same file keeps showing up, and Chukwa
> doesn't know how to handle it:
>
> postProcess.log:
> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing Cluster (_SUCCESS)
> 2011-02-21 06:51:55,027 INFO main MoveToRepository - processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in processDemuxOutput:
> java.io.IOException: hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is not a directory!
>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.processClutserDirectory(MoveToRepository.java:54)
>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.main(MoveToRepository.java:250)
>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.movetoMainRepository(PostProcessorManager.java:201)
>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.start(PostProcessorManager.java:146)
>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.main(PostProcessorManager.java:80)
>
>
>
