incubator-chukwa-user mailing list archives

From Ariel Rabkin <asrab...@gmail.com>
Subject Re: _SUCCESS files appearing in demuxOutput
Date Fri, 25 Feb 2011 00:27:21 GMT
I don't think this has been fixed yet in trunk, let alone 0.4.

I would support skipping everything starting with _.  Is there an
actual use case this would break?
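
A minimal sketch of what such a blanket skip could look like, assuming the
directory listing has already been reduced to plain entry names. The class
and method names here are hypothetical and are not taken from
MoveToRepository; the real fix would sit wherever that class currently
special-cases _log and _temporary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Chukwa: illustrates the
// "skip everything starting with _" rule discussed in this thread.
class UnderscoreSkipSketch {

    // True for Hadoop bookkeeping entries such as _SUCCESS, _log, _temporary.
    static boolean isHadoopInternal(String name) {
        return name.startsWith("_");
    }

    // Keeps only the entries that should be processed as cluster directories.
    static List<String> clusterEntries(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!isHadoopInternal(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example listing of a demux output directory (names are made up).
        List<String> listing = Arrays.asList("_SUCCESS", "_temporary", "cluster1");
        System.out.println(clusterEntries(listing));
    }
}
```

The trade-off is that any legitimately named cluster directory beginning
with an underscore would be skipped too, which is the use case being asked
about.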

--Ari

On Thu, Feb 24, 2011 at 4:24 PM, Corbin Hoenes <corbin@tynt.com> wrote:
> We're using Cloudera's CDH3 beta 4 release.  Maybe they've patched the
> FileOutputCommitter changes into their release, as it's based on Hadoop
> 0.20.2.  Looking at the source for Chukwa 0.3 (the version we are on), the
> MoveToRepository class skips the _log and _temporary directories.
>
> Seems like Chukwa should skip the _SUCCESS file as well?  Or could a
> more general skip be used, like skipping anything starting with an
> underscore? (Maybe too aggressive.)
>
> Does Chukwa 0.4 or 0.5 fix this issue? (I'm probably going to just have to
> patch 0.3 but maybe just another reason to upgrade.)
>
> On Thu, Feb 24, 2011 at 12:20 PM, Jerome Boulon <jboulon@netflix.com> wrote:
>>
>> This filename is coming from
>> here: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/constant-values.html
>> In general, for Hadoop you may want to avoid looking at any "_*" file, since
>> those are Hadoop-related files (like _temporary, _log,…).
>> /Jerome.
>> From: Eric Yang <eyang@yahoo-inc.com>
>> Reply-To: "chukwa-user@incubator.apache.org"
>> <chukwa-user@incubator.apache.org>
>> Date: Thu, 24 Feb 2011 10:55:57 -0800
>> To: "chukwa-user@incubator.apache.org" <chukwa-user@incubator.apache.org>
>> Subject: Re: _SUCCESS files appearing in demuxOutput
>>
>> Hi Corbin,
>>
>> I have not seen this.  Which version of Hadoop are you using?  Is it
>> 0.21?  It looks like the _SUCCESS file is written out after the demux
>> MapReduce job.  There are two possible causes for this file: either Demux
>> has been modified and is doing something unexpected, or the 0.21 MapReduce
>> framework put that file there.
>> If you are using 0.21, I would recommend avoiding it.
>>
>> A more stable version of Hadoop is the 0.20.100 branch, which you can
>> download from:
>>
>> http://people.apache.org/~eyang/
>>
>> Regards,
>> Eric
>>
>> On 2/24/11 10:12 AM, "Corbin Hoenes" <corbin@tynt.com> wrote:
>>
>> Anyone seen this?
>>
>> /chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS
>>
>> I clean them out, but the same file keeps showing up, and Chukwa
>> doesn't know how to handle it:
>>
>> postProcess.log:
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - main procesing Cluster (_SUCCESS)
>> 2011-02-21 06:51:55,027 INFO main MoveToRepository - processClutserDirectory (_SUCCESS,/chukwa/repos//_SUCCESS)
>> 2011-02-21 06:51:55,028 WARN main PostProcessorManager - Error in processDemuxOutput:
>> java.io.IOException: hdfs://cluster1/chukwa/postProcess/demuxOutputDir_1298061686862/_SUCCESS is not a directory!
>>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.processClutserDirectory(MoveToRepository.java:54)
>>     at org.apache.hadoop.chukwa.extraction.demux.MoveToRepository.main(MoveToRepository.java:250)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.movetoMainRepository(PostProcessorManager.java:201)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.start(PostProcessorManager.java:146)
>>     at org.apache.hadoop.chukwa.extraction.demux.PostProcessorManager.main(PostProcessorManager.java:80)
>>
>>
>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
