hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce
Date Wed, 22 Mar 2006 19:39:22 GMT
This might be a reasonable short-term solution, although it adds a
lot of complexity.

I was assuming master aggregation.  This clearly could be a burden  
(good point), although if we simply log launch, completion and  
failure events, that should be ok.  Maybe we should stick to that?   
Or only record failure logs by default?
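
To make the trade-off concrete, here is a minimal sketch of a master-side log that records only a configurable set of event types. Everything in it (`Event`, `MasterLog`, `report`, the `PROGRESS` event) is a hypothetical name made up for illustration, not an existing Hadoop API.

```python
from enum import Enum


class Event(Enum):
    LAUNCH = "launch"
    COMPLETION = "completion"
    FAILURE = "failure"
    PROGRESS = "progress"  # hypothetical chatty event the master would drop


# Two candidate defaults from the discussion: lifecycle events, or failures only.
DEFAULT_EVENTS = frozenset({Event.LAUNCH, Event.COMPLETION, Event.FAILURE})
FAILURES_ONLY = frozenset({Event.FAILURE})


class MasterLog:
    """Hypothetical single log kept by the job master."""

    def __init__(self, recorded=DEFAULT_EVENTS):
        self.recorded = recorded
        self.lines = []

    def report(self, task_id, event, detail=""):
        """Record the event if its type is enabled; return True if recorded."""
        if event not in self.recorded:
            return False
        self.lines.append(f"{task_id} {event.value} {detail}".rstrip())
        return True
```

With the default set, a progress report is dropped at the master while launch, completion and failure reports are kept; passing `FAILURES_ONLY` keeps only failures.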

Another approach would be to launch a job to reap entries whenever
their number gets large, and just concatenate them together.
Say, concatenate the smallest 100 together whenever we get to 200?
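
A rough sketch of that reaping policy, with log files modeled as an in-memory dict of name to content rather than real DFS files. The function names, the `merged_name` parameter, and the tie-breaking by name are assumptions for illustration only.

```python
def plan_compaction(sizes, trigger=200, batch=100):
    """Return the names of the `batch` smallest logs once at least
    `trigger` logs exist; otherwise return an empty list."""
    if len(sizes) < trigger:
        return []
    # Sort by size, breaking ties by name so the plan is deterministic.
    smallest = sorted(sizes.items(), key=lambda kv: (kv[1], kv[0]))[:batch]
    return [name for name, _ in smallest]


def compact(logs, trigger=200, batch=100, merged_name="merged-0"):
    """logs maps file name -> content. Returns a new mapping in which the
    selected small logs are replaced by one concatenated entry.
    Assumes `merged_name` does not collide with an existing name."""
    victims = plan_compaction({n: len(c) for n, c in logs.items()},
                              trigger, batch)
    if not victims:
        return dict(logs)
    chosen = set(victims)
    remaining = {n: c for n, c in logs.items() if n not in chosen}
    remaining[merged_name] = "".join(logs[n] for n in victims)
    return remaining
```

Starting from 200 entries, one pass leaves 101: the 100 largest originals plus the merged file, so the count stays bounded without anyone appending to an existing file.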

On Mar 22, 2006, at 9:21 AM, Stefan Groschupf wrote:

> Hi,
>
> If we could ask the jobtracker which host runs a specific
> maprunnable, we could run one logging server as a map task, and
> tasktrackers could send their log messages to that logging server.
> From my point of view, this would be easier to implement than
> multiple writers to one dfs file.
>
> Just my 2 cents.
> Greetings,
> Stefan
>
>
> On 22.03.2006 at 18:10, Yoram Arnon wrote:
>
>> DFS files can only be written once, and by a single writer.
>> Until that changes our hands are tied, as long as we require the  
>> output to
>> reside in the output directory.
>>
>> Unless... we create a protocol whereby the task masters report up  
>> to the job
>> master, and it's only the job master that does the logging.
>> That might introduce unwanted overhead and some load on the job  
>> master.
>>
>>
>>> -----Original Message-----
>>> From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com]
>>> Sent: Tuesday, March 21, 2006 8:54 PM
>>> To: hadoop-dev@lucene.apache.org
>>> Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce
>>>
>>> Will it really make sense to have 300,000 subdirectories with
>>> several log files?  Seems like a real losing proposition.  I'd
>>> just go for a single log file with reasonable per-line prefixes
>>> (time, job, ...).
>>>
>>> Then you can grep out what you want.
>>
>>
>>
>
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
>
>

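The single-log-file idea in the earliest quoted message (per-line prefixes, then grep out what you want) could look roughly like the sketch below. The field order (time, job, task, message), the space separator, and the function names are assumptions; the original message only names "time, job, ..." as example prefixes.

```python
from datetime import datetime, timezone


def format_line(when, job, task, message):
    """Build one log line with space-separated prefixes: time, job, task."""
    return f"{when.isoformat()} {job} {task} {message}"


def grep_job(lines, job):
    """Return the lines whose second field (the job prefix) equals `job`."""
    matches = []
    for line in lines:
        fields = line.split(" ", 2)
        if len(fields) > 1 and fields[1] == job:
            matches.append(line)
    return matches


if __name__ == "__main__":
    t = datetime(2006, 3, 21, 20, 54, tzinfo=timezone.utc)
    log = [
        format_line(t, "job_1", "task_a", "started"),
        format_line(t, "job_2", "task_b", "failed: disk full"),
    ]
    for line in grep_job(log, "job_2"):
        print(line)
```

Because every line carries its own job prefix, a plain `grep job_2` over the file gives a similar result without any per-job directories.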
