hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: local node Quotas (for an R&D cluster)
Date Fri, 25 Sep 2009 15:04:46 GMT
Paul Smith wrote:
> On 25/09/2009, at 8:55 PM, Steve Loughran wrote:

>> I'd love to see more direct Log4J/Hadoop integration, such as a 
>> standardised log4j-in-hadoop format that was easily readable, included 
>> stack traces on exceptions, etc, and came with some sample mapreducer 
>> or pig scripts to analyse.
> I have been mulling over just that sort of thing.  Some sort of 
> HadoopAppender that outputs files in the SequenceFile format and 
> periodically submits to a DFS node.  Hey, this is something I _can_ 
> contribute!  I might start another thread.  Thanks.
> Paul

Have a look at the Chukwa work, and think about how you could use that 
style of distributed aggregation, and what analysis you could do with it. 
One thing to consider is that there is no reason why you can't use Chukwa 
on HDFS to monitor non-Hadoop applications: anything that runs in the 
datacentre is a source of log data, whether it is Tomcat or Jetty, or the back end 

We welcome your contribution. I even volunteer to commit the code, 
if it doesn't get vetoed by everyone else.

One big difference between Hadoop and, say, Log4J, is that Facebook and 
Yahoo! know that Hadoop is a Line-of-Business application. If any patch 
were to lose their data, they would cease to exist. So they worry about 
patches in a way that no other ASF project I've ever come across does. 
There's no correction of spelling mistakes in variable names here, not 
without a proper patch and review.  That makes it less agile than many 
other apache projects, but what it does ensure is that the overall 
system is good to use, that you really can trust your many TB of data to 
the filesystem. Nothing else in apache-land has this responsibility. 
HTTPD has security responsibilities, but not even that project has to 
worry about preserving 14PB of file system data during an upgrade.

