hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: How I should use hadoop to analyze my logs?
Date Fri, 15 Aug 2008 08:20:30 GMT
You can use Chukwa, a contrib project in trunk, for collecting log
entries from web servers. You run adaptors on the web servers and a
collector on the log server. The log entries may not be analyzed in real
time, but it should be close to real time.
I suggest you use Pig for the log data analysis.
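On the custom key/value question below: Hadoop expects keys and values to implement the Writable interface, which boils down to a write(DataOutput)/readFields(DataInput) pair serialized in a fixed field order. A minimal sketch of such a log-entry value class follows; the class name and fields are illustrative, and the Hadoop import is omitted so the sketch stands alone (in a real job it would declare `implements org.apache.hadoop.io.Writable`):

```java
import java.io.*;

// Illustrative log-entry value class. The write/readFields pair is
// exactly what Hadoop's Writable interface requires; here it is shown
// with plain java.io so the example compiles without Hadoop jars.
public class LogEntry {
    private String url;
    private int returnCode;
    private String clientIp;
    private long timestamp;

    public LogEntry() {}  // Writable types need a no-arg constructor

    public LogEntry(String url, int returnCode, String clientIp, long timestamp) {
        this.url = url;
        this.returnCode = returnCode;
        this.clientIp = clientIp;
        this.timestamp = timestamp;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(returnCode);
        out.writeUTF(clientIp);
        out.writeLong(timestamp);
    }

    // Deserialize in the same order.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        returnCode = in.readInt();
        clientIp = in.readUTF();
        timestamp = in.readLong();
    }

    public String getUrl() { return url; }
    public int getReturnCode() { return returnCode; }

    // Round-trip demo: serialize to a buffer, read back into a fresh object.
    public static void main(String[] args) throws IOException {
        LogEntry e = new LogEntry("/index.html", 200, "10.0.0.1", 1218788430L);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        e.write(new DataOutputStream(buf));
        LogEntry copy = new LogEntry();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.getUrl() + " " + copy.getReturnCode());
    }
}
```

Entries serialized this way can be written to a SequenceFile (one per day, as you planned) and read back by map tasks.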

Juho Mäkinen wrote:
> Hello,
> I'm looking into how Hadoop could solve our datamining applications, and I've
> come up with a few questions to which I haven't found any answers yet.
> Our setup contains multiple diskless webserver frontends which
> generate log data. Each webserver hit generates a UDP packet which
> contains basically the same info as a normal Apache access log line
> (url, return code, client ip, timestamp etc). The UDP packet is
> received by a log server. I would want to run map/reduce processes
> on the log data at the same time as the servers are generating new
> data. I was planning that each day would have its own file in HDFS
> which contains all log entries for that day.
> How should I use Hadoop and HDFS to write each log entry to a file? I
> was planning to create a class which contains the request
> attributes (url, return code, client ip etc) and use this as the
> value. I did not find any info on how this could be done with HDFS. The
> API seems to support arbitrary objects as both key and value, but there
> was no example of how to do this.
> How will Hadoop handle the concurrency between the writes and the reads?
> The servers will generate log entries around the clock. I also want to
> analyse the log entries at the same time as the servers are
> generating new data. How can I do this? The HDFS architecture page
> says that the client writes the data first into a local file, and once
> the file has reached the block size, the file is transferred to
> the HDFS storage nodes and the client writes the following data to
> another local file. Is it possible to read the blocks already
> transferred to HDFS using the map/reduce processes and write new
> blocks to the same file at the same time?
> Thanks in advance,
>  - Juho Mäkinen
