hadoop-mapreduce-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: any suggestions on IIS log storage and analysis?
Date Mon, 30 Dec 2013 09:13:11 GMT
You can run a MapReduce job first to join these data sets into one data set,
then analyze the joined data set.

On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO <raofengyun@gmail.com> wrote:

> Hi,
> HDFS splits files into blocks, and MapReduce runs a map task for each
> block. However, the field list can change within an IIS log file, which
> means the fields in one block may depend on another block, making the files
> unsuitable for a MapReduce job as they are. It seems some preprocessing is
> needed before storing and analyzing the IIS log files. We plan to parse each
> line into the same fields and store the result in compressed Avro files.
> Are there any alternatives? HBase? Or any other suggestions for analyzing
> IIS log files?
> thanks!
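
A minimal sketch of the "parse each line into the same fields" step described above, assuming the logs use the W3C extended format, where a "#Fields:" directive names the columns of the data lines that follow it. The function name normalize_iis_lines and the TARGET_FIELDS schema are hypothetical, chosen only for illustration; writing the rows out as Avro is not shown.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical target schema: pick the columns your analysis actually needs.
TARGET_FIELDS = ["date", "time", "cs-method", "cs-uri-stem", "sc-status"]


def normalize_iis_lines(
    lines: Iterable[str],
    target_fields: List[str] = TARGET_FIELDS,
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per data line, always keyed by target_fields.

    The most recent "#Fields:" directive decides how a data line is
    interpreted, so the lines of a file must be read in order.
    """
    current_fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line: only "#Fields:" changes the column layout.
            if line.startswith("#Fields:"):
                current_fields = line[len("#Fields:"):].split()
            continue
        if not current_fields:
            # A data line before any "#Fields:" header cannot be interpreted.
            continue
        row = dict(zip(current_fields, line.split()))
        # Missing columns and the "-" placeholder both become None.
        yield {
            name: (None if row.get(name, "-") == "-" else row[name])
            for name in target_fields
        }


if __name__ == "__main__":
    sample = [
        "#Fields: date time cs-method cs-uri-stem sc-status",
        "2013-12-30 09:00:00 GET /index.html 200",
        "#Fields: date time cs-uri-stem sc-status time-taken",
        "2013-12-30 09:00:01 /about.html 404 15",
    ]
    for record in normalize_iis_lines(sample):
        print(record)
```

Because the header state carries over from line to line, this pass has to see each log file in order from the start, for example before the data is split into blocks. Once every record has the same fields, any block can be processed on its own.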
