Subject: any suggestions on IIS log storage and analysis?
From: Fengyun RAO
To: user@hadoop.apache.org
Date: Mon, 30 Dec 2013 15:58:57 +0800

Hi,

HDFS splits files into blocks, and MapReduce runs a map task for each block. However, the set of fields in an IIS log file can change partway through the file (a new #Fields line), which means the lines in one block may depend on a field definition that sits in another block. That makes the raw files poorly suited to a MapReduce job.

It seems some preprocessing is needed before storing and analyzing the IIS log files. We plan to parse every line into the same set of fields and store the result in compressed Avro files.

Are there any alternatives, such as HBase? Or any other suggestions on analyzing IIS log files?

Thanks!
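To make the planned preprocessing step concrete, here is a minimal sketch in Python of parsing each line into the same set of fields. It assumes the logs are in the W3C extended format, where a `#Fields:` directive names the space-separated columns and `-` marks an empty value. The target column list `TARGET_FIELDS` and the function name `normalize_iis_log` are hypothetical, chosen only for illustration, and writing the records out to Avro is left out.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical fixed schema: every output record gets exactly these columns.
TARGET_FIELDS = ["date", "time", "c-ip", "cs-method", "cs-uri-stem", "sc-status"]


def normalize_iis_log(
    lines: Iterable[str],
    target_fields: List[str] = TARGET_FIELDS,
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per log entry, always keyed by target_fields.

    Tracks the most recent '#Fields:' directive so that each entry is
    interpreted with the column layout in effect at that point in the file.
    """
    current_fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only '#Fields:' changes how entries are parsed.
            if line.startswith("#Fields:"):
                current_fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not current_fields or len(values) != len(current_fields):
            # No layout known yet, or a malformed entry: skip it in this sketch.
            continue
        row = dict(zip(current_fields, values))
        # Columns missing from the current layout, or logged as '-', become None.
        yield {
            name: (None if row.get(name, "-") == "-" else row[name])
            for name in target_fields
        }
```

Because the field layout is carried along while reading a file sequentially, this step has to run over whole files (or at least from the nearest preceding `#Fields:` line), not over arbitrary blocks. Once every record has the same columns, the output no longer depends on earlier lines and can be split freely.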