From: Guy Doulberg
To: common-user@hadoop.apache.org, core-user@hadoop.apache.org
Date: Sat, 17 Sep 2011 23:16:51 -0700
Subject: RE: Creating a hive table for a custom log
If it makes more sense, you could also store your lines with the default SerDe and extract the fields you intend to query using a UDF. For example, you could use parse_url(string urlString, string partToExtract [, string keyToExtract]) to parse the URL parts.

Good luck

-----Original Message-----
From: Raimon Bosch [mailto:raimon.bosch@gmail.com]
Sent: Friday, September 16, 2011 10:36 PM
To: core-user@hadoop.apache.org
Subject: Re: Creating a hive table for a custom log

Any ideas?

The most common approach would be to write your own SerDe and plug it into your Hive, like: http://code.google.com/p/hive-json-serde/

But I'm wondering if there is some work already done in this area.

Raimon Bosch wrote:
>
> Hi,
>
> I'm trying to create a table similar to apache_log, but I'm trying to avoid
> writing my own map-reduce task because I don't want to have my HDFS files
> twice.
>
> So if you're working with log lines like this:
>
> 186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action1/?transaction_id=8002&user_id=871793100001248&ts=1314749223525&item1=271&item2=6045&environment=2
> HTTP/1.1"
>
> 112.201.65.238 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
>
> 90.45.198.251 [31/Aug/2011:00:10:41 +0000] "GET
> /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
>
> And keeping in mind that the parameters could be in different orders: which
> would be the best strategy to create this table? Write my own
> org.apache.hadoop.hive.contrib.serde2?
> Is there any resource already implemented that I could use to perform this task?
>
> In the end, the objective is to convert all the parameters into fields and to
> use the "action" as the type. With this big table I will be able to perform my
> queries, my joins, or my views.
>
> Any ideas?
>
> Thanks in advance,
> Raimon Bosch.
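For reference, the kind of extraction being discussed (whether done by parse_url, a UDF, or a custom SerDe) can be sketched outside Hive. This is a minimal Python sketch, assuming the log format shown in the question; the field names come from the sample lines, and the key point is that parameters are parsed into a map keyed by name, so their order in the URL does not matter:

```python
import re
from urllib.parse import urlparse, parse_qs

# One sample line in the format from the question
line = ('186.92.134.151 [31/Aug/2011:00:10:41 +0000] "GET '
        '/client/action1/?transaction_id=8002&user_id=871793100001248'
        '&ts=1314749223525&item1=271&item2=6045&environment=2 HTTP/1.1"')

# Split the log entry into IP, timestamp, and request path
m = re.match(r'(\S+) \[([^\]]+)\] "GET (\S+) HTTP/[\d.]+"', line)
ip, timestamp, path = m.groups()

# The "action" is the second path segment; query parameters may appear in
# any order, so parse them into a dict keyed by name, not by position.
parsed = urlparse(path)
action = parsed.path.strip('/').split('/')[1]
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}

# action is 'action1'; params['transaction_id'] is '8002'
```

In Hive terms, this is roughly what `parse_url(url, 'QUERY', 'transaction_id')` does per key, or what a custom SerDe would do once per line to expose each parameter as a column.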