From: Azuryy Yu
To: user@hadoop.apache.org
Date: Wed, 27 Feb 2013 10:39:56 +0800
Subject: Re: Concatenate adjacent lines with hadoop

That's easy. In your example:

Map output key: FIELD-N; map output value: the original line, unchanged.

In the reducer: if the value contains LOGTAG<TAB>, it is the first part of the log entry. If not, it is the continuation of a split entry: take the substring that follows the line prefix and concatenate it with the first part.
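
A rough sketch of that idea, written as plain Python so it can be tried locally without a cluster (the two functions could probably be adapted into a Hadoop Streaming mapper and reducer). Everything named here is hypothetical: `split_prefix`, `map_line`, `reduce_group` and `run` are illustration names, the prefix is assumed to be exactly five space-separated tokens, and the continuation line is assumed to repeat FIELD-N exactly as it appears at the end of the truncated line. It also assumes the FIELD-N value is unique enough to pair the two halves; if several entries share it, the group is passed through unchanged.

```python
from itertools import groupby

LOGTAG = "LOGTAG"    # marker that starts a complete or truncated entry
PREFIX_TOKENS = 5    # assumed prefix: "Dec 16 21:47:20 <host> app[web.3]"


def split_prefix(line):
    """Split a raw line into (prefix, payload) after the 5th space."""
    parts = line.split(" ", PREFIX_TOKENS)
    return " ".join(parts[:PREFIX_TOKENS]), parts[PREFIX_TOKENS]


def map_line(line):
    """Map step: key is the shared FIELD-N, value is the original line."""
    _, payload = split_prefix(line)
    fields = payload.split("\t")
    if payload.startswith(LOGTAG + "\t"):
        key = fields[-1]    # first part: last field is where the line was cut
    else:
        key = fields[0]     # continuation: first field repeats FIELD-N
    return key, line


def reduce_group(key, values):
    """Reduce step: join one first part with one continuation, else pass through."""
    heads = [v for v in values if LOGTAG + "\t" in v]
    tails = [v for v in values if LOGTAG + "\t" not in v]
    if len(heads) == 1 and len(tails) == 1:
        _, tail_payload = split_prefix(tails[0])
        # Drop the repeated FIELD-N so it appears only once in the result.
        return [heads[0] + tail_payload[len(key):]]
    return heads + tails


def run(lines):
    """Local stand-in for the shuffle: sort by key, group, then reduce."""
    pairs = sorted(map_line(line) for line in lines)
    out = []
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        out.extend(reduce_group(key, [value for _, value in group]))
    return out
```

Note that the output comes back ordered by key, not in the original file order, so a timestamp or offset would have to be carried along if ordering matters.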

Is my explanation clear?

On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi,
>
> Please find below the issue I need to solve. Thank you in advance for your help and tips.
>
> I have log files in which log lines are sometimes split (this happens when a log line exceeds a specific length):
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce job?
>
> In other words, using a MapReduce job, can I concatenate the following 2 adjacent lines (provided that I "detect" them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split here
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
