hadoop-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Concatenate adjacent lines with hadoop
Date Wed, 27 Feb 2013 02:39:56 GMT
That's easy. In your example:

Map output key: FIELD-N; map output value: the original line.
In the reducer: if the value contains LOGTAG<TAB>, it is the first part of
the log entry; if not, it is the continuation of a split log entry. Just
take a substring of it and concatenate it with the first part.

Am I explaining this clearly?
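A minimal local sketch of the approach above. It is Python that simulates the map, shuffle and reduce phases in memory, not Hadoop API code. Assumptions that are not in the original: the log header ends at the first "] " (hypothetical helper `payload`), and, as in the example, FIELD-N is both the last tab-separated field of the first fragment and the first field of the continuation. The names `map_line`, `reduce_values` and `run_job` are hypothetical. Hadoop does not guarantee the order of values within a reduce group, so the reducer picks out the LOGTAG<TAB> fragment explicitly instead of relying on order.

```python
from collections import defaultdict

LOGTAG = "LOGTAG\t"  # marks the first part of an entry (from the example)


def payload(line):
    """Hypothetical helper: text after the header, assumed to end at the first '] '."""
    return line.split("] ", 1)[1]


def map_line(line):
    """Emit (FIELD-N, original line) as the key/value pair."""
    fields = payload(line).split("\t")
    if LOGTAG in line:
        key = fields[-1]  # first part: FIELD-N is its last field
    else:
        key = fields[0]   # continuation: FIELD-N is its first field
    return key, line


def reduce_values(key, values):
    """Concatenate the LOGTAG<TAB> fragment with the continuation(s) for this key."""
    first = [v for v in values if LOGTAG in v]
    rest = [v for v in values if LOGTAG not in v]
    if not first:
        return list(values)  # orphan continuation: pass it through unchanged
    merged = first[0]
    for v in rest:
        # Drop the header and the repeated FIELD-N; keep "<TAB>FIELD-N+1 ...".
        merged += payload(v)[len(key):]
    # Any other complete entries that happen to share this key pass through.
    return [merged] + first[1:]


def run_job(lines):
    """In-memory stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        k, v = map_line(line)
        groups[k].append(v)
    out = []
    for k in sorted(groups):  # mimic the sort by key before the reduce phase
        out.extend(reduce_values(k, groups[k]))
    return out
```

Keying on a field value can collide when unrelated entries share the same FIELD-N, so this sketch is only as reliable as that key is unique.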



On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi
>
> Please find below the issue I need to solve. Thank you in advance for your
> help/tips.
>
> I have log files in which log lines are sometimes split (this happens when
> a log line exceeds a certain length):
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
> job?
>
> In other words, using a MapReduce job, can I concatenate the following 2
> adjacent lines (provided that I can 'detect' them)
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> into
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>
> Thank you!
>
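For comparison, the transformation asked about in the quoted message can also be sketched as a single sequential pass that appends each line lacking LOGTAG<TAB> to the line before it. This is a different technique from the reducer approach in the reply, shown as a local Python sketch. The helper `payload`, the function name `merge_adjacent`, the header ending at the first "] ", and the rule that a repeated FIELD-N at the start of the continuation is dropped (as in the example output) are all assumptions.

```python
LOGTAG = "LOGTAG\t"  # marks the first part of an entry (from the example)


def payload(line):
    """Hypothetical helper: text after the header, assumed to end at the first '] '."""
    return line.split("] ", 1)[1]


def merge_adjacent(lines):
    """Append each continuation line (no LOGTAG<TAB>) to the entry before it."""
    merged = []
    for line in lines:
        if LOGTAG in line or not merged:
            merged.append(line)  # start of a new entry (or an orphan at the top)
            continue
        cont = payload(line)
        # Assumption from the example: the continuation repeats the previous
        # fragment's last field (FIELD-N), so skip that repeated prefix.
        last_field = merged[-1].rsplit("\t", 1)[-1]
        if cont.startswith(last_field):
            cont = cont[len(last_field):]
        merged[-1] += cont
    return merged
```

In a Hadoop job this only works if both fragments reach the same mapper in order. One way to ensure that is to make the input non-splittable (for example by overriding `FileInputFormat.isSplitable` to return false), at the cost of one mapper per file.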
