hadoop-user mailing list archives

From Matthieu Labour <matth...@actionx.com>
Subject Concatenate adjacent lines with hadoop
Date Wed, 27 Feb 2013 01:36:11 GMT
Hi

Please find below the issue I need to solve. Thank you in advance for your
help and tips.

I have log files in which log lines are sometimes split (this happens when
a log line exceeds a specific length):

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split
here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
job?

In other words, using a MapReduce job, can I concatenate the following two
adjacent lines (provided that I can detect them)

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is split
here
Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX

into

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
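
To make the intended transformation concrete, here is a minimal Python sketch of the merge step on its own. It is not a Hadoop job, and it rests on several assumptions that are not stated above: each record sits on one physical line (header followed by payload), the header ends at the first "] " (as in "app[web.3] "), a complete record's payload starts with the literal tag LOGTAG, and the split falls in the middle of a field, so the two payloads are joined with no separator. The names TAG, HEADER_END, split_header and merge_continuations are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

TAG = "LOGTAG"      # assumption: the payload of every complete record starts with this tag
HEADER_END = "] "   # assumption: the header ends at the first "] ", e.g. "app[web.3] "


def split_header(line: str) -> Tuple[str, str]:
    """Return (header, payload), cutting right after the first HEADER_END."""
    head, sep, payload = line.partition(HEADER_END)
    if not sep:
        # No header marker found: treat the whole line as payload.
        return "", line
    return head + sep, payload


def merge_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Yield records, appending continuation payloads to the preceding record."""
    pending = None
    for raw in lines:
        line = raw.rstrip("\n")
        _, payload = split_header(line)
        if payload.startswith(TAG) or pending is None:
            # A new record starts here: emit the one collected so far.
            if pending is not None:
                yield pending
            pending = line
        else:
            # Continuation: drop its header and glue the payload on as-is.
            pending += payload
    if pending is not None:
        yield pending
```

This only works when the lines arrive in their original order within a single stream. In a MapReduce job, two adjacent lines can land in different input splits, so a record split across that boundary would need separate handling.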

Thank you!
