hadoop-user mailing list archives

From Matthieu Labour <matth...@actionx.com>
Subject Re: Concatenate adjacent lines with hadoop
Date Wed, 27 Feb 2013 05:01:51 GMT
Thank you for your answer. I am not sure I fully understand. My email was
most likely not very clear. Here is an example of a log line. Note that the
log line begins with YSLOGROW, and that the second line below should be
concatenated with the first one.

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] YSLOGROW
20121216T214720.345Z
remote-addr=166.137.156.155&user-agent=Mozilla%2F5.0+%28Linux%3B+U%3B+Android+4.0.4%3B+en-us%3B+SAMSUNG-SGH-I717+Build%2FIMM76D%29+AppleWebKit%2F534.30+%28KHTML%2C+like+Gecko%29+Version%2F4.0+Mobile+Safari%2F534.30&referrer=http%3A%2F%
2Flp.mydas.mobi
%2F%2Frich%2Ffoundation%2FdynamicInterstitial%2Fint_launch.php%3Fmm_urid%3DWBNMMG9h4XmbJBUHbDrNWWWm%26mm_ipaddress%3D166.137.156.155%26mm_handset%3D8440%26mm_carrier%3D2%26mm_apid%3D78683%26mm_acid%3D1050500%26mm_osid%3D14%26mm_uip%3D166.137.156.155%26mm_ua%3DMozilla%252F5.0%2B%2528Linux%253B%2BU%253B%2BAndroid%2B4.0.4%253B%2Ben-us%253B%2BSAMSUNG-SGH-I717%2BBuild%252FIMM76D%2529%2BAppleWebKit%252F534.30%2B%2528KHTML%252C%2Blike%2BGecko%2529%2BVersion%252F4.0%2BMobile%2BSafari%252F534.30SAMSUNG-SGH-I717%26mtpid%3DUNKNOWN%26mm_msuid%3DUNKNOWN%26mm_mmisdk%3D4.6.0-12.07.16.a%26mm_mxsdk%3DUNKNOWN%26mm_dv%3DAndroid4.0.4%26mm_adtype%3DMMFullScreenAdTransition%26mm_hswd%3DUNKNOWN%26mm_dm%3DSAMSUNG-SGH-I717%26mm_hsht%3DUNKNOWN%26mm_auid%3Dmmi

Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
d_bd6b33dc569994102eaa60a060987d99e9_013b35a758bd%26mm_accelerometer%3Dtrue%26mm_lat%3DUNKNOWN%26mm_long%3DUNKNOWN%26mm_hpx%3D1280%26mm_wpx%3D800%26mm_density%3D2.0%26mm_dpi%3DUNKNOWN%26mm_campaignid%3D45695%26autoExpand%3Dtrue&query-string=ncid%3DWBNMMG9h4XmbJBUHbDrNWWWm
tr7y MLNL 1009 10034 3401 t4fx 10034 click
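To make the "detect and concatenate" step concrete for the YSLOGROW format, here is a minimal single-pass sketch. It rests on assumptions inferred from the sample above, not confirmed by it: every raw line starts with a five-token syslog-style header (month, day, time, host, process), a new record's payload starts with YSLOGROW followed by a tab, and a continuation line repeats the same header. The names HEADER_RE, split_header and merge_split_lines are hypothetical.

```python
import re

# Hypothetical layout of each raw line, inferred from the sample:
# "<Mon> <day> <HH:MM:SS> <host> <process> <payload>".
# BSD-syslog-style timestamps pad single-digit days with a space, hence " +".
HEADER_RE = re.compile(r"^(\S+ +\S+ \S+ \S+ \S+) (.*)$")
RECORD_TAG = "YSLOGROW\t"  # assumed: a payload starting with this begins a record


def split_header(line):
    """Return (header, payload), or (line, None) if the line does not match."""
    line = line.rstrip("\r\n")
    m = HEADER_RE.match(line)
    return (m.group(1), m.group(2)) if m else (line, None)


def merge_split_lines(lines):
    """Yield one string per logical record, gluing continuation lines onto it."""
    current, current_header = None, None
    for raw in lines:
        header, payload = split_header(raw)
        is_continuation = (
            payload is not None
            and not payload.startswith(RECORD_TAG)
            and current is not None
            and header == current_header
        )
        if is_continuation:
            # No separator: the cut can fall in the middle of a field
            # (e.g. "...mm_auid%3Dmmi" + "d_bd6b33dc...").
            current += payload
        else:
            if current is not None:
                yield current
            current, current_header = raw.rstrip("\r\n"), header
    if current is not None:
        yield current
```

Reading from sys.stdin, the same generator could act as the mapper of a map-only Hadoop Streaming job. It only rejoins halves that reach the same mapper, though: if an input split boundary falls between the two lines, they come out as two separate records.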


On Tue, Feb 26, 2013 at 9:39 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> That's easy. In your example:
>
> Map output key: FIELD-N; map output value: just the original value.
> In the reducer: if the value contains LOGTAG<TAB>, it is the first
> log entry. If not, it is a split-off part of a log entry: just take a
> substring and concatenate it with the first log entry.
>
> Did I explain that clearly?
>
>
>
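The reply's reduce-side idea can be sketched locally as below. One deliberate deviation, flagged as an assumption: the key here is the syslog header (timestamp, host, process) rather than FIELD-N, because in the sample the header is the part both halves visibly share. Hadoop does not guarantee the order of values within a reduce group, so the mapper also carries each line's position (standing in for the byte offset that TextInputFormat supplies as the map input key) and the reducer sorts on it. map_line, reduce_group and run_job are hypothetical names; run_job is an in-memory stand-in for map, shuffle and reduce, not Hadoop API.

```python
import re
from itertools import groupby

# Hypothetical header pattern, inferred from the sample log lines.
HEADER_RE = re.compile(r"^(\S+ +\S+ \S+ \S+ \S+) (.*)$")
RECORD_TAG = "YSLOGROW\t"  # assumed start-of-record marker (LOGTAG<TAB>)


def map_line(offset, line):
    """Mapper: emit (header, (offset, payload)); both halves share the header."""
    m = HEADER_RE.match(line.rstrip("\r\n"))
    if m:  # lines without a recognisable header are dropped in this sketch
        yield m.group(1), (offset, m.group(2))


def reduce_group(header, values):
    """Reducer: rebuild the records for one header from (offset, payload) pairs."""
    records = []
    # Value order is not guaranteed, so restore file order using the offset.
    for _, payload in sorted(values):
        if payload.startswith(RECORD_TAG) or not records:
            records.append(payload)  # first log entry of a record
        else:
            records[-1] += payload  # split-off remainder: append as-is
    for payload in records:
        yield header + " " + payload


def run_job(lines):
    """Local stand-in for the framework: map, shuffle (sort + group), reduce."""
    pairs, offset = [], 0
    for line in lines:
        pairs.extend(map_line(offset, line))
        offset += len(line)  # character position, standing in for a byte offset
    pairs.sort(key=lambda kv: kv[0])
    out = []
    for header, group in groupby(pairs, key=lambda kv: kv[0]):
        out.extend(reduce_group(header, [v for _, v in group]))
    return out
```

With several input files, offsets are only unique within one file, so the file name would also have to go into the key or the sort value.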
> On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <matthieu@actionx.com> wrote:
>
>> Hi
>>
>> Below is the issue I need to solve. Thank you in advance for your help
>> and tips.
>>
>> I have log files in which log lines are sometimes split (this happens
>> when a log line exceeds a certain length):
>>
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>
>> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
>> job?
>>
>> In other words, using a MapReduce job, can I concatenate the following two
>> adjacent lines (provided that I can 'detect' them):
>>
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>
>> into
>>
>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>
>> Thank you!
>>
>
>
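On the question of whether a Hadoop job can do this map-side at all, one hedged option is to borrow, roughly, the rule Hadoop's LineRecordReader uses for a text line that crosses two input splits: the reader that owns the line's start reads past its split end to finish it, and the next reader skips the partial line. The sketch applies that rule to records instead of lines. It simulates splits as index ranges over lines already parsed into (header, payload) pairs and uses the schematic's LOGTAG<TAB> as the start marker; a real job would need a custom InputFormat/RecordReader working on byte ranges, which is not shown. is_start and records_for_split are hypothetical names.

```python
# Start-of-record marker taken from the schematic in the thread.
RECORD_TAG = "LOGTAG\t"


def is_start(entry):
    """True if this (header, payload) line begins a new logical record."""
    _header, payload = entry
    return payload.startswith(RECORD_TAG)


def records_for_split(entries, start, end):
    """Yield the merged records owned by the split covering entries[start:end].

    A split owns every record whose first line lies inside it, in the spirit
    of how Hadoop's LineRecordReader handles lines crossing a split boundary.
    """
    i = start
    if start > 0:
        # Continuation lines at the head of the split belong to a record
        # started in the previous split, which reads past its end for them.
        while i < end and not is_start(entries[i]):
            i += 1
    while i < end:
        header, payload = entries[i]
        i += 1
        # Keep reading, past `end` if necessary, until the next record starts.
        while i < len(entries) and not is_start(entries[i]):
            payload += entries[i][1]  # no separator: the cut can fall inside FIELD-N
            i += 1
        yield header + " " + payload
```

Wherever the boundary falls, each record is produced exactly once, by the split that contains its LOGTAG line.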


-- 
Matthieu Labour, Engineering | ActionX
584 Broadway, Suite 1002 – NY, NY 10012
415-994-3480 (m)
