hadoop-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Concatenate adjacent lines with hadoop
Date Wed, 27 Feb 2013 05:16:32 GMT
I just noticed that both of your lines start with: Dec 16 21:47:20
d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app

Is that prefix different for other log entries? If yes, then just use
this prefix as the map output key.
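
A minimal sketch of that idea in plain Python, simulating the map and reduce
steps in memory rather than using the Hadoop API. Everything named here
(PREFIX_RE, map_line, reduce_group, run) is hypothetical. It assumes the prefix
is unique to one logical entry, that fragments are joined with no separator
(the sample line is cut mid-token), and that YSLOGROW marks the first part.
Reducer input order is not guaranteed, so an entry split into more than two
pieces would need an extra ordering key.

```python
import re
from collections import defaultdict

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <app[proc]> <payload>"
PREFIX_RE = re.compile(r"^(\S+\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+\S+)\s+(.*)$")
LOGTAG = "YSLOGROW"  # marker that starts a complete entry in the sample log


def map_line(line):
    """Map step: return (prefix, payload), or None if the line does not match."""
    m = PREFIX_RE.match(line.rstrip("\n"))
    return (m.group(1), m.group(2)) if m else None


def reduce_group(prefix, payloads):
    """Reduce step: put the payload carrying LOGTAG first, then append fragments."""
    heads = [p for p in payloads if p.startswith(LOGTAG)]
    tails = [p for p in payloads if not p.startswith(LOGTAG)]
    if len(heads) != 1:
        # Ambiguous group (prefix not unique, or no head): emit lines unchanged.
        return [f"{prefix} {p}" for p in payloads]
    return [f"{prefix} {heads[0]}{''.join(tails)}"]


def run(lines):
    """Group payloads by prefix (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        kv = map_line(line)
        if kv:
            groups[kv[0]].append(kv[1])
    out = []
    for prefix in sorted(groups):
        out.extend(reduce_group(prefix, groups[prefix]))
    return out
```

If several entries share the same second, host and process, the prefix alone
is not a safe key; the sketch then leaves those lines untouched instead of
guessing which fragment belongs to which entry.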


On Wed, Feb 27, 2013 at 1:01 PM, Matthieu Labour <matthieu@actionx.com> wrote:

> Thank you for your answer. I am not sure I fully understand. My email was
> most likely not very clear. Here is an example of a log line. Please note
> YSLOGROW at the beginning of the log line. Please note that the second line
> should be concatenated with the first line.
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3] YSLOGROW
> 20121216T214720.345Z
> remote-addr=166.137.156.155&user-agent=Mozilla%2F5.0+%28Linux%3B+U%3B+Android+4.0.4%3B+en-us%3B+SAMSUNG-SGH-I717+Build%2FIMM76D%29+AppleWebKit%2F534.30+%28KHTML%2C+like+Gecko%29+Version%2F4.0+Mobile+Safari%2F534.30&referrer=http%3A%2F%2Flp.mydas.mobi%2F%2Frich%2Ffoundation%2FdynamicInterstitial%2Fint_launch.php%3Fmm_urid%3DWBNMMG9h4XmbJBUHbDrNWWWm%26mm_ipaddress%3D166.137.156.155%26mm_handset%3D8440%26mm_carrier%3D2%26mm_apid%3D78683%26mm_acid%3D1050500%26mm_osid%3D14%26mm_uip%3D166.137.156.155%26mm_ua%3DMozilla%252F5.0%2B%2528Linux%253B%2BU%253B%2BAndroid%2B4.0.4%253B%2Ben-us%253B%2BSAMSUNG-SGH-I717%2BBuild%252FIMM76D%2529%2BAppleWebKit%252F534.30%2B%2528KHTML%252C%2Blike%2BGecko%2529%2BVersion%252F4.0%2BMobile%2BSafari%252F534.30SAMSUNG-SGH-I717%26mtpid%3DUNKNOWN%26mm_msuid%3DUNKNOWN%26mm_mmisdk%3D4.6.0-12.07.16.a%26mm_mxsdk%3DUNKNOWN%26mm_dv%3DAndroid4.0.4%26mm_adtype%3DMMFullScreenAdTransition%26mm_hswd%3DUNKNOWN%26mm_dm%3DSAMSUNG-SGH-I717%26mm_hsht%3DUNKNOWN%26mm_auid%3Dmmi
>
> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
> d_bd6b33dc569994102eaa60a060987d99e9_013b35a758bd%26mm_accelerometer%3Dtrue%26mm_lat%3DUNKNOWN%26mm_long%3DUNKNOWN%26mm_hpx%3D1280%26mm_wpx%3D800%26mm_density%3D2.0%26mm_dpi%3DUNKNOWN%26mm_campaignid%3D45695%26autoExpand%3Dtrue&query-string=ncid%3DWBNMMG9h4XmbJBUHbDrNWWWm
> tr7y MLNL 1009 10034 3401 t4fx 10034 click
>
>
> On Tue, Feb 26, 2013 at 9:39 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>
>> That's easy. In your example:
>>
>> Map output key: FIELD-N; map output value: just the original value.
>> In the reducer: if the value contains LOGTAG<TAB>, then it is the
>> first part of the log entry. If not, it is the continuation of a split
>> log entry; just take the substring and concatenate it with the first part.
>>
>> Did I explain that clearly?
>>
>>
>>
>> On Wed, Feb 27, 2013 at 9:36 AM, Matthieu Labour <matthieu@actionx.com> wrote:
>>
>>> Hi
>>>
>>> Please find below the issue I need to solve. Thank you in advance for
>>> your help/ tips.
>>>
>>> I have log files where log lines are sometimes split (this happens
>>> when the log line exceeds a specific length):
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-MAX
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> Can I "reconcile"/"concatenate" split log lines with a Hadoop MapReduce
>>> job?
>>>
>>> In other words, using a MapReduce job, can I concatenate the 2
>>> following adjacent lines (provided that I 'detect' them)
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N      <======= log line is being split
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> into
>>>
>>> Dec 16 21:47:20 d.14b48e47-abf2-403e-8a1a-04e821a42cb6 app[web.3]
>>> LOGTAG<TAB>FIELD-0<TAB>....<TAB>FIELD-N<TAB>FIELD-N+1 .....FIELD-MAX
>>>
>>> Thank you!
>>>
>>
>>
>
>
> --
> Matthieu Labour, Engineering | *Action**X* |
> 584 Broadway, Suite 1002 – NY, NY 10012
> 415-994-3480 (m)
>
