hadoop-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: InputFormat for dealing with log files.
Date Sun, 05 Oct 2014 20:32:27 GMT
Regex processing is not that slow when you follow best practices.

This project offers better performance than Java's built-in regex engine:
https://github.com/jruby/joni
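
As a rough illustration of the "best practices" point, here is a minimal sketch that uses only the standard java.util.regex package (not joni). It compiles the pattern once, tests only the start of each line, and glues continuation lines such as stack-trace frames onto the preceding record. The class name LogRecordGrouper, the yyyy-MM-dd date prefix, and the sample log lines are assumptions made for illustration; they do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: groups multi-line log entries into single records.
final class LogRecordGrouper {
    // Compiled once and reused for every line; recompiling per line is a
    // common cause of slow regex code.
    // Assumption: each new record begins with a date such as 2014-10-05.
    private static final Pattern RECORD_START =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    static boolean startsRecord(String line) {
        // lookingAt() only tries to match at the beginning of the line.
        return RECORD_START.matcher(line).lookingAt();
    }

    static List<String> group(Iterable<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            // A line that starts a new record flushes the one being built.
            if (startsRecord(line) && current.length() > 0) {
                records.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n');
            }
            current.append(line);
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2014-10-05 20:32:27 ERROR Servlet failed",
                "java.lang.NullPointerException",
                "\tat com.example.Foo.bar(Foo.java:42)",
                "2014-10-05 20:32:28 INFO Request served");
        for (String record : group(lines)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

In an actual MapReduce job this logic would most likely live in a custom RecordReader behind an InputFormat, which also has to cope with records that straddle split boundaries; the sketch does not attempt that.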

Cheers

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz <gortiz@pragsis.com> wrote:

> I thought of something like that, but I guess it should be a little more
> complex because it has to look for a pattern, maybe a date format. One idea:
> if you know that the first 10 digits are the date, you could extract them
> and try to match them against a date format, or something more generic like
> a regex, although that seems too expensive in processing time, and the
> operations in the InputFormat should be pretty fast.
>
> Any better ideas?
>
> ------------------------------
> *From: *"Ted Yu" <yuzhihong@gmail.com>
> *To: *"common-user@hadoop.apache.org" <user@hadoop.apache.org>
> *Sent: *Sunday, October 5, 2014 16:27:18
> *Subject: *Re: InputFormat for dealing with log files.
>
> Have you read http://blog.rguha.net/?p=293?
>
> Cheers
>
> On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz <gortiz@pragsis.com>
> wrote:
>
>>
>> I'd like to know if there's an InputFormat that can deal with log
>> files. The problem I have is that when I read a Tomcat log, for
>> example, exceptions are sometimes written across several lines, but they
>> should be processed as a single line; I mean all the lines should go
>> together to the map.
>> Is there something like that already implemented? I've been looking, but
>> I haven't found anything, and I don't want to reinvent the wheel.
>>
>> CONFIDENTIALITY WARNING.
>> This message and the information contained in or attached to it are
>> private and confidential and intended exclusively for the addressee.
>> Pragsis informs to whom it may receive it in error that it contains
>> privileged information and its use, copy, reproduction or distribution is
>> prohibited. If you are not an intended recipient of this E-mail, please
>> notify the sender, delete it and do not read, act upon, print, disclose,
>> copy, retain or redistribute any portion of this E-mail.
>>
