hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-593) RegExLoader stops an non-matching line
Date Fri, 18 Sep 2009 18:02:16 GMT

     [ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-593:
---------------------------

    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

Looks like this issue has already been addressed with a separate patch.

> RegExLoader stops an non-matching line
> --------------------------------------
>
>                 Key: PIG-593
>                 URL: https://issues.apache.org/jira/browse/PIG-593
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.1.0
>            Reporter: Vadim Zaliva
>            Priority: Minor
>         Attachments: PIG-593.diff
>
>
> Class RegExLoader and all its subclasses stop if any line does not match the provided regular expression.
> In particular, I have noticed this when CombinedLogLoader stopped at the following line:
> 58.210.62.24 - - [29/Dec/2008:23:06:57 -0800] "GET /tor/browse/?id=24746&rel=FLY999%40Jack's+Teen+America+22%2FFLY999原創%40單掛D.C.資訊交流網+Jack's+Teen+America+22+cd1.avi HTTP/1.1" 8952 200 "http://img252.imageshack.us/tor/browse/?id=24746&rel=FLY999%40Jack%27s+Teen+America+22" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; )" "-"
> It looks like some Japanese characters here do not match the \S expression used.
> In general, I would expect it to skip such lines rather than stop processing the data file.
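
The skip-instead-of-stop behavior requested above can be illustrated with a minimal Java sketch. This is not the RegExLoader source and not the attached PIG-593.diff; the class name `SkipNonMatchingLines`, the helper names `nextMatch` and `readAll`, and the sample pattern are all hypothetical, chosen only to show a reader loop that continues past non-matching lines and returns null solely at end of input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Pig RegExLoader code.
public class SkipNonMatchingLines {

    /**
     * Returns the captured groups of the next line that matches the pattern,
     * or null once the input is exhausted. Non-matching lines are skipped
     * instead of being treated as end of input.
     */
    static String[] nextMatch(BufferedReader reader, Pattern pattern) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                continue; // skip this line and keep reading
            }
            String[] fields = new String[m.groupCount()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = m.group(i + 1);
            }
            return fields;
        }
        return null; // real end of input
    }

    /** Collects the captured groups of every matching line in the input. */
    static List<String[]> readAll(String input, Pattern pattern) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String[] fields;
            while ((fields = nextMatch(reader, pattern)) != null) {
                rows.add(fields);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Sample pattern (an assumption): one non-whitespace token, a space, then digits.
        Pattern p = Pattern.compile("^(\\S+) (\\d+)$");
        String input = "alpha 1\nthis line does not match\nbeta 2\n";
        for (String[] row : readAll(input, p)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

With this structure, a malformed record in the middle of a file costs only that record; the lines after it are still read.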


