hadoop-pig-dev mailing list archives

From "Christian Hargraves (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression
Date Thu, 01 Jul 2010 19:32:50 GMT

    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884389#action_12884389 ]

Christian Hargraves commented on PIG-1449:
------------------------------------------

I ran into this issue last night and fixed it before finding this bug report. My fix is similar
to the previous one, but it includes a unit test, which will hopefully help get this committed
more quickly. I noticed that the unit tests take over 4 minutes to run. Would there be any value
in trying to reduce their execution time? If there's interest, I might be able to help.

> RegExLoader hangs on lines that don't match the regular expression
> ------------------------------------------------------------------
>
>                 Key: PIG-1449
>                 URL: https://issues.apache.org/jira/browse/PIG-1449
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Justin Sanders
>            Priority: Minor
>         Attachments: RegExLoader.patch
>
>
> In the 0.7.0 changes to RegExLoader, a bug was introduced where the code stays in the while
> loop if a line isn't matched. Before 0.7.0, lines that didn't match the regular expression were
> skipped. As a result, the mapper stops responding and times out with "Task attempt_X failed to
> report status for 600 seconds. Killing!".
> Here are the steps to recreate the bug:
> Create a text file in HDFS with the following lines:
> test1
> testA
> test2
> Run the following pig script:
> REGISTER /usr/local/pig/contrib/piggybank/java/piggybank.jar;
> test = LOAD '/path/to/test.txt' using org.apache.pig.piggybank.storage.MyRegExLoader('(test\\d)') AS (line);
> dump test;
> Expected result:
> (test1)
> (test2)
> Actual result:
> Job fails to complete after the 600-second timeout while waiting for the mapper. The mapper
> hangs at 33% because it can process the first line but gets stuck in the while loop on the
> second line.
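The intended behavior described in the report (skip lines that don't match instead of looping on them) can be illustrated with a minimal Java sketch. It is an assumption-laden model, not the actual piggybank RegExLoader source and not the attached RegExLoader.patch: an Iterator<String> stands in for the Hadoop record reader, a List<String> stands in for a Pig tuple, and the class RegexLineLoaderSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of a regex-based line loader (NOT the actual
// piggybank RegExLoader source). An Iterator<String> stands in for the Hadoop
// record reader and a List<String> stands in for a Pig tuple.
final class RegexLineLoaderSketch {
    private final Pattern pattern;
    private final Iterator<String> lines;

    RegexLineLoaderSketch(String regex, Iterator<String> lines) {
        this.pattern = Pattern.compile(regex);
        this.lines = lines;
    }

    // Returns the capture groups of the next matching line, or null at end of input.
    // Each iteration consumes a new line, so a non-matching line is skipped
    // rather than re-examined forever.
    List<String> getNext() {
        while (lines.hasNext()) {
            String line = lines.next(); // always advance the reader
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                List<String> fields = new ArrayList<>();
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    fields.add(matcher.group(i));
                }
                return fields;
            }
            // No match: loop around and read the next line.
        }
        return null; // input exhausted
    }

    // Drains the loader and collects every emitted record.
    static List<List<String>> loadAll(String regex, List<String> input) {
        RegexLineLoaderSketch loader = new RegexLineLoaderSketch(regex, input.iterator());
        List<List<String>> records = new ArrayList<>();
        List<String> record;
        while ((record = loader.getNext()) != null) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Same input lines and pattern as the reproduction steps in the report.
        List<String> input = Arrays.asList("test1", "testA", "test2");
        System.out.println(loadAll("(test\\d)", input)); // prints [[test1], [test2]]
    }
}
```

Fed the three lines from the report, the sketch emits test1 and test2 and then returns null. The design point is that the reader advances on every loop iteration, so the non-matching testA line costs one iteration instead of stalling the task until the 600-second timeout.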

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

