pig-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure
Date Sat, 01 Sep 2012 19:59:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446793#comment-13446793 ]

Eli Reisman commented on PIG-1891:

When I run 'ant test-commit' on my local machine with PIG-1891-3.patch applied to trunk,
I now get this error (and only this error):

Testcase: testNumSamples took 22.016 sec
expected:<47> but was:<42>
junit.framework.AssertionFailedError: expected:<47> but was:<42>
	at org.apache.pig.test.TestPoissonSampleLoader.testNumSamples(TestPoissonSampleLoader.java:125)

I did not alter the number of allowed instantiations in the TestLoadStoreFuncLifeCycle test
for loads, only for stores, so perhaps this set off a ripple effect of other problems. It's odd
that the failure is in a loader. I am unsure whether this is directly related to this patch or
is an existing problem you guys know about, so I thought I'd post here before hunting it down.
Thanks again!

> Enable StoreFunc to make intelligent decision based on job success or failure
> -----------------------------------------------------------------------------
>                 Key: PIG-1891
>                 URL: https://issues.apache.org/jira/browse/PIG-1891
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Alex Rovner
>            Priority: Minor
>              Labels: patch
>         Attachments: PIG-1891-1.patch, PIG-1891-2.patch, PIG-1891-3.patch
> We are in the process of using Pig for various data processing and component integration.
> Here is where we feel Pig storage funcs are lacking:
> They are not aware of whether the overall job has succeeded. This creates a problem for storage
> funcs that need to "upload" results into another system:
> DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
> and what I see is essentially a mechanism that does the following for each task:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
> While this approach works great at the task level, it does not work at all at the job level.
> If certain tasks succeed but the overall job fails, partial records are going to get
> uploaded into the DB.
> Any ideas on a workaround?
> Our current workaround is fairly ugly: we created a Java wrapper that launches Pig jobs
> and then uploads to the DBs once the Pig job is successful. While the approach works, it's not
> really integrated into Pig.
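
The gap between task-level and job-level commit described in the quoted issue can be sketched in plain Java. This is only an illustration under stated assumptions: StagingSink, commitTask, commitJob, abortJob and runJob are hypothetical names invented for the sketch and are not part of Pig, Hadoop, or DBStorage. Each successful task only stages its batch, and the staged batches become visible only if the whole job succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: every name here is hypothetical, not a Pig or Hadoop API.
class JobLevelCommitSketch {

    /** Stand-in for an external system such as a DB or an FTP server. */
    static class StagingSink {
        private final Map<String, List<String>> staged = new LinkedHashMap<>();
        private final List<String> published = new ArrayList<>();

        /** Task-level commit: the batch is held in staging and is not yet visible. */
        void commitTask(String taskId, List<String> records) {
            staged.put(taskId, new ArrayList<>(records));
        }

        /** Job-level commit: publish every staged batch in one step. */
        void commitJob() {
            for (List<String> batch : staged.values()) {
                published.addAll(batch);
            }
            staged.clear();
        }

        /** Job-level abort: drop staged batches so nothing partial is published. */
        void abortJob() {
            staged.clear();
        }

        /** Returns a copy of the records that are visible in the target system. */
        List<String> published() {
            return new ArrayList<>(published);
        }
    }

    /** Simulates a job; the task whose id equals failingTaskId (may be null) fails. */
    static boolean runJob(StagingSink sink, Map<String, List<String>> taskOutputs,
                          String failingTaskId) {
        boolean jobSucceeded = true;
        for (Map.Entry<String, List<String>> task : taskOutputs.entrySet()) {
            if (task.getKey().equals(failingTaskId)) {
                jobSucceeded = false; // this task fails; other tasks may still succeed
            } else {
                sink.commitTask(task.getKey(), task.getValue());
            }
        }
        if (jobSucceeded) {
            sink.commitJob();
        } else {
            sink.abortJob();
        }
        return jobSucceeded;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("task-0", Arrays.asList("a", "b"));
        outputs.put("task-1", Arrays.asList("c"));

        StagingSink afterFailure = new StagingSink();
        runJob(afterFailure, outputs, "task-1");
        System.out.println("published after failed job: " + afterFailure.published());

        StagingSink afterSuccess = new StagingSink();
        runJob(afterSuccess, outputs, null);
        System.out.println("published after successful job: " + afterSuccess.published());
    }
}
```

In the failed run, task-0 succeeds on its own but its records never reach the published list, which is the behavior a per-task commit or rollback cannot provide by itself.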

