hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
Date Thu, 15 Apr 2010 09:56:52 GMT

    [ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857253#action_12857253 ]

Ankur commented on PIG-1229:
----------------------------

I read the complete thread, and here are my thoughts:

- Speculative execution issue: With the recent move to Hadoop's I/O formats in Load/Store,
DBStorage has been modified to commit the data to the DB in the OutputCommitter's
commitTask() method. Hadoop itself guarantees that this method is called only for the first
successful attempt of a task, so it shouldn't matter whether speculative execution is on or off.
BUT this does NOT solve the problem where certain tasks finish successfully but the JOB
itself fails, in which case the data from the successful attempts should be rolled back.

- Writing to a temporary table: Even this does not handle the above case, since some
of the tasks would already have moved their data to the actual table.

- Bulk loading: This is the most suitable option in my opinion if the data is large. However,
for small to medium data sizes (like aggregate summaries), I found the DBStorage UDF to be most
helpful. It eliminates one more layer of processing from the application. In fact, this is
precisely the reason it was written.
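
To illustrate the first point, here is a minimal sketch of the buffer-then-commit idea. It is not the code from the patch: FakeTable and TaskAttemptWriter are hypothetical names, and an in-memory list stands in for the real JDBC table. Only the commitTask()/abortTask() roles are borrowed from Hadoop's OutputCommitter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DB table; a real store function would go through JDBC.
class FakeTable {
    final List<String> rows = new ArrayList<>();
}

// Hypothetical per-attempt writer: rows are buffered and become visible only on commit.
class TaskAttemptWriter {
    private final FakeTable table;
    private final List<String> pending = new ArrayList<>();

    TaskAttemptWriter(FakeTable table) {
        this.table = table;
    }

    // Buffer a row; nothing reaches the table yet.
    void write(String row) {
        pending.add(row);
    }

    // Plays the role of OutputCommitter.commitTask(): publish the buffered rows.
    void commitTask() {
        table.rows.addAll(pending);
        pending.clear();
    }

    // Plays the role of OutputCommitter.abortTask(): drop the rows of a losing attempt.
    void abortTask() {
        pending.clear();
    }
}
```

If two speculative attempts of the same task each buffer the same row and only one of them is committed while the other is aborted, the table ends up with a single copy. Nothing in this sketch undoes a commit once the job as a whole fails, which is the gap described above.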

So in a nutshell, using a single mapper/reducer with this patch should be good regardless
of whether speculative execution is on or off. In the case of multiple mappers/reducers
writing to the DB, it should be the application's responsibility to clean up the data,
and ONLY IN CASE of job failure.
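
One way an application might do that cleanup, sketched under assumptions that are not part of the patch: every row carries the id of the job that wrote it, and on failure the application deletes the rows with that id. TaggedRow and JobCleanup are hypothetical names, and a list again stands in for the table. Against a real database the equivalent would presumably be a DELETE filtered on a job-id column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row layout: the payload plus the id of the job that wrote it.
class TaggedRow {
    final String jobId;
    final String value;

    TaggedRow(String jobId, String value) {
        this.jobId = jobId;
        this.value = value;
    }
}

class JobCleanup {
    // Removes every row written by the failed job and returns how many were removed.
    static int cleanupFailedJob(List<TaggedRow> table, String failedJobId) {
        int before = table.size();
        table.removeIf(row -> row.jobId.equals(failedJobId));
        return before - table.size();
    }
}
```

The application would call cleanupFailedJob only after it learns that the job failed; rows from other jobs are left untouched.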

> allow pig to write output into a JDBC db
> ----------------------------------------
>
>                 Key: PIG-1229
>                 URL: https://issues.apache.org/jira/browse/PIG-1229
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Ian Holsman
>            Assignee: Ankur
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
