sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1603) Sqoop2: Explicit support for Merge in the Sqoop Job lifecyle
Date Wed, 22 Oct 2014 18:04:34 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180255#comment-14180255 ]

Jarek Jarcec Cecho commented on SQOOP-1603:
-------------------------------------------

I guess you can blame me for this one for choosing a confusing name :) Even though we're
calling it a "destroyer", it's modeled after MapReduce's OutputCommitter, where you are able
to do any "finish" work for the transfer. "Committing" data, i.e. moving it from temporary
directories to the final one, is an expected operation there.
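For illustration, here is a minimal sketch of that OutputCommitter-style "finish" step, assuming each loader writes its own file into a temporary directory. The class `CommitOnFinish`, its `finish(tempDir, finalDir, success)` method and the `_temporary`/`final` directory names are hypothetical and are not part of Sqoop's Destroyer API; only standard `java.nio.file` calls are used. On success it moves ("commits") every temporary file into the final directory; on failure it deletes ("aborts") them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch (not Sqoop's Destroyer API): OutputCommitter-style
// "finish" work that commits or aborts the output of a transfer.
final class CommitOnFinish {

  // On success, move each temporary file into finalDir ("commit");
  // on failure, delete the temporary files ("abort").
  static void finish(Path tempDir, Path finalDir, boolean success) throws IOException {
    List<Path> parts;
    try (Stream<Path> listing = Files.list(tempDir)) {
      parts = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    if (success) {
      Files.createDirectories(finalDir);
      for (Path part : parts) {
        Files.move(part, finalDir.resolve(part.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      }
    } else {
      for (Path part : parts) {
        Files.delete(part);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("commit-sketch");
    Path tempDir = Files.createDirectories(base.resolve("_temporary"));
    Path finalDir = base.resolve("final");
    // Simulate two loaders, each writing its own temporary part file.
    Files.write(tempDir.resolve("part-0"), "a\n".getBytes(StandardCharsets.UTF_8));
    Files.write(tempDir.resolve("part-1"), "b\n".getBytes(StandardCharsets.UTF_8));

    finish(tempDir, finalDir, true);

    // Print the committed file names: part-0, then part-1.
    try (Stream<Path> committed = Files.list(finalDir)) {
      committed.map(p -> p.getFileName().toString()).sorted().forEach(System.out::println);
    }
  }
}
```

The commit/abort split mirrors the success/failure branches of the Kite destroyer quoted below; a real connector would operate on its own storage (HDFS paths, Kite datasets) rather than local files.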

>  Sqoop2:  Explicit support for Merge in the Sqoop Job lifecyle
> --------------------------------------------------------------
>
>                 Key: SQOOP-1603
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1603
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Veena Basavaraj
>            Assignee: Qian Xu
>
> This ticket was created while reviewing the Kite Connector use case, where the destroyer
> does the actual merge of the temporary datasets:
> https://reviews.apache.org/r/26963/diff/# [~stanleyxu2005]
> {code}
> public void destroy(DestroyerContext context, LinkConfiguration link,
>       ToJobConfiguration job) {
>     LOG.info("Running Kite connector destroyer");
>     // Every loader instance creates a temporary dataset. If the MR job is
>     // successful, all temporary datasets should be merged into one dataset;
>     // otherwise they should all be deleted.
>     String[] uris = KiteDatasetExecutor.listTemporaryDatasetUris(
>         job.toDataset.uri);
>     if (context.isSuccess()) {
>       KiteDatasetExecutor executor = new KiteDatasetExecutor(job.toDataset.uri,
>           context.getSchema(), link.link.fileFormat);
>       for (String uri : uris) {
>         executor.mergeDataset(uri);
>         LOG.info(String.format("Temporary dataset %s merged", uri));
>       }
>     } else {
>       for (String uri : uris) {
>         KiteDatasetExecutor.deleteDataset(uri);
>         LOG.info(String.format("Temporary dataset %s deleted", uri));
>       }
>     }
>   }
> {code}
> Wondering if such things should be their own phase rather than living in destroyers. The
> responsibility of the destroyer is more to clean up, close, and do anything that is really
> "destroying"; should operations that modify records be their own step?
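To make the question concrete, here is one possible shape for a lifecycle with an explicit merge phase, sketched under the assumption that merging runs only on success and that the destroyer always runs afterwards for cleanup. Everything here (`ExplicitMergeLifecycle`, the `Merger` and `Destroyer` interfaces, `finishJob`) is hypothetical and illustrative; it is not Sqoop2's actual API, and in-memory lists stand in for temporary and target datasets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle with an explicit merge step, separate from destroy.
// None of these types exist in Sqoop2; names are illustrative only.
final class ExplicitMergeLifecycle {

  interface Merger {
    // Runs only when the job succeeded; combines temporary outputs into the target.
    void merge(List<String> temporaryOutputs, List<String> target);
  }

  interface Destroyer {
    // Always runs last; cleans up whatever temporary state is left.
    void destroy(List<String> temporaryOutputs, boolean success);
  }

  // Ordering: merge (success only), then destroy (always).
  static void finishJob(boolean success, List<String> temps, List<String> target,
      Merger merger, Destroyer destroyer) {
    if (success) {
      merger.merge(temps, target);
    }
    destroyer.destroy(temps, success);
  }

  public static void main(String[] args) {
    List<String> temps = new ArrayList<>(List.of("tmp-dataset-0", "tmp-dataset-1"));
    List<String> target = new ArrayList<>();
    finishJob(true, temps, target,
        (t, out) -> out.addAll(t),  // merge: append temporary datasets to target
        (t, ok) -> t.clear());      // destroy: drop temporary datasets
    System.out.println(target + " " + temps);  // [tmp-dataset-0, tmp-dataset-1] []
  }
}
```

Splitting the phases this way keeps the destroyer limited to cleanup, while the record-modifying merge gets its own, success-only hook; whether that separation is worth an extra lifecycle step is exactly what the ticket asks.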



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
