Date: Wed, 22 Oct 2014 18:04:34 +0000 (UTC)
From: "Jarek Jarcec Cecho (JIRA)"
To: dev@sqoop.apache.org
Subject: [jira] [Commented] (SQOOP-1603) Sqoop2: Explicit support for Merge in the Sqoop Job lifecyle

[ https://issues.apache.org/jira/browse/SQOOP-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180255#comment-14180255 ]

Jarek Jarcec Cecho commented on SQOOP-1603:
-------------------------------------------

I guess you can blame me for this one, since I chose a confusing name :) Even though we call it a "destroyer", it is modeled after MapReduce's OutputCommitter, where you can do any "finish" work for the transfer. "Committing" data and moving it from temporary directories to the final one are expected operations.
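
To make the "finish work" idea concrete, here is a minimal sketch of the commit-or-abort pattern described above. It uses only plain java.nio.file calls and is not the Sqoop, Kite, or Hadoop API: the class name CommitOnDestroy, the method finish, and the "<tempDirName>-<fileName>" naming scheme are all hypothetical, and the sketch assumes each temporary directory holds only regular files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an OutputCommitter-style "finish" step. */
final class CommitOnDestroy {

    /**
     * On success, moves every file from the temporary directories into
     * finalDir; on failure, deletes the temporary files instead. Either way
     * the temporary directories are removed afterwards.
     *
     * Assumption: temporary directories contain only regular files.
     *
     * @return the number of files moved into finalDir
     */
    static int finish(boolean success, List<Path> tempDirs, Path finalDir)
            throws IOException {
        int committed = 0;
        if (success) {
            Files.createDirectories(finalDir);
        }
        for (Path tempDir : tempDirs) {
            // Collect entries first so the directory stream is closed
            // before we start moving or deleting files.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(tempDir)) {
                for (Path entry : stream) {
                    files.add(entry);
                }
            }
            for (Path file : files) {
                if (success) {
                    // Prefix with the temp directory name so files from
                    // different writers cannot collide in finalDir.
                    String name = tempDir.getFileName() + "-" + file.getFileName();
                    Files.move(file, finalDir.resolve(name));
                    committed++;
                } else {
                    Files.delete(file);
                }
            }
            // The temporary directory is empty now in both branches.
            Files.delete(tempDir);
        }
        return committed;
    }
}
```

The point of the sketch is only that the success branch (commit) and the failure branch (discard) live in the same end-of-job hook, which is what the destroyer does today despite its name.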

> Sqoop2: Explicit support for Merge in the Sqoop Job lifecyle
> --------------------------------------------------------------
>
>                 Key: SQOOP-1603
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1603
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Veena Basavaraj
>            Assignee: Qian Xu
>
> This ticket was created while reviewing the Kite Connector use case, where the destroyer does the actual temporary dataset merge:
> https://reviews.apache.org/r/26963/diff/# [~stanleyxu2005]
> {code}
> public void destroy(DestroyerContext context, LinkConfiguration link,
>     ToJobConfiguration job) {
>   LOG.info("Running Kite connector destroyer");
>   // Every loader instance creates a temporary dataset. If the MR job is
>   // successful, all temporary datasets should be merged into one dataset;
>   // otherwise they should all be deleted.
>   String[] uris = KiteDatasetExecutor.listTemporaryDatasetUris(
>       job.toDataset.uri);
>   if (context.isSuccess()) {
>     KiteDatasetExecutor executor = new KiteDatasetExecutor(job.toDataset.uri,
>         context.getSchema(), link.link.fileFormat);
>     for (String uri : uris) {
>       executor.mergeDataset(uri);
>       LOG.info(String.format("Temporary dataset %s merged", uri));
>     }
>   } else {
>     for (String uri : uris) {
>       KiteDatasetExecutor.deleteDataset(uri);
>       LOG.info(String.format("Temporary dataset %s deleted", uri));
>     }
>   }
> }
> {code}
> Wondering whether such things should be their own phase rather than living in destroyers. The destroyer's responsibility is more about cleanup, closing, and anything else that amounts to destroying; should operations that modify records be their own step?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)