sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1601) Sqoop2: To part of the Connector API to support balancing/ re-partioning step
Date Wed, 22 Oct 2014 17:10:34 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180183#comment-14180183 ]

Jarek Jarcec Cecho commented on SQOOP-1601:
-------------------------------------------

I've just pressed the "Vote for this issue" button as well :)

> Sqoop2: To part of the Connector API to support balancing/ re-partioning step
> -----------------------------------------------------------------------------
>
>                 Key: SQOOP-1601
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1601
>             Project: Sqoop
>          Issue Type: Task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>
> Today the Sqoop job lifecycle looks like this. To recap:
> Step 1: Initializers for both data sources (FROM and TO)
> Step 2: Partitioner (for the data from the FROM data source)
> Step 3: Extractor (the actual reading from the FROM data source)
> Step 4: Loader (for the TO data source, i.e. writing the data)
> Step 5: Destroyer for both data sources
> Both Extractors and Loaders are parallelized in themselves, so we can specify the numExtractors and numLoaders to use via the driver config.
> But when there is an imbalance between the extractors and loaders, we may need an intermediate step to rebalance, repartition, or shuffle the data as it is written by the Loaders. Today we do not support this step; it might be good to provide another step that some connectors can add for better control over the load step.
> Whether this can be a generic step that operates on or transforms the output as it is written to the TO data source is something we should discuss as well.
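
To make the intermediate step in the quoted description concrete, here is a minimal sketch in Python. It is not Sqoop's Connector API: `rebalance`, `extractor_outputs`, and `num_loaders` are hypothetical names chosen for illustration, and round-robin is only one possible redistribution strategy (hashing on a key would be another).

```python
from itertools import chain


def rebalance(extractor_outputs, num_loaders):
    """Spread records from N extractor outputs over num_loaders buckets.

    Hypothetical helper, not part of Sqoop: records are dealt out
    round-robin so each loader receives a near-equal share.
    """
    if num_loaders < 1:
        raise ValueError("num_loaders must be at least 1")
    buckets = [[] for _ in range(num_loaders)]
    # Flatten all extractor outputs, then deal records out one at a time.
    for i, record in enumerate(chain.from_iterable(extractor_outputs)):
        buckets[i % num_loaders].append(record)
    return buckets


# Three extractors with skewed output sizes (8, 1, and 1 records).
extractor_outputs = [list(range(0, 8)), [8], [9]]

# Two loaders: without a rebalancing step, one loader could end up
# with most of the records.
loader_inputs = rebalance(extractor_outputs, num_loaders=2)
```

In this sketch each of the two loaders ends up with five records regardless of how unevenly the extractors produced them. A real implementation would need to stream records rather than hold them in memory, which is part of what the issue leaves open for discussion.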



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
