sqoop-dev mailing list archives

From "Kathleen Ting (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-474) Split-by specification incorrectly triggers bounding value query
Date Thu, 05 Apr 2012 22:32:25 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathleen Ting updated SQOOP-474:
--------------------------------

    Description: 
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by
specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide \
    --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' \
    --split-by AID --target-dir /user/kateting/test1 --m=1
{code}

This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID)
FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
{code}

The bounding value query embeds the user's query as a subquery. This embedded query fails in
DB2 when the user's query uses the 'with ur' syntax, and it also fails on Informix versions
that do not support embedded queries. Without 'with ur', the boundary query is harmless. The
boundary query is triggered by the --split-by specification, even though specifying split-by
is redundant when the number of mappers is 1.

  was:
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by
specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide \
    --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' \
    --split-by AID --target-dir /user/kateting/test1 --m=1
{code}

This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID)
FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
{code}

The problem is that the bounding value query construction is being triggered because of the
--split-by specification. However specifying split-by is redundant given that the number of
mappers is 1.
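
The redundancy described above can be illustrated with a minimal sketch. This is not Sqoop's
actual code and not the attached patch: the class BoundingQuerySketch and the methods
needsBoundingQuery and buildBoundingQuery are hypothetical names chosen for illustration. The
sketch assumes the bounding value query is only useful when more than one mapper has to share
the split column's range, and it only mirrors the general shape of the logged BoundingValsQuery
(whitespace may differ).

```java
// Hypothetical illustration only; not taken from Sqoop or from SQOOP-474.patch.
final class BoundingQuerySketch {

    /** Returns true only when the split column's range actually has to be partitioned. */
    static boolean needsBoundingQuery(int numMappers, String splitByColumn) {
        // With a single mapper there is nothing to partition, so a split-by
        // column can be ignored even if the user supplied one.
        return numMappers > 1 && splitByColumn != null && !splitByColumn.isEmpty();
    }

    /** Wraps the user query as a subquery, similar in shape to the logged query. */
    static String buildBoundingQuery(String splitByColumn, String userQuery) {
        // Assumption: $CONDITIONS is replaced by an always-true predicate here.
        String resolved = userQuery.replace("$CONDITIONS", "(1 = 1)");
        return "SELECT MIN(" + splitByColumn + "), MAX(" + splitByColumn + ") FROM ("
                + resolved + ") AS t1";
    }

    public static void main(String[] args) {
        String query = "SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS";
        for (int mappers : new int[] {1, 4}) {
            if (needsBoundingQuery(mappers, "AID")) {
                System.out.println(mappers + " mapper(s): " + buildBoundingQuery("AID", query));
            } else {
                System.out.println(mappers + " mapper(s): no bounding query needed");
            }
        }
    }
}
```

Under this assumption, the import from the reproduction steps would never build the embedded
query, so a trailing 'with ur' clause in the user's query would not end up inside a subquery.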

    
> Split-by specification incorrectly triggers bounding value query
> ----------------------------------------------------------------
>
>                 Key: SQOOP-474
>                 URL: https://issues.apache.org/jira/browse/SQOOP-474
>             Project: Sqoop
>          Issue Type: Bug
>          Components: build, connectors/generic
>    Affects Versions: 1.4.2-incubating
>            Reporter: Kathleen Ting
>            Assignee: Kathleen Ting
>         Attachments: SQOOP-474-1.patch, SQOOP-474.patch
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
