hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1224) Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
Date Sat, 28 Nov 2009 01:05:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783192#action_12783192 ]

Aaron Kimball commented on MAPREDUCE-1224:

@Jeff Sqoop is already using the ResultSetMetaData associated with the query, rather than
trying to read the DatabaseMetaData directly. Especially when we eventually support arbitrary
user-supplied queries, this will be necessary. It can also be tricky to set all the parameters
for a DatabaseMetaData correctly in a generic way. But to get at ResultSetMetaData (which
definitely includes the proper typing information), a query must be submitted.
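
A minimal sketch of reading column names and types through ResultSetMetaData, as described above. This is not Sqoop's actual SqlManager code: the class name {{ColumnTypeSketch}} and both method names are hypothetical, and the table name is assumed to come from a trusted source, since it is concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnTypeSketch {

    /** Builds the metadata query quoted in the issue title (hypothetical helper). */
    static String metadataQuery(String table) {
        // Assumption: 'table' is a trusted identifier; it is not escaped here.
        return "SELECT t.* FROM " + table + " AS t";
    }

    /**
     * Submits the query once and reads both the column names and the
     * java.sql.Types codes from the same ResultSetMetaData, in column order.
     */
    static Map<String, Integer> columnTypes(Connection conn, String table)
            throws SQLException {
        Map<String, Integer> types = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(metadataQuery(table))) {
            ResultSetMetaData meta = rs.getMetaData();
            // JDBC column indexes are 1-based.
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                types.put(meta.getColumnName(i), meta.getColumnType(i));
            }
        }
        return types;
    }
}
```

Collecting names and types in one pass means the query only has to be submitted once per table.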

@Spencer This is a good catch and improvement! What database are you testing against? This
patch passes unit tests against HSQLDB, PostgreSQL, and Oracle, so +1 from me.

For PostgreSQL and MySQL, Sqoop uses {{statement.setFetchSize()}} to request a row-buffered
(rather than table-buffered) result, so the query returns fast. But unfortunately, {{setFetchSize()}}
is, like everything else in JDBC, poorly specified, so there isn't a good way to do this generically.
The patch is a good way to ensure that the query returns quickly even if the database does not
honor the row-buffering request.
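
To illustrate why the fetch-size hint is hard to apply generically, here is a rough sketch of per-driver handling. It is not taken from Sqoop or from the attached patch: {{FetchSizeSketch}}, {{fetchSizeFor}}, {{openStreamingStatement}}, and the default of 1000 rows are all assumptions made for the example. It relies on two driver behaviors as I understand them: MySQL Connector/J streams rows only for a forward-only, read-only statement with a fetch size of {{Integer.MIN_VALUE}}, and the PostgreSQL driver only uses a cursor when autocommit is off.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {

    // Arbitrary row-count hint chosen for this sketch.
    static final int DEFAULT_FETCH_SIZE = 1000;

    /** Picks a fetch-size hint from the JDBC URL prefix (hypothetical helper). */
    static int fetchSizeFor(String jdbcUrl) {
        if (jdbcUrl.startsWith("jdbc:mysql:")) {
            // Connector/J treats this sentinel as "stream row by row".
            return Integer.MIN_VALUE;
        }
        return DEFAULT_FETCH_SIZE;
    }

    /** Opens a statement configured to avoid buffering the whole table. */
    static Statement openStreamingStatement(Connection conn, String jdbcUrl)
            throws SQLException {
        // The PostgreSQL driver ignores the fetch size while autocommit is on.
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        try {
            stmt.setFetchSize(fetchSizeFor(jdbcUrl));
        } catch (SQLException e) {
            stmt.close(); // do not leak the statement if the hint is rejected
            throw e;
        }
        return stmt;
    }
}
```

A driver that is not covered by such special cases may still buffer the full result, which is why a fix that does not depend on the hint is useful.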

> Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
> ----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: MAPREDUCE-1224.patch, SqlManager.java
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table spec, which
> is too expensive for big tables, and the query is called twice, once to generate the column
> names and once to generate the types. For tables that are big enough to be map-reduced,
> this is too expensive for Sqoop to be useful.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
