hadoop-mapreduce-issues mailing list archives

From "Spencer Ho (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1224) Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
Date Fri, 20 Nov 2009 23:01:39 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Spencer Ho updated MAPREDUCE-1224:

    Attachment: SqlManager.java

The original code at lines 66 to 68 of SqlManager was:

protected String getColNamesQuery(String tableName) {
    return "SELECT t.* FROM " + tableName + " AS t";
}

Because this method is invoked three times in the code to generate column name and type
information, it queries the database three times.  For a large table, this makes the whole
loading job query the entire table four times.

The change adds an always-false WHERE clause, which forces the database to return an empty
result set that still carries the metadata (lines 66 to 69):

  protected String getColNamesQuery(String tableName) {
    // adding where clause to prevent loading a big table
    return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
  }

With this change, the execution time for retrieving one of our large tables dropped from
40 minutes to 11 minutes.
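
For illustration, here is a minimal sketch of how a zero-row query like the one above can supply both column names and column types in a single round trip through the standard JDBC ResultSetMetaData interface. The class MetadataQuerySketch and the helper columnTypes are hypothetical names chosen for this example and are not part of the attached SqlManager.java; the table name is assumed to come from a trusted source, since it is concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not the attached SqlManager.java.
class MetadataQuerySketch {

    // Builds the same zero-row query as the patched getColNamesQuery.
    static String colNamesQuery(String tableName) {
        return "SELECT t.* FROM " + tableName + " AS t WHERE 1=0";
    }

    // Hypothetical helper: runs the query once and returns a map of
    // column name -> java.sql.Types code, in column order.
    static Map<String, Integer> columnTypes(Connection conn, String tableName)
            throws SQLException {
        Map<String, Integer> columns = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(colNamesQuery(tableName))) {
            // The result set has no rows, but its metadata is still populated.
            ResultSetMetaData meta = rs.getMetaData();
            for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
                columns.put(meta.getColumnName(i), meta.getColumnType(i));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Only prints the query text; columnTypes needs a live JDBC connection.
        System.out.println(colNamesQuery("employees"));
    }
}
```

Reading names and types from the same metadata object would also avoid issuing the query more than once, although how the attached patch handles the repeated calls is not shown in this message.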

> Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
> ----------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: SqlManager.java
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table spec, which
> is too expensive for big tables, and the query is called twice to generate column names and types.
> For tables that are big enough to be map-reduced, this is too expensive for sqoop to be useful.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
