hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1168) Export data to databases via Sqoop
Date Thu, 29 Oct 2009 20:55:59 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1168:
-------------------------------------

    Attachment: MAPREDUCE-1168.patch

This patch provides Sqoop with the ability to export tables from HDFS to an external RDBMS.
Sqoop runs a MapReduce job over the contents of a directory (identified by {{\-\-export-dir}}),
parsing the records contained within based on the auto-generated class definition for a table.
DBOutputFormat is used to inject the records back into the database table (specified by {{\-\-table}}).
The table must already exist in the target database.
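
To illustrate the flow described above, here is a minimal sketch of the two steps involved: splitting a delimited text record into fields and building a parameterized INSERT statement of the kind a database output format would prepare. This is not code from the patch; the class name {{ExportSketch}}, its methods, and the {{employees}} table are hypothetical, and real records would need quoting and escape handling that this sketch omits.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of MAPREDUCE-1168.patch.
public class ExportSketch {

    /** Builds a parameterized INSERT statement for the given table and columns. */
    public static String buildInsert(String table, List<String> columns) {
        // One "?" placeholder per column, to be bound per record.
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table
            + " (" + String.join(", ", columns) + ")"
            + " VALUES (" + placeholders + ")";
    }

    /** Splits one delimited record into fields; quoting and escaping are not handled. */
    public static List<String> parseRecord(String line, char delimiter) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps trailing empty fields.
        String regex = Pattern.quote(String.valueOf(delimiter));
        return Arrays.asList(line.split(regex, -1));
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name");
        System.out.println(buildInsert("employees", columns));
        System.out.println(parseRecord("1,alice", ','));
    }
}
```

In an actual export the statement would be prepared once per task and each parsed record bound to its placeholders, which is why the target table and its columns have to exist before the job starts.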

Sqoop can auto-generate the appropriate ORM class for parsing the input files by examining
the target table (much as is done during importing); the existing command-line options that
govern delimiters are used to specify which delimiters are used in the files to be exported.

If an ORM class has already been generated for the table, this can now be specified with the
{{\-\-jar-file}} and {{\-\-class-name}} options; code auto-generation is bypassed in this
case. (This applies to imports as well.)

Export supports both delimited text files and SequenceFiles containing {{SqoopRecords}}
as values (i.e., SequenceFiles created via a Sqoop import with {{\-\-as-sequencefile}}). Users
do not need to identify the file type; it is automatically inferred. Gzipped text files are
handled transparently.
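
One way such inference can work is by inspecting the first bytes of a file: SequenceFiles begin with the ASCII bytes "SEQ", and gzip streams begin with the bytes 0x1F 0x8B. The sketch below shows that idea only; it is an assumption about the general technique, not the detection logic in the patch, and the class {{FileTypeSketch}} is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of MAPREDUCE-1168.patch.
public class FileTypeSketch {

    public enum FileType { SEQUENCE_FILE, GZIP, PLAIN_TEXT }

    /** Classifies a file from its leading bytes (at least the first three, if available). */
    public static FileType detect(byte[] leading) {
        // SequenceFile header starts with the ASCII characters 'S', 'E', 'Q'.
        if (leading.length >= 3
                && leading[0] == 'S' && leading[1] == 'E' && leading[2] == 'Q') {
            return FileType.SEQUENCE_FILE;
        }
        // gzip magic number is 0x1F 0x8B; mask to compare as unsigned values.
        if (leading.length >= 2
                && (leading[0] & 0xFF) == 0x1F && (leading[1] & 0xFF) == 0x8B) {
            return FileType.GZIP;
        }
        // Anything else is treated as uncompressed delimited text.
        return FileType.PLAIN_TEXT;
    }

    public static void main(String[] args) {
        byte[] text = "1,alice\n".getBytes(StandardCharsets.US_ASCII);
        byte[] gzip = {(byte) 0x1F, (byte) 0x8B, 0x08};
        byte[] seq = {'S', 'E', 'Q', 0x06};
        System.out.println(detect(text));
        System.out.println(detect(gzip));
        System.out.println(detect(seq));
    }
}
```

A check this simple would misclassify a text file whose first record happens to start with "SEQ", so a production implementation would need a stricter test.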

Testing has been performed via unit tests (included) against HSQLDB with several column datatypes.
I also performed manual, larger-scale testing by exporting 100 MB and 500 MB datasets containing
1 million and 5 million rows, respectively, to tables in MySQL.

> Export data to databases via Sqoop
> ----------------------------------
>
>                 Key: MAPREDUCE-1168
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1168
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1168.patch
>
>
> Sqoop can import from a database into HDFS. It's high time it works in reverse too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

