hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1446) Sqoop should support CLOB and BLOB datatypes
Date Tue, 02 Feb 2010 21:18:30 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1446:

    Attachment: MAPREDUCE-1446.patch

Attaching a patch which provides this functionality. The main challenge of BLOB and CLOB data
is that it can produce very large records -- larger than will fit in memory all at once.
The current patch proposes two serializations for CLOB/BLOB data:

* Data less than 16MB will be stored inline in the record bodies
* Data greater than 16MB will be stored in separate files in HDFS; the records will contain
only a pointer to the file. This will then be accessed through an InputStream interface so
that users can buffer in as much data as is appropriate.

The latter of these two mechanisms is unimplemented, but placeholders have been left in the
code where necessary. The boundary size (16MB) is intended to be a load-time parameter; it
is currently hardcoded, but it would be trivial to let users configure it to suit their own
datasets, hardware, etc.
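The two-mode scheme above can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch's API: the class and method names (`LobRecord`, `forValue`, `open`) and the spill-path handling are assumptions for the sake of the example; only the 16MB threshold and the InputStream access pattern come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/**
 * Hypothetical sketch of the two serializations described above:
 * values under the boundary are stored inline in the record body,
 * larger values are represented only by a pointer to an external
 * HDFS file and read back through an InputStream.
 */
public class LobRecord {
    // Boundary size; hardcoded at 16MB in the current patch,
    // intended to become a load-time parameter.
    static final long INLINE_THRESHOLD = 16L * 1024 * 1024;

    private final byte[] inlineData;   // non-null when stored inline
    private final String externalPath; // non-null when spilled to HDFS

    private LobRecord(byte[] inlineData, String externalPath) {
        this.inlineData = inlineData;
        this.externalPath = externalPath;
    }

    /** Choose the serialization mode based on the value's size. */
    public static LobRecord forValue(byte[] data, String spillPath) {
        if (data.length < INLINE_THRESHOLD) {
            return new LobRecord(data, null);
        }
        // A real implementation would write the bytes to spillPath in
        // HDFS here; the record itself keeps only the pointer.
        return new LobRecord(null, spillPath);
    }

    public boolean isExternal() {
        return inlineData == null;
    }

    /**
     * Stream interface so callers can buffer in as much data as is
     * appropriate rather than holding the whole value in memory.
     */
    public InputStream open() {
        if (isExternal()) {
            // Placeholder, mirroring the unimplemented external path
            // in the patch: would open the HDFS file at externalPath.
            throw new UnsupportedOperationException(
                "external LOB storage not yet implemented: " + externalPath);
        }
        return new ByteArrayInputStream(inlineData);
    }
}
```

The key design point is that consumers never see the storage mode directly: both inline and external values are read through the same `InputStream` interface, so record-processing code is unchanged when a value spills to HDFS.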

> Sqoop should support CLOB and BLOB datatypes
> --------------------------------------------
>                 Key: MAPREDUCE-1446
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1446
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1446.patch
> Sqoop should allow import of CLOB and BLOB based data.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
