hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1524) Support for CLOB and BLOB values larger than can fit in memory
Date Mon, 22 Feb 2010 23:16:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-1524:
-------------------------------------

    Attachment: MAPREDUCE-1524.patch

Attaching a patch which provides this functionality. This completes the ClobRef/BlobRef interface
added in MAPREDUCE-1446. These objects can reference inline data; if the data is too large,
however, the actual values are imported to separate files in HDFS. They are placed in a {{_lobs}}
directory underneath the import path. The relative filename is then preserved in the inline
import data, for later reconstruction of the objects in user-side code.
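
To illustrate, user-side code might dereference one of these objects roughly as follows. This
is a minimal sketch only; the accessor names ({{isExternal()}}, {{getDataStream()}}, and
{{getData()}}) are assumptions for illustration, not necessarily the exact interface in the patch.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Illustration only: dereferencing a BlobRef after an import. The method
// names used on BlobRef here are assumed, not taken from the patch.
public class BlobRefExample {
  public static void consume(BlobRef blob, Configuration conf, Path importDir)
      throws IOException {
    if (blob.isExternal()) {
      // Large value: the ref holds a filename relative to the import path's
      // _lobs directory; open a stream against that file in HDFS.
      InputStream in = blob.getDataStream(conf, importDir);
      try {
        // ... read the BLOB contents from the stream ...
      } finally {
        in.close();
      }
    } else {
      // Small value: the bytes were materialized inline with the record.
      byte[] data = blob.getData();
      // ... use the bytes directly ...
    }
  }
}
{code}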

This import process makes use of the FileOutputCommitter's ability to promote side-channel
work files from successful tasks only. LOB files are named with the task attempt id embedded,
to prevent collisions between speculative or re-executed attempts.
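
A naming scheme along roughly these lines would keep concurrent attempts from clobbering each
other; the exact filename format below is invented for illustration:

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical sketch: embed the unique task attempt id in each LOB filename
// so speculative or re-executed attempts write distinct side files.
public final class LobNamingSketch {
  private LobNamingSketch() { }

  public static Path lobPathFor(TaskAttemptContext context,
      Path attemptWorkDir, long counter) {
    String attemptId = context.getTaskAttemptID().toString();
    // e.g. _lobs/lobfile_attempt_201002221911_0001_m_000003_0.17
    return new Path(new Path(attemptWorkDir, "_lobs"),
        "lobfile_" + attemptId + "." + counter);
  }
}
{code}

Because the files are written under the attempt's work directory, only those belonging to the
committed attempt end up under the final import path.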

The import process itself now includes a delayed component. Using the FileOutputCommitter, the
FileSystem for the current Context/Configuration, etc., requires access to the MapContext,
which is unavailable inside DBWritable's {{readFields(ResultSet)}} method. This patch therefore
adds another method to {{SqoopRecord}}, {{loadLargeObjects()}}, which is called in the import
{{map()}} method to propagate the current Context into a {{LargeObjectLoader}}.
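
In outline, the map-side call looks something like the following; the {{LargeObjectLoader}}
constructor shown here is an assumption, as is the mapper's exact signature:

{code:java}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the delayed-load step: readFields(ResultSet) ran without access
// to a MapContext, so over-limit CLOB/BLOB values are written to HDFS here,
// where the Context (and thus the FileSystem and the committer's work
// directory) is available.
public class ImportMapperSketch
    extends Mapper<LongWritable, SqoopRecord, SqoopRecord, NullWritable> {

  @Override
  protected void map(LongWritable key, SqoopRecord record, Context context)
      throws IOException, InterruptedException {
    // Assumed constructor: propagate the live Context into the loader.
    LargeObjectLoader loader = new LargeObjectLoader(context);
    record.loadLargeObjects(loader);
    context.write(record, NullWritable.get());
  }
}
{code}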

This addition to the {{SqoopRecord}} interface makes this an incompatible change: previously
generated SqoopRecord classes cannot be used with this version of Sqoop, so users will need to
regenerate their classes (e.g., with {{--generate-only}}) before reusing data stored in
SqoopRecord instances. The on-disk layout of existing SqoopRecord instances is unaffected.

Unit tests are added for LargeObjectLoader, ClobRef, and BlobRef to verify all of the above
functionality.

Users can control the threshold at which CLOB/BLOB fields are no longer directly materialized
with the {{--inline-lob-limit}} argument. The default value is 16 MB.
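
As a sketch of the decision this flag controls (the class and method names below are invented;
only the 16 MB default comes from this description):

{code:java}
// Illustrative policy only; the real check lives in LargeObjectLoader and
// the generated record code.
public final class InlineLobPolicy {
  /** Default for --inline-lob-limit: 16 MB, expressed in bytes. */
  public static final long DEFAULT_LIMIT_BYTES = 16L * 1024 * 1024;

  private final long limitBytes;

  public InlineLobPolicy(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  /** A value at or under the limit stays inline; larger ones go to _lobs/. */
  public boolean keepInline(long lobLengthBytes) {
    return lobLengthBytes <= limitBytes;
  }
}
{code}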

> Support for CLOB and BLOB values larger than can fit in memory
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1524
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1524
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1524.patch
>
>
> The patch in MAPREDUCE-1446 provides support for "inline" CLOB and BLOB values which
> can be fully materialized. Values which are too big for RAM should be written to separate
> files in HDFS and referenced in an indirect fashion; access should be provided through a stream.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

