hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-716) org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
Date Wed, 08 Jul 2009 16:25:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728763#action_12728763 ]

Aaron Kimball commented on MAPREDUCE-716:
-----------------------------------------

* Calling {{setFetchSize(Integer.MIN_VALUE)}} is the signal to MySQL to send the data one row
at a time instead of buffering the entire result set in the client. No other values for the
setFetchSize argument are supported (see [http://forums.mysql.com/read.php?39,137457]). Given
the volume of data encountered, buffering all results is likely to cause OutOfMemory exceptions,
as were seen in Sqoop. There are enough bottlenecks elsewhere in HDFS that this is unlikely
to be the slowest point. Consequently, this is the "correct" setting for result sets which
are expected to be large.
* So are you saying that DBRR should be a top-level class? I don't have strong opinions about
this. I can pull it up to top level easily enough. I will only do this on the trunk branch,
not the 0.20 branch patch.
* The patch no longer holds onto a reference to the statement object. I'll reintroduce explicit
tracking of that reference and close it in the close() method again.
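
For illustration only, the first and third points above can be sketched together as follows. This is a minimal sketch, not the code in the attached patches: {{StreamingRowReader}} and its methods are hypothetical names, and it assumes the MySQL JDBC driver, which streams only on a forward-only, read-only statement whose fetch size is Integer.MIN_VALUE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Hypothetical reader (not Hadoop's DBRecordReader) that streams rows from
 * MySQL and keeps the statement reference so that close() can release it.
 */
class StreamingRowReader implements AutoCloseable {
    private final PreparedStatement statement; // tracked explicitly for close()
    private final ResultSet results;

    StreamingRowReader(Connection connection, String query) throws SQLException {
        // Assumption: MySQL's driver needs a forward-only, read-only statement to stream.
        PreparedStatement stmt = connection.prepareStatement(
                query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        ResultSet rs;
        try {
            // Integer.MIN_VALUE asks the MySQL driver to send rows one at a time
            // instead of buffering the whole result set in client memory.
            stmt.setFetchSize(Integer.MIN_VALUE);
            rs = stmt.executeQuery();
        } catch (SQLException e) {
            stmt.close(); // do not leak the statement if the query fails
            throw e;
        }
        this.statement = stmt;
        this.results = rs;
    }

    /** Advances to the next row; returns false when the result set is exhausted. */
    boolean next() throws SQLException {
        return results.next();
    }

    /** Reads a column of the current row as a string (columns are 1-based). */
    String getString(int column) throws SQLException {
        return results.getString(column);
    }

    /** Closes the result set first, then the tracked statement. */
    @Override
    public void close() throws SQLException {
        try {
            results.close();
        } finally {
            statement.close();
        }
    }
}
```

Because the class implements AutoCloseable, a caller could wrap it in try-with-resources so the statement is released even when row processing throws.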

> org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-716
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-716
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Java 1.6, Hadoop 0.19.0, Linux, Oracle
>            Reporter: evanand
>            Assignee: Aaron Kimball
>         Attachments: HADOOP-5482.20-branch.patch, HADOOP-5482.patch, HADOOP-5482.trunk.patch, MAPREDUCE-716.2.branch20.patch, MAPREDUCE-716.2.trunk.patch, MAPREDUCE-716.3.trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle.
> The out-of-the-box Hadoop implementation works properly with mysql/hsqldb, but NOT with oracle.
> The reason is that DBInputformat is implemented with mysql/hsqldb-specific query constructs like "LIMIT" and "OFFSET".
> FIX:
> Build database-provider-specific logic based on the database provider name (which we can get from the connection).
> I HAVE ALREADY IMPLEMENTED IT FOR ORACLE...READY TO CHECK_IN CODE
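
As a rough illustration of the fix proposed in the description above (not the reporter's implementation, which is not shown in this message), provider-specific paging could be sketched as below. {{PagedQueryBuilder}}, {{build}}, and the {{dbif_rno}} alias are hypothetical names. The product name is assumed to come from connection.getMetaData().getDatabaseProductName(), and the Oracle branch uses the common ROWNUM-wrapping idiom in place of LIMIT/OFFSET.

```java
import java.util.Locale;

/** Hypothetical helper that pages a query with syntax the target database accepts. */
class PagedQueryBuilder {

    /**
     * Wraps baseQuery so that it returns `length` rows starting after `offset`.
     * productName is assumed to be the value reported by
     * connection.getMetaData().getDatabaseProductName().
     */
    static String build(String productName, String baseQuery, long offset, long length) {
        String product = productName.toLowerCase(Locale.ROOT);
        if (product.startsWith("oracle")) {
            // No LIMIT/OFFSET here: number the rows with ROWNUM in an inner query,
            // cap them at offset + length, then drop the first `offset` rows.
            // Note that the outer SELECT * also returns the helper column dbif_rno.
            return "SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( " + baseQuery
                    + " ) a WHERE ROWNUM <= " + (offset + length)
                    + " ) WHERE dbif_rno > " + offset;
        }
        // mysql/hsqldb style, as used by the existing implementation.
        return baseQuery + " LIMIT " + length + " OFFSET " + offset;
    }
}
```

Either form only yields stable, non-overlapping pages if baseQuery has a deterministic ORDER BY.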

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

