hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-716) org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
Date Tue, 07 Jul 2009 23:09:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728411#action_12728411 ]

Aaron Kimball commented on MAPREDUCE-716:
-----------------------------------------

Enis,

The DBRecordReader API already contains a {{getSelectQuery()}} method that was designed to be
overridden by subclasses. After looking at things more closely, I think it makes more sense to
extend DBRecordReader (DBRR) than to extend DBInputFormat (DBIF). DBRecordReader was not a
static class, though, so any extensions would have had to live in DBInputFormat.java. I've made
DBRecordReader a static class and added protected accessor methods for its private fields where
they make sense.

I added {{src/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java}} to hold
the Oracle-specific logic.
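
As a rough illustration of the kind of override involved, here is a minimal sketch. The class
names, fields, and the ROWNUM-based paging are assumptions made for the example and are not
copied from the patch; it only shows how a subclass can replace a LIMIT/OFFSET query by
overriding {{getSelectQuery()}}.

```java
// Hypothetical sketch: class and field names are illustrative, not taken from the patch.
class SketchRecordReader {
    protected final String tableName;
    protected final String[] fieldNames;
    protected final long start;   // index of the first row of this split
    protected final long length;  // number of rows in this split

    SketchRecordReader(String tableName, String[] fieldNames, long start, long length) {
        this.tableName = tableName;
        this.fieldNames = fieldNames;
        this.start = start;
        this.length = length;
    }

    /** Unpaged query shared by every database. */
    protected String baseQuery() {
        return "SELECT " + String.join(", ", fieldNames) + " FROM " + tableName;
    }

    /** Default paging with LIMIT/OFFSET, which MySQL and HSQLDB accept. */
    protected String getSelectQuery() {
        return baseQuery() + " LIMIT " + length + " OFFSET " + start;
    }
}

class SketchOracleRecordReader extends SketchRecordReader {
    SketchOracleRecordReader(String tableName, String[] fieldNames, long start, long length) {
        super(tableName, fieldNames, start, length);
    }

    /** Assumed Oracle variant: page by wrapping the base query in ROWNUM filters. */
    @Override
    protected String getSelectQuery() {
        return "SELECT * FROM (SELECT a.*, ROWNUM rno FROM (" + baseQuery()
                + ") a WHERE ROWNUM <= " + (start + length) + ") WHERE rno > " + start;
    }
}
```

Only the query construction differs between the two classes; everything else a record reader
does can stay in the base class.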

I also added {{src/java/org/apache/hadoop/mapreduce/lib/db/MySQLDBRecordReader.java}} to hold
MySQL-specific logic that forces queries to run in unbuffered mode, which prevents out-of-memory
errors (see MAPREDUCE-685 for a related problem in other queries run by Sqoop).
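
For reference, a minimal sketch of how unbuffered (streaming) reads are usually requested from
MySQL Connector/J: a forward-only, read-only statement with a fetch size of
{{Integer.MIN_VALUE}}. The class and method names are hypothetical, and whether the patch does
exactly this is an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; the name is illustrative and not part of the patch.
class MySqlStreamingSketch {
    /**
     * Prepares a statement whose results MySQL Connector/J streams row by row
     * instead of buffering the whole result set in memory.
     */
    static PreparedStatement prepareStreaming(Connection conn, String query) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
                query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Connector/J treats this fetch size as a request to stream results.
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

With streaming enabled, the result set has to be read to the end (or closed) before the same
connection can issue another query.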

DBInputFormat itself now includes logic in {{getRecordReader()}} to choose which record reader
implementation to instantiate. It uses the same database metadata that was originally pushed
down into {{getSelectQuery()}}.
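
A minimal sketch of that kind of dispatch follows. It assumes the product name comes from JDBC's
{{DatabaseMetaData.getDatabaseProductName()}}; the class, the enum, and the name prefixes being
matched are hypothetical and not taken from the patch.

```java
import java.util.Locale;

// Hypothetical sketch of picking a record reader from database metadata.
class ReaderChooserSketch {
    enum ReaderKind { GENERIC, ORACLE, MYSQL }

    /**
     * Maps a database product name to the kind of record reader to create.
     * In real code the name would be read from
     * connection.getMetaData().getDatabaseProductName().
     */
    static ReaderKind chooseReader(String dbProductName) {
        String name = dbProductName == null ? "" : dbProductName.toUpperCase(Locale.ROOT);
        if (name.startsWith("ORACLE")) {
            return ReaderKind.ORACLE;
        }
        if (name.startsWith("MYSQL")) {
            return ReaderKind.MYSQL;
        }
        // Anything else keeps the default LIMIT/OFFSET behaviour.
        return ReaderKind.GENERIC;
    }
}
```

Falling back to the generic reader keeps databases that already worked, such as HSQLDB, on the
existing code path.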

I've again tested this locally against Oracle and MySQL databases I've installed; both work.


> org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-716
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-716
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Java 1.6, Hadoop 0.19.0, Linux, Oracle
>            Reporter: evanand
>            Assignee: Aaron Kimball
>         Attachments: HADOOP-5482.20-branch.patch, HADOOP-5482.patch, HADOOP-5482.trunk.patch,
>                      MAPREDUCE-716.2.branch20.patch, MAPREDUCE-716.2.trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.hadoop.mapred.lib.db.DBInputFormat does not work with Oracle.
> The out-of-the-box Hadoop implementation works properly with MySQL and HSQLDB, but not with Oracle.
> The reason is that DBInputFormat is implemented with MySQL/HSQLDB-specific query constructs
> such as "LIMIT" and "OFFSET".
> FIX:
> Build database-provider-specific logic based on the database provider name (which we can get
> from the connection).
> I have already implemented this for Oracle, and the code is ready to check in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

