hadoop-common-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2536) MapReduce for MySQL
Date Fri, 13 Jun 2008 10:50:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604796#action_12604796 ]

Enis Soztutar commented on HADOOP-2536:
---------------------------------------

Thanks for the useful patch!
I think we should iron out a few points before this issue gets in:

# Several blog posts point out that LIMIT and OFFSET should not be used without an ORDER
BY clause, since the query execution plan might opt for different row orderings
(http://azimbabu.blogspot.com/2008/03/sqllimit-offset-without-order-by.html).
Please note that I am no expert on this subject; any thoughts are welcome.
# I guess the key field does not have to be a Text object. Shall we make it more general?
# As suggested by your inline comment, inferring the field types from the ResultSetMetaData
might be a better solution.
# It would be really useful if DatabaseInputFormat and DatabaseOutputFormat included more
documentation and a simple example in their javadocs (or in the mapred tutorial).
# We are executing an update request for every record in the RecordWriter, which may not be
optimal. Also, the connection should not be in autocommit mode. We should issue the commit
in the close function of RecordWriter, catch exceptions in the write function, and do a
rollback should an error occur.
# Does {{ON DUPLICATE KEY UPDATE}} work only on MySQL? If so, we should either change it or
document this in the javadoc for DatabaseOutputFormat.
# Why don't we just use Derby, then switch to JavaDB once HADOOP-2235 is in?
# The patch has to be changed for the new directory structure. You can use the sed script in
HADOOP-2916.
# The patch uses tabs in several places; these should be changed to spaces.
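
To illustrate the LIMIT/OFFSET point above, here is a minimal sketch of how a per-split query could be built with an explicit ORDER BY. The class name SplitQuerySketch, the method buildSplitQuery, and its parameters are hypothetical and are not taken from the patch; the table and column names are assumed to come from trusted job configuration rather than user input.

```java
// Hypothetical helper, not part of the patch: builds a per-split SELECT.
final class SplitQuerySketch {
  private SplitQuerySketch() {}

  /**
   * Returns a SELECT that reads `limit` rows starting at `offset`.
   * The ORDER BY makes the row order deterministic, so consecutive
   * splits neither overlap nor skip rows when the plan changes.
   * Identifiers are concatenated as-is and must be trusted values.
   */
  public static String buildSplitQuery(String table, String orderByColumn,
                                       long limit, long offset) {
    if (limit < 0 || offset < 0) {
      throw new IllegalArgumentException("limit and offset must be non-negative");
    }
    return "SELECT * FROM " + table
        + " ORDER BY " + orderByColumn
        + " LIMIT " + limit
        + " OFFSET " + offset;
  }
}
```

Ordering on a unique column (for example the primary key) is the safer choice here, since ties in a non-unique ordering column can still be returned in any order.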
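
To illustrate the RecordWriter point above, here is a rough sketch of the commit-on-close pattern using only standard JDBC calls. The class BatchingWriterSketch and its write(Object...) signature are hypothetical and do not match the RecordWriter interface or the code in the patch; batching the statements is one possible way to avoid a round trip per record, not necessarily the one the patch should adopt.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical writer, not the patch's RecordWriter: one transaction per task.
final class BatchingWriterSketch implements AutoCloseable {
  private final Connection connection;
  private final PreparedStatement statement;

  BatchingWriterSketch(Connection connection, String insertSql) throws SQLException {
    this.connection = connection;
    // Disable autocommit so that nothing becomes visible until close().
    connection.setAutoCommit(false);
    this.statement = connection.prepareStatement(insertSql);
  }

  /** Queues one record; nothing is sent to the database yet. */
  void write(Object... fields) throws SQLException {
    try {
      for (int i = 0; i < fields.length; i++) {
        statement.setObject(i + 1, fields[i]); // JDBC parameters are 1-based
      }
      statement.addBatch();
    } catch (SQLException e) {
      connection.rollback(); // discard everything written so far
      throw e;
    }
  }

  /** Sends the queued records and commits, or rolls back on failure. */
  @Override
  public void close() throws SQLException {
    try {
      statement.executeBatch();
      connection.commit();
    } catch (SQLException e) {
      connection.rollback();
      throw e;
    } finally {
      statement.close();
      connection.close();
    }
  }
}
```

For large outputs the batch would probably need to be flushed with executeBatch() every N records to bound memory use, while still committing only once in close().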

> MapReduce for MySQL
> -------------------
>
>                 Key: HADOOP-2536
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2536
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Fredrik Hedberg
>            Assignee: Fredrik Hedberg
>            Priority: Minor
>         Attachments: database-2.diff, database.diff
>
>
> Add support for running MapReduce jobs over data residing in a MySQL table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

