hadoop-user mailing list archives

From Tharindu Mathew <mcclou...@gmail.com>
Subject Extension points available for data locality
Date Tue, 21 Aug 2012 09:06:17 GMT

I'm doing some research that involves pulling data stored in a MySQL
cluster directly into a MapReduce job, without storing the data in HDFS.

I'd like to run Hadoop TaskTracker nodes directly on the MySQL cluster
nodes. The purpose is to start each mapper on the node closest to its
data whenever possible (data locality).

I notice that with HDFS, the NameNode knows exactly where each data block
is stored, and this information is used to schedule map tasks close to
their data.
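
To make the kind of locality I am after concrete, here is a minimal sketch
of the matching step as I understand it. It is not Hadoop code: the names
LocalitySketch, Split and pickSplitFor, and the host and shard names, are
hypothetical, and it assumes the hosts holding each piece of data are known
up front.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model, not Hadoop classes: a unit of input plus the hosts that hold it.
final class LocalitySketch {

    static final class Split {
        final String name;
        final List<String> hosts; // nodes holding this data (assumed known up front)

        Split(String name, String... hosts) {
            this.name = name;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Prefer a split stored on the requesting node; otherwise fall back to
    // the first pending split, which then has to be read over the network.
    static Optional<Split> pickSplitFor(String trackerHost, List<Split> pending) {
        for (Split s : pending) {
            if (s.hosts.contains(trackerHost)) {
                return Optional.of(s);
            }
        }
        return pending.isEmpty() ? Optional.empty() : Optional.of(pending.get(0));
    }

    public static void main(String[] args) {
        List<Split> pending = Arrays.asList(
            new Split("shard-0", "db-node-1"),
            new Split("shard-1", "db-node-2"));
        // A tracker on db-node-2 should be handed the shard stored there.
        System.out.println(pickSplitFor("db-node-2", pending).get().name);
        // A tracker with no local data falls back to a remote shard.
        System.out.println(pickSplitFor("other-node", pending).get().name);
    }
}
```

In my case the host list for each split would have to come from the MySQL
cluster rather than from the NameNode.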

Is there a way to achieve this, possibly by extending the NameNode or
through some other extension point?

Thanks in advance.



blog: http://mackiemathew.com/
