hadoop-common-user mailing list archives

From Minh Duc Nguyen <mdngu...@gmail.com>
Subject Re: Extension points available for data locality
Date Tue, 21 Aug 2012 19:17:25 GMT
Tharindu, have you considered using something like Sqoop? Your idea is to
run a Hadoop cluster on the same nodes as your MySQL cluster for
efficiency, in effect moving your processing to your data. With something
like Sqoop, you could instead move your data to your Hadoop cluster.
While it may not make sense for what you're trying to accomplish, I
thought I'd at least offer up the idea.

HTH,
Minh

On Tue, Aug 21, 2012 at 5:06 AM, Tharindu Mathew <mccloud35@gmail.com> wrote:

> Hi,
>
> I'm doing some research that involves pulling data stored in a mysql
> cluster directly for a map reduce job, without storing the data in HDFS.
>
> I'd like to run hadoop task tracker nodes directly on the mysql cluster
> nodes. The purpose is to start mappers on the node closest to the data
> where possible (data locality).
>
> I notice that with HDFS, since the name node knows exactly where each data
> block is, it uses this to achieve data locality.
>
> Is there a way to achieve my requirement possibly by extending the name
> node or otherwise?
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>
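
On the locality question itself: as far as I know, MapReduce takes its
locality hints from the input splits rather than only from the name node.
Each InputSplit reports preferred hosts through getLocations(), so a custom
InputFormat over MySQL could in principle return the hosts of the MySQL
nodes. The sketch below is not Hadoop code. It is a plain-Java model of the
idea, and the Split type, the assign method, and the host names are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LocalitySketch {

    /** Hypothetical unit of work plus the hosts believed to hold its data. */
    static final class Split {
        final String shardName;
        final List<String> preferredHosts;

        Split(String shardName, String... preferredHosts) {
            this.shardName = shardName;
            this.preferredHosts = Arrays.asList(preferredHosts);
        }
    }

    /**
     * Picks a pending split for the worker running on requestingHost.
     * A split that lists that host is preferred; otherwise the first
     * pending split is handed out, which implies a remote read.
     */
    static Optional<Split> assign(List<Split> pending, String requestingHost) {
        if (pending.isEmpty()) {
            return Optional.empty();
        }
        Split chosen = pending.get(0);
        for (Split s : pending) {
            if (s.preferredHosts.contains(requestingHost)) {
                chosen = s;
                break;
            }
        }
        pending.remove(chosen);
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        // Assumed layout: one shard per MySQL node.
        List<Split> pending = new ArrayList<>(Arrays.asList(
                new Split("shard-a", "db1.example.com"),
                new Split("shard-b", "db2.example.com")));

        // A worker on db2 gets the split stored on db2.
        System.out.println(assign(pending, "db2.example.com").get().shardName);
        // A worker on a host with no local data gets whatever is left.
        System.out.println(assign(pending, "db3.example.com").get().shardName);
    }
}
```

The host list is only a hint. A scheduler built this way still hands out
non-local work when no local split is pending, so co-locating task trackers
with the MySQL nodes would improve locality without guaranteeing it.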
