accumulo-dev mailing list archives

From Jeff N <>
Subject Re: Rack and Datacenter Awareness
Date Wed, 22 Jan 2014 19:01:16 GMT
I am currently interested in the latter half of your second question. My
main interest lies in determining how to optimize data processing. Suppose I
have two data centers that are geographically far apart, and I am working on
a local machine but need data from the second data center: how do I have
the processing occur in the second data center? The constraints of this
problem are that I do not know which HDFS node holds the data, only that
the node is within the network topology I currently reside in.
Furthermore, the question pertains to Map/Reduce jobs that use the
AccumuloInputFormat. Is it possible to have the distant data center run
my Mapper and send me the resulting data set, instead of running the
Mapper locally and making numerous network queries?

