Mailing-List: contact dev-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@accumulo.apache.org
Received-SPF: softfail (nike.apache.org: transitioning domain of
 mastergeek505@gmail.com does not designate 216.139.236.26 as permitted
 sender)
Date: Wed, 22 Jan 2014 11:01:16 -0800 (PST)
From: Jeff N <mastergeek505@gmail.com>
To: dev@accumulo.apache.org
Message-ID: <1390417276828-7225.post@n5.nabble.com>
In-Reply-To: 
 <CAPMpPc47J9cBV7XTFjFnKczA8UaeqLsXuxgSo5gLptCeuYJeOg@mail.gmail.com>
References: <1390254996382-7193.post@n5.nabble.com>
 <CAGUtCHqeioRhFQf7hq+BTOZFEBKi-2tPM9qHCbeTpw73yam43g@mail.gmail.com>
 <CAPMpPc47J9cBV7XTFjFnKczA8UaeqLsXuxgSo5gLptCeuYJeOg@mail.gmail.com>
Subject: Re: Rack and Datacenter Awareness
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

@Adam
I am currently interested in the latter half of your second question. My
main interest lies in determining how to optimize data processing. If I have
two data centers that are geographically far apart, and I am working on a
local machine but need data from the second data center, how do I have
the processing occur in the second data center? One constraint is that I do
not know which HDFS node holds the data, only that the node is within the
network topology I currently reside in. Furthermore, the question pertains
to Map/Reduce jobs that use the AccumuloInputFormat. Is it possible to have
the distant data center run my Mapper and send me the resulting data set,
instead of running the Mapper locally and making numerous network queries?


--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Rack-and-Datacenter-Awareness-tp7193p7225.html
Sent from the Developers mailing list archive at Nabble.com.