accumulo-user mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: MR Data Locality with AccumuloInputFormat?
Date Fri, 16 May 2014 21:49:34 GMT
Thanks, Josh. I'll take a look through the Hadoop web UI.
-Russ


On Fri, May 16, 2014 at 1:37 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Hi Russ,
>
> I believe that the AccumuloInputFormat will use the splits on the table
> you're reading to generate the MR InputSplits. The InputFormat should be
> trying to run each Mapper on the same machine as the tserver that serves
> its data.
>
> If you're only getting a few mappers, adding more splits to your table
> should help. As your job runs, you can verify locality using the counters
> that your Job creates, visible in the JobTracker/ResourceManager web UI.
>
>
> On 5/16/14, 1:32 PM, Russ Weeks wrote:
>
>> Hi, folks,
>>
>> When I execute an MR job with AccumuloInputFormat, are there any
>> guarantees about which mappers process which rows? I'm trying to
>> minimize crosstalk in my cluster but either I haven't split my table
>> properly or I'm expecting too much, because I'm only seeing 1 or 2 nodes
>> running MR tasks that should be reading data from tablet servers on 8
>> different nodes.
>>
>> Thanks,
>> -Russ
>>
>
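
As a rough illustration of the behaviour Josh describes (one MR InputSplit per tablet, with the serving tserver as the preferred host), here is a minimal sketch in plain Python. It is a toy model, not the AccumuloInputFormat code: the function names, the dictionary layout, and the host names are all hypothetical, and real task placement by the Hadoop scheduler is best-effort rather than guaranteed.

```python
from collections import Counter


def tablets_from_splits(split_points):
    """Turn split points into consecutive row ranges; None means unbounded."""
    bounds = [None] + sorted(split_points) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def plan_input_splits(split_points, tablet_hosts):
    """Model one input split per tablet, tagged with the host serving it.

    tablet_hosts is a hypothetical list giving the tserver host for each
    tablet, in row order.
    """
    tablets = tablets_from_splits(split_points)
    if len(tablets) != len(tablet_hosts):
        raise ValueError("need exactly one host per tablet")
    return [
        {"range": row_range, "preferred_host": host}
        for row_range, host in zip(tablets, tablet_hosts)
    ]


def mappers_per_host(input_splits):
    """Count how many modelled mappers would prefer each host."""
    return Counter(split["preferred_host"] for split in input_splits)


# A table with a single split point has two tablets, so at most two mappers.
few = plan_input_splits(["m"], ["node1", "node2"])

# Eight split points give nine tablets, here spread over eight hosts.
many = plan_input_splits(
    list("cfilorux"), [f"node{i % 8 + 1}" for i in range(9)]
)

print(len(few), dict(mappers_per_host(few)))
print(len(many), dict(mappers_per_host(many)))
```

In this model the number of mappers is bounded by the number of tablets, which matches the symptom in the original question: a table with very few splits can only keep one or two nodes busy, however many tablet servers the cluster has.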
