accumulo-user mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject Re: Accumulo Input Format over hadoop blocks
Date Mon, 09 Jul 2012 18:43:06 GMT
On Mon, Jul 9, 2012 at 2:24 PM, Roshan Punnoose <roshanp@gmail.com> wrote:

> This might be a very easy question, but I was wondering how the Accumulo
> Input Format handled a tablet file splitting over multiple nodes.
>
> For example, say I have a tablet file that is 1GB in size and my Hadoop
> block size is 256MB. Then up to 4 nodes could be holding the data from my
> tablet file. However, when the Accumulo Input Format creates mappers, it
> creates one mapper per tablet. This might mean that 3 blocks are
> transferred over the network to the node where the mapper is running.
>
> Am I correct in this assumption? Or is there something else the
> TabletServer is doing underneath to make sure all the data actually resides
> on one server, so there is no network overhead of moving blocks before a
> MapReduce job?
>
> Thanks!
> Roshan
>

If a single file spans 4 HDFS blocks, it is reasonable to assume that a
single datanode possesses all 4 blocks of that one file (it's only an
assumption because if that datanode died and the data was re-replicated, the
guarantee is lost). The node that possesses all 4 blocks is the same node as
the tserver that wrote the data, since more likely than not the file was
written by a tserver at major compaction time. Factor in our attempts to
avoid unnecessary migrations, and in most cases you will see minimal data
going over the network. Yes, you will occasionally see some over-the-network
transfers due to tablet migrations, data that hasn't been compacted in a
while, node failures, etc., but these are by no means the norm.
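
If you want to verify this on your own cluster, you can ask HDFS where the
blocks of a particular RFile live. A quick sketch using the standard Hadoop
FileSystem API (the RFile path is whatever file you want to inspect under
your Accumulo tables directory; nothing here is Accumulo-specific):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // args[0] is the file to inspect, e.g. an RFile under /accumulo/tables/
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      // If the hosting tserver wrote this file at major compaction time, its
      // local datanode should show up in the host list of every block.
      System.out.println("offset " + block.getOffset() + " len " + block.getLength()
          + " hosts " + Arrays.toString(block.getHosts()));
    }
  }
}

If the same host shows up for every block, reads of that file by the hosting
tserver never have to leave the machine.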

For a bit more education, when using the Accumulo Input Format, the mapper
task is actually talking to the tserver, and only the tserver, for reading
in data. This is because the tablet server is doing a merged read of the
data, applying all scan-time iterators (including visibility filtering),
and then giving results back to the mapper. So even if blocks do come over
the network, nothing could really be done in the MapReduce job to ensure
locality, because partial tablets can't be handled independently given the
way deletes, versioning, and aggregation work. If there are concerns about
locality on your system, forcing a compaction will restore data locality
(see the sketch below), but this really isn't necessary unless your system
has had a lot of failures or oddly distributed ingest.
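
For completeness, that compaction can be forced from the shell or through
the client API. A rough sketch of the API route, assuming the 1.4-era client
(the instance name, zookeepers, credentials, and table name below are
placeholders, and the method signatures may differ in other versions):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class ForceCompaction {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
        .getConnector("user", "secret".getBytes());
    // null start/end rows = the whole table; flush in-memory data first and
    // block until the major compaction finishes. The rewritten files will be
    // local to whichever tservers are hosting the tablets.
    conn.tableOperations().compact("mytable", null, null, true, true);
  }
}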

John
