Subject: Re: Accumulo Input Format over hadoop blocks
From: Roshan Punnoose <roshanp@gmail.com>
To: user@accumulo.apache.org
Date: Mon, 9 Jul 2012 14:52:34 -0400

Thanks, that makes perfect sense. My assumption that the mapper was pulling
the data directly from the Hadoop blocks was wrong. Thanks for the full
explanation, that really helps.

Roshan

On Mon, Jul 9, 2012 at 2:43 PM, John Vines <john.w.vines@ugov.gov> wrote:

> On Mon, Jul 9, 2012 at 2:24 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>
>> This might be a very easy question, but I was wondering how the Accumulo
>> Input Format handles a tablet file that is split across multiple nodes.
>>
>> For example, say I have a tablet file that is 1 GB in size and my Hadoop
>> block size is 256 MB. Then it is possible that up to 4 nodes are holding
>> the data from my tablet file. However, when the Accumulo Input Format
>> creates mappers, it creates one mapper per tablet. This might mean that
>> 3 blocks are transferred over the network to the node where the mapper is
>> running in order to make the data local.
>>
>> Am I correct in this assumption?
>> Or is there something else the TabletServer is doing underneath to make
>> sure all the data actually resides on one server, so that there is no
>> network overhead of moving blocks before a MapReduce job?
>>
>> Thanks!
>> Roshan
>
> If a single file spans 4 HDFS blocks, it is reasonable to assume that a
> single datanode holds all 4 blocks of that one file (it is only an
> assumption because if that datanode died and the data was re-replicated,
> the guarantee is lost). The node that holds all 4 blocks is the same node
> as the tserver that wrote the data; more likely than not, the file was
> written by a tserver at major compaction time. Factor in our attempts to
> avoid unnecessary migrations, and in most cases you will see minimal data
> go over the network. Yes, occasionally you will see some over-the-network
> transfers due to tablet migrations, data that hasn't been compacted in a
> while, node failures, etc., but these are by no means the norm.
>
> For a bit more education: when using the Accumulo Input Format, the mapper
> task is actually talking to the tserver, and only the tserver, to read in
> data. This is because the tablet server is doing a merged read of the data,
> applying all scan-time iterators (including visibility filtering), and then
> handing the results back to the mapper. So even if blocks did go over the
> network, there really couldn't be anything done in the MapReduce job to
> ensure locality, because partial tablets can't be handled separately given
> the way deletes, versioning, and aggregation work. If there are concerns
> about locality on your system, forcing a compaction will ensure data
> locality, but this really isn't necessary unless your system has had a lot
> of failures or oddly distributed ingest.
>
> John
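To make the mechanics above concrete, here is a rough sketch of how a
MapReduce job is typically wired up against AccumuloInputFormat. The
instance name, zookeepers, credentials, and table name are placeholders,
and the static configuration methods have changed between Accumulo
releases (this is roughly the 1.4-era API), so check the signatures
against your version:

    import java.io.IOException;

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class TabletScanExample {

      // One map task is created per tablet; each task streams Key/Value
      // pairs from the tserver hosting that tablet. Scan-time iterators
      // (including visibility filtering) have already been applied
      // server-side before the data reaches this method.
      public static class TabletMapper extends Mapper<Key, Value, Key, Value> {
        @Override
        protected void map(Key key, Value value, Context context)
            throws IOException, InterruptedException {
          // Just count entries; a real job would do something useful here.
          context.getCounter("example", "entries").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "tablet-scan-example");
        job.setJarByClass(TabletScanExample.class);
        job.setMapperClass(TabletMapper.class);
        job.setInputFormatClass(AccumuloInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);

        // Placeholder connection details -- substitute your own instance,
        // zookeepers, credentials, and table.
        AccumuloInputFormat.setZooKeeperInstance(job.getConfiguration(),
            "myInstance", "zkhost1:2181,zkhost2:2181");
        AccumuloInputFormat.setInputInfo(job.getConfiguration(),
            "root", "secret".getBytes(), "mytable", new Authorizations());

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Each input split corresponds to a tablet, and the split reports the
hosting tserver as its location, so Hadoop will try to schedule the map
task on that node; that is why the merged read described above usually
stays local.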
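On John's point about forcing a compaction to restore locality, that can be
done per table from the Accumulo shell; something along these lines, where
"mytable" is a placeholder and -w waits for the compaction to finish:

    root@myInstance> compact -t mytable -w

The compaction rewrites the tablet's files through the hosting tserver, so
the new blocks land on that node's datanode, which is what restores
locality after failures or re-replication.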