Subject: Re: Accumulo Input Format over hadoop blocks
From: Roshan Punnoose <roshanp@gmail.com>
To: user@accumulo.apache.org
Date: Mon, 9 Jul 2012 14:52:34 -0400

Thanks, that makes perfect sense. My assumption that the mapper was pulling
the data directly from the Hadoop blocks was wrong. Thanks for the full
explanation, that really helps.

Roshan

On Mon, Jul 9, 2012 at 2:43 PM, John Vines <john.w.vines@ugov.gov> wrote:

> On Mon, Jul 9, 2012 at 2:24 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>
>> This might be a very easy question, but I was wondering how the Accumulo
>> Input Format handles a tablet file that is split across multiple nodes.
>>
>> For example, say I have a tablet file that is 1 GB in size and my Hadoop
>> block size is 256 MB. Then it is possible that up to 4 nodes are holding
>> the data from my tablet file. However, when the Accumulo Input Format
>> creates mappers, it creates one mapper per tablet. This might mean that
>> 3 blocks are transferred over the network to the node where the mapper is
>> running in order to make the data local.
>>
>> Am I correct in this assumption?
>> Or is there something else the TabletServer is doing underneath to make
>> sure all the data actually resides on one server, so that there is no
>> network overhead of moving blocks before a MapReduce job?
>>
>> Thanks!
>> Roshan
>
> If a single file spans 4 HDFS blocks, it is reasonable to assume that a
> single datanode holds all 4 blocks of that one file (it is only an
> assumption because if that datanode died and the data was re-replicated,
> the guarantee is lost). The node that holds all 4 blocks is the same node
> as the tserver that wrote the data; more likely than not, the file was
> written by a tserver at major compaction time. Factor in our attempts to
> avoid unnecessary migrations, and in most cases you will see minimal data
> go over the network. Yes, occasionally you will see some over-the-network
> transfers due to tablet migrations, data that hasn't been compacted in a
> while, node failures, etc., but these are by no means the norm.
>
> For a bit more education: when using the Accumulo Input Format, the mapper
> task is actually talking to the tserver, and only the tserver, to read in
> data. This is because the tablet server is doing a merged read of the data,
> applying all scan-time iterators (including visibility filtering), and then
> handing the results back to the mapper. So even if blocks did go over the
> network, there really couldn't be anything done in the MapReduce job to
> ensure locality, because partial tablets can't be handled separately given
> the way deletes, versioning, and aggregation work. If there are concerns
> about locality on your system, forcing a compaction will ensure data
> locality, but this really isn't necessary unless your system has had a lot
> of failures or oddly distributed ingest.
>
> John
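To make the mechanics above concrete, here is a rough sketch of how a
MapReduce job is typically wired up against AccumuloInputFormat. The
instance name, zookeepers, credentials, and table name are placeholders,
and the static configuration methods have changed between Accumulo
releases (this is roughly the 1.4-era API), so check the signatures
against your version:

    import java.io.IOException;

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class TabletScanExample {

      // One map task is created per tablet; each task streams Key/Value
      // pairs from the tserver hosting that tablet. Scan-time iterators
      // (including visibility filtering) have already been applied
      // server-side before the data reaches this method.
      public static class TabletMapper extends Mapper<Key, Value, Key, Value> {
        @Override
        protected void map(Key key, Value value, Context context)
            throws IOException, InterruptedException {
          // Just count entries; a real job would do something useful here.
          context.getCounter("example", "entries").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "tablet-scan-example");
        job.setJarByClass(TabletScanExample.class);
        job.setMapperClass(TabletMapper.class);
        job.setInputFormatClass(AccumuloInputFormat.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);

        // Placeholder connection details -- substitute your own instance,
        // zookeepers, credentials, and table.
        AccumuloInputFormat.setZooKeeperInstance(job.getConfiguration(),
            "myInstance", "zkhost1:2181,zkhost2:2181");
        AccumuloInputFormat.setInputInfo(job.getConfiguration(),
            "root", "secret".getBytes(), "mytable", new Authorizations());

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Each input split corresponds to a tablet, and the split reports the
hosting tserver as its location, so Hadoop will try to schedule the map
task on that node; that is why the merged read described above usually
stays local.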
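On John's point about forcing a compaction to restore locality, that can be
done per table from the Accumulo shell; something along these lines, where
"mytable" is a placeholder and -w waits for the compaction to finish:

    root@myInstance> compact -t mytable -w

The compaction rewrites the tablet's files through the hosting tserver, so
the new blocks land on that node's datanode, which is what restores
locality after failures or re-replication.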