Subject: Re: How does Accumulo process a r-files for bulk ingesting?
From: Eric Newton
To: "user@accumulo.apache.org"
Date: Wed, 7 Oct 2015 10:02:43 -0400

That is correct. There is no effort expended to ensure locality of bulk files.

-Eric

On Wed, Oct 7, 2015 at 9:50 AM, Jeff Kubina wrote:

> So if HDFS has a replication factor of m and an r-file has a range that
> intersects n tablets, then data locality will never be achieved for
> max(0,n-m) of the r-files, that is, they will never be on the same node as
> their tablet server until compaction, correct?
>
> --
> Jeff Kubina
> 410-988-4436
>
> On Wed, Oct 7, 2015 at 9:35 AM, Josh Elser wrote:
>
>> On Oct 7, 2015 8:47 AM, "Jeff Kubina" wrote:
>> >
>> > How does Accumulo process an r-file for bulk ingesting when the key
>> > range of the r-file is within one tablet's key range, and when the key
>> > range of the r-file spans two or more tablets?
>> >
>> > If the r-file is within one tablet's range I thought the file was "just
>> > renamed" and added to the tablet's list of r-files. Is that correct?
>>
>> Bingo
>>
>> > If the key range of the r-file spans two or more tablets, is the r-file
>> > partitioned into separate r-files for each appropriate tablet server, or
>> > are the records "batch-written" to each appropriate tablet in memory?
>>
>> They're logically partitioned if memory serves (the files are not
>> rewritten). So you would see multiple entries in the metadata table for a
>> single file with certain offsets. No replaying of mutations by batch
>> writers.
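The arithmetic in the thread can be sketched in a few lines of Python. This is a toy model, not Accumulo code: the function names, the split points, and the row values are all hypothetical. It assumes a tablet covers the rows after the previous split point up to and including its own split point, that each of the n tablets is hosted on a different node, and that the whole r-file sits on the same m replica nodes (for example, a single-block file). Under those assumptions, at most m tablets can share a node with a copy of the file.

```python
from bisect import bisect_left


def tablets_intersected(split_points, first_row, last_row):
    """Return the indices of tablets whose row range overlaps the file.

    Hypothetical model: tablet i covers rows in
    (split_points[i-1], split_points[i]], and the last tablet has no
    upper bound. split_points must be sorted.
    """
    lo = bisect_left(split_points, first_row)
    hi = bisect_left(split_points, last_row)
    return list(range(lo, hi + 1))


def min_nonlocal_tablets(n_tablets, replication):
    """Lower bound from the thread: max(0, n - m).

    Assumes every tablet lives on a distinct node and the file's
    replicas occupy exactly `replication` nodes.
    """
    return max(0, n_tablets - replication)


# Hypothetical table with three split points, hence four tablets.
splits = ["g", "n", "t"]

# A file whose rows fall inside one tablet is assigned to that tablet only.
single = tablets_intersected(splits, "c", "f")

# A file spanning several tablets is referenced by each of them.
spanning = tablets_intersected(splits, "e", "p")

print(single, spanning, min_nonlocal_tablets(len(spanning), 2))
```

With a replication factor of 2 and a file referenced by three tablets, at least one tablet reads the file remotely until a compaction rewrites its data.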