From: Jeff Kubina
Date: Wed, 7 Oct 2015 09:50:54 -0400
Subject: Re: How does Accumulo process r-files for bulk ingesting?
To: user@accumulo.apache.org

So if HDFS has a replication factor of m and an r-file has a range that
intersects n tablets, then data locality will never be achieved for
max(0, n-m) of the tablets referencing that r-file; that is, the file
will never be on the same node as those tablets' servers until
compaction. Correct?

-- 
Jeff Kubina
410-988-4436

On Wed, Oct 7, 2015 at 9:35 AM, Josh Elser wrote:

> On Oct 7, 2015 8:47 AM, "Jeff Kubina" wrote:
> >
> > How does Accumulo process an r-file for bulk ingesting when the key
> > range of an r-file is within one tablet's key range, and when the key
> > range of an r-file spans two or more tablets?
> >
> > If the r-file is within one tablet's range, I thought the file was
> > "just renamed" and added to the tablet's list of r-files. Is that
> > correct?
>
> Bingo
>
> > If the key range of the r-file spans two or more tablets, is the
> > r-file partitioned into separate r-files for each appropriate tablet
> > server, or are the records "batch-written" to each appropriate tablet
> > in memory?
>
> They're logically partitioned if memory serves (the files are not
> rewritten). So you would see multiple entries in the metadata table for a
> single file with certain offsets. No replaying of mutations by batch
> writers.
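To make the "logically partitioned" answer concrete, here is a minimal sketch in plain Python (not Accumulo's API) of which tablets a bulk file would be assigned to. It assumes rows are plain strings, that a tablet holds rows greater than its previous end row and up to and including its end row, and that only the file's first and last rows matter. `tablet_extents` and `overlapping_tablets` are hypothetical helpers. A file landing in one tablet is the "just renamed" case; a file spanning several tablets yields one reference per tablet, all pointing at the same unrewritten file.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# A tablet extent as (prev_end_row, end_row); None means unbounded.
# Assumed convention: a tablet holds rows r with prev_end_row < r <= end_row.
Extent = Tuple[Optional[str], Optional[str]]


def tablet_extents(splits: List[str]) -> List[Extent]:
    """Turn sorted split points into the table's tablet extents."""
    bounds: List[Optional[str]] = [None, *splits, None]
    return [(bounds[i], bounds[i + 1]) for i in range(len(splits) + 1)]


def overlapping_tablets(splits: List[str], first_row: str, last_row: str) -> List[Extent]:
    """Extents of every tablet whose range intersects [first_row, last_row].

    Each of these tablets would get its own reference to the same,
    unrewritten bulk file; a single result is the "just renamed" case.
    """
    extents = tablet_extents(splits)
    # The tablet holding row r is the first whose end_row >= r,
    # i.e. index bisect_left(splits, r).
    start = bisect_left(splits, first_row)
    end = bisect_left(splits, last_row)
    return extents[start:end + 1]


splits = ["g", "n", "t"]  # hypothetical split points -> 4 tablets
print(overlapping_tablets(splits, "h", "m"))  # [('g', 'n')]
print(overlapping_tablets(splits, "c", "p"))  # [(None, 'g'), ('g', 'n'), ('n', 't')]
```

Only the tablet boundaries are computed here; how Accumulo records each tablet's portion of the file in the metadata table is left out of the sketch.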
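Jeff's max(0, n-m) estimate can be checked with a small worked example. This sketch assumes the r-file fits in a single HDFS block (so every tablet reads the same m replicas) and that each tablet server runs on a datanode; the tablet ids and host names are hypothetical. Under those assumptions, if the n tablets sit on n distinct servers, at most m of them can have a local replica, so at least max(0, n-m) cannot.

```python
from typing import Dict, Set


def non_local_tablets(tablet_servers: Dict[str, str], replica_hosts: Set[str]) -> int:
    """Count tablets whose hosting server has no local replica of the file.

    tablet_servers maps a (hypothetical) tablet id to the host serving it;
    replica_hosts is the set of datanodes holding the file's replicas.
    Assumes the file is a single HDFS block, so every tablet reads the
    same replicas.
    """
    return sum(1 for host in tablet_servers.values() if host not in replica_hosts)


def min_non_local(n: int, m: int) -> int:
    """Lower bound when the n tablets sit on n distinct servers and the
    single block has m replicas: at most m tablets can be local."""
    return max(0, n - m)


tablets = {"t1": "node1", "t2": "node2", "t3": "node3", "t4": "node4", "t5": "node5"}
replicas = {"node1", "node3", "node9"}  # replication factor m = 3
print(non_local_tablets(tablets, replicas))  # 3
print(min_non_local(len(tablets), len(replicas)))  # 2
```

The bound is only a floor: here one replica sits on a node that serves none of the tablets, so three tablets are remote rather than two. Multi-block files, or several tablets hosted on the same server, change the count.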