Subject: Re: ISAM file location vs. read performance
From: Donald Miner
Date: Sun, 12 Jan 2014 19:26:32 -0500
To: user@accumulo.apache.org

HDFS-385 ( https://issues.apache.org/jira/plugins/servlet/mobile#issue/HDFS-385 ) is for custom pluggable block placement policies, and there has been some talk (I think) about using it to improve mean time to recovery and data locality in HBase.

Basically, this would allow Accumulo to have a placement policy for its own blocks and control its own destiny, instead of having things like the rebalancer screw things up.

I honestly don't know much else about this. Just thought it might be relevant to the conversation.

> On Jan 12, 2014, at 6:42 PM, Josh Elser wrote:
>
>> On 1/12/14, 6:17 PM, Sean Busbey wrote:
>> On Sun, Jan 12, 2014 at 4:42 PM, William Slacum wrote:
>>
>> Some data on short circuit reads would be great to have.
>>
>> What kind of data are you looking for? Just HDFS read rates? Or
>> specifically Accumulo when set up to make use of it?
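For anyone who wants to try the setup Sean mentions: short-circuit local reads are enabled client-side with two properties in hdfs-site.xml. This is a minimal sketch assuming the HDFS-347-style (domain socket) implementation; the socket path shown is just a common example and is site-specific:

```xml
<!-- hdfs-site.xml: enable short-circuit local reads (HDFS-347 style) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <!-- example path; must be writable by the DataNode and readable by clients -->
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

The DataNode must be restarted after adding these, and the same settings need to be visible to the Accumulo tserver processes acting as HDFS clients.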
> I believe what Bill means, and what I'm also curious about, is specifically the impact on performance for Accumulo's workload: a merged read over multiple files. An easy test might be to create multiple RFiles (1 to 10 files?) which contain interspersed data, then run some sort of random-read and random-seek+sequential-read workloads, from 1 to 10 RFiles, with short-circuit reads on and off.
>
> Perhaps a slightly more accurate test would be to raise the compaction ratio on a table, bulk import the files into that single table, and then just use the regular client API.
>
>> I'm unsure how correct the "compaction leading to eventual
>> locality" postulation is. It seems, to me at least, that in the case
>> of a multi-block file, the file system would eventually try to
>> distribute those blocks rather than leave them all on a single host.
>>
>> I know that in HBase setups, it's common to either disable the HDFS Balancer
>> entirely or disable it for the part of the filesystem
>> that holds HBase's data. Otherwise, when the blocks are moved off to other
>> hosts you get performance degradation until compaction can happen again.
>> I would expect the same thing ought to be done for Accumulo.
>
> AFAIK, HBase also does a lot more to assign Tablets with regard to the blocks that serve them, no? To my knowledge, Accumulo doesn't do anything like this. I don't want users to think that disabling the HDFS balancer is a good idea for Accumulo unless we have actual evidence.
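To make the "merged read over multiple files" workload concrete, here is a minimal sketch of the k-way merge a tserver effectively performs when one scan spans several RFiles. This is plain Java, not the actual Accumulo RFile/iterator API; the class and method names are made up for illustration, and each sorted `List<String>` stands in for the keys of one RFile:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedRead {

    // Merge several sorted key lists into one sorted stream -- the shape of
    // work done when a single scan reads across multiple RFiles at once.
    static List<String> mergedRead(List<List<String>> files) {
        // Min-heap of cursors {fileIndex, position}, ordered by current key.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] c) -> files.get(c[0]).get(c[1])));
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) heap.add(new int[]{i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] c = heap.poll();
            out.add(files.get(c[0]).get(c[1]));
            // Advance the cursor within its file, if keys remain.
            if (c[1] + 1 < files.get(c[0]).size()) {
                heap.add(new int[]{c[0], c[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three "files" with interspersed keys, as in the proposed test.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a", "d", "g"),
                Arrays.asList("b", "e", "h"),
                Arrays.asList("c", "f", "i"));
        System.out.println(mergedRead(files)); // [a, b, c, d, e, f, g, h, i]
    }
}
```

The point of the benchmark Josh describes is that every next-key step here touches a different file, so going from 1 to 10 files multiplies the number of underlying streams a read path (short-circuit or not) has to service.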