Date: Fri, 29 Jun 2012 13:06:43 +0000 (UTC)
From: "Adam Fuchs (JIRA)"
To: dev@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-550) Colocate rfile index entries within file

[ https://issues.apache.org/jira/browse/ACCUMULO-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403879#comment-13403879 ]

Adam Fuchs commented on ACCUMULO-550:
-------------------------------------

I noticed yesterday when working on ACCUMULO-652 that the index entries at level 1 and higher are still interspersed between level 0 blocks with the current technique.
Is there value in keeping indexes at a given level greater than 0 close to each other, or is that overkill?

> Colocate rfile index entries within file
> ----------------------------------------
>
>                 Key: ACCUMULO-550
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-550
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0, 1.4.1
>
>
> Before multi-level indexes were introduced, when an rfile was written its entire index was held in memory and written out when the file was closed. With the introduction of multi-level indexes, each index block is written out as soon as it fills up, while the file is still being written. This was done to handle the case where the index may not fit into memory. As a result, index blocks are scattered throughout the file, so any operation that iterates over the entire index can be slow because it turns into many random accesses.
> One possible solution is to buffer many index blocks, up to some threshold, and write them out all at once. This would make a scan of the index much faster, as it would become a set of sequential reads of large chunks of data.
> Could buffer all blocks at a particular level and write them out when the parent index block fills up.
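To make the layout problem concrete, here is a toy Java model. It is hypothetical: the class `IndexLayoutSim`, its methods, the fixed fan-out, and the rule that a full block is closed when the next entry for its level arrives are illustrative assumptions, not Accumulo's RFile writer or API. It records only the order in which data blocks (`D`) and index blocks (`I0`, `I1`, ...) would land in the file, comparing "write each index block as soon as it is closed" with the suggestion to buffer all blocks at a level until their parent index block fills up. The number of separate runs of a label is used as a rough proxy for the seeks needed to read that level sequentially.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical layout model; block offsets are ignored, only write order is tracked. */
public class IndexLayoutSim {
  private final int fanOut;                                     // assumed entries per index block
  private final boolean bufferUntilParentFills;
  private final List<String> file = new ArrayList<>();          // "D" or "I<level>", in write order
  private final List<Integer> openEntries = new ArrayList<>();  // entries in the open block, per level
  private final List<List<String>> pending = new ArrayList<>(); // closed blocks held in memory, per level

  IndexLayoutSim(int fanOut, boolean bufferUntilParentFills) {
    this.fanOut = fanOut;
    this.bufferUntilParentFills = bufferUntilParentFills;
  }

  void appendDataBlock() {
    file.add("D");
    addEntry(0);
  }

  /** Adds one entry to the open block at this level, closing that block first if it is full. */
  private void addEntry(int level) {
    if (level == openEntries.size()) {
      openEntries.add(0);
      pending.add(new ArrayList<>());
    }
    if (openEntries.get(level) == fanOut) {
      closeBlock(level);
    }
    openEntries.set(level, openEntries.get(level) + 1);
  }

  private void closeBlock(int level) {
    openEntries.set(level, 0);
    String label = "I" + level;
    if (!bufferUntilParentFills) {
      file.add(label);               // write the block immediately, between data blocks
      addEntry(level + 1);
      return;
    }
    if (level > 0) {
      flush(level - 1);              // this block is complete, so write all its children together
    }
    addEntry(level + 1);             // may close the full parent, flushing this block's earlier siblings
    pending.get(level).add(label);   // hold this block until its own parent is closed
  }

  private void flush(int level) {
    file.addAll(pending.get(level));
    pending.get(level).clear();
  }

  /** Closes the remaining partial blocks bottom-up and writes the root last. */
  List<String> finish() {
    for (int level = 0; level < openEntries.size(); level++) { // bound re-read: closing may add a level
      if (level < openEntries.size() - 1) {
        closeBlock(level);
      } else {
        if (bufferUntilParentFills && level > 0) {
          flush(level - 1);
        }
        file.add("I" + level);       // root block
      }
    }
    return file;
  }

  public static List<String> simulate(int dataBlocks, int fanOut, boolean buffer) {
    IndexLayoutSim sim = new IndexLayoutSim(fanOut, buffer);
    for (int i = 0; i < dataBlocks; i++) {
      sim.appendDataBlock();
    }
    return sim.finish();
  }

  /** Counts maximal runs of adjacent blocks with this label (a rough proxy for seeks). */
  public static int runs(List<String> file, String label) {
    int runs = 0;
    for (int i = 0; i < file.size(); i++) {
      if (file.get(i).equals(label) && (i == 0 || !file.get(i - 1).equals(label))) {
        runs++;
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    for (boolean buffer : new boolean[] {false, true}) {
      List<String> file = simulate(8, 2, buffer);
      System.out.println((buffer ? "buffered:  " : "immediate: ") + String.join(" ", file)
          + "  (I0 runs=" + runs(file, "I0") + ", I1 runs=" + runs(file, "I1") + ")");
    }
  }
}
```

With 8 data blocks and an assumed fan-out of 2, `main` prints:

```
immediate: D D D I0 D D I0 D D I0 I1 D I0 I1 I2  (I0 runs=4, I1 runs=2)
buffered:  D D D D D D D I0 I0 D I0 I0 I1 I1 I2  (I0 runs=2, I1 runs=1)
```

In this model, buffering until the parent fills makes each group of siblings contiguous, but groups belonging to different parents are still separated by data blocks, so a full scan of level 0 still costs roughly one seek per level-1 block. The threshold-based variant from the description is not modelled, and a real writer would also have to fill in the parent entries' block offsets once the deferred children are actually written.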