Return-Path: X-Original-To: apmail-incubator-accumulo-dev-archive@minotaur.apache.org Delivered-To: apmail-incubator-accumulo-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D50EE9101 for ; Mon, 19 Mar 2012 22:52:02 +0000 (UTC) Received: (qmail 79998 invoked by uid 500); 19 Mar 2012 22:52:02 -0000 Delivered-To: apmail-incubator-accumulo-dev-archive@incubator.apache.org Received: (qmail 79963 invoked by uid 500); 19 Mar 2012 22:52:02 -0000 Mailing-List: contact accumulo-dev-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: accumulo-dev@incubator.apache.org Delivered-To: mailing list accumulo-dev@incubator.apache.org Received: (qmail 79951 invoked by uid 99); 19 Mar 2012 22:52:02 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 19 Mar 2012 22:52:02 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 19 Mar 2012 22:51:59 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id D537B295AF for ; Mon, 19 Mar 2012 22:51:38 +0000 (UTC) Date: Mon, 19 Mar 2012 22:51:38 +0000 (UTC) From: "Keith Turner (Created) (JIRA)" To: accumulo-dev@incubator.apache.org Message-ID: <141284240.33489.1332197498875.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Created] (ACCUMULO-473) Support binary search within RFile blocks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org Support binary search within RFile blocks ----------------------------------------- Key: ACCUMULO-473 URL: https://issues.apache.org/jira/browse/ACCUMULO-473 Project: Accumulo Issue Type: Improvement Components: tserver Reporter: Keith Turner Assignee: Keith Turner Fix For: 1.5.0 RFiles store blocks of key values using relative encoding. By default these blocks are small (100k). To find a key in an RFile the index is used to find the block, then the block is scanned for the key. It would be nice for iterators that do alot of seeking if a binary search were done instead of a scan. Relative encoding is a form of compression that serializes a key relative to the last key. For example if the row in a key is the same as the previous key, then the row is not stored again. This works well with sorted data. The current on disk format does not support random access within a block, because to read entry N all previous entries must be read. One option would be to deserialize the block into a form that supports random access and then do binary search. However this will consume memory and CPU. If the relative encoding format could be modified to support random access, then binary search could be supported in a memory and CPU efficient way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira