From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Date: Thu, 22 Sep 2016 15:28:22 +0000 (UTC)
Subject: [jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

[ https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513602#comment-15513602 ]

Keith Turner commented on ACCUMULO-4468:
----------------------------------------

bq. 
That said, maybe there would be benefits to storing all the pieces of the Key in a single byte array, and maintaining indices into it to track the individual parts, rather than several smaller arrays...

Key used to be like that (one large byte array with pointers). I changed it when adding support for relative compression in rfile. The reasoning behind the change was that when rfile deserializes a key and a field is the same as in the previous key, it can just point to the previous key's byte array. This makes equality comparisons on rows or columns that are the same really fast (because the byte array is the same object, and the equality check tests for that). The code that serializes keys and transfers them across the network also does this. So it may be interesting to have the test stream keys from an RFile.

bq. I think it would be worth revisiting the comparison mechanism in isEqual, too, doing something like the Unsafe method used in Hadoop's FastByteComparisons class but going in reverse.

That sounds like an interesting line of investigation. The compare methods could also leverage this technique. We may have an issue open for this already.

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> ----------------------------------------------------------
>
>                 Key: ACCUMULO-4468
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.8.0
>            Reporter: Will Murnane
>            Priority: Trivial
>              Labels: newbie, performance
>         Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares starting at the beginning of the key, and works its way toward the end. This functions correctly, of course, but one of the typical uses of this method is to compare adjacent rows to break them into larger chunks. For example, accumulo.core.iterators.Combiner repeatedly calls this method with subsequent pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and finally row. This (marginally) improves the speed of comparisons in the relatively common case where only the last part is changing, with less complex code.
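
To make the two ideas in this thread concrete, here is a minimal sketch of a reverse-order partial comparison combined with the same-array shortcut described in the comment. It is not the attached patch and not Accumulo's actual Key code: SketchKey, Depth, isEqual, and bytes are hypothetical names, and the four-field layout is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a key with four separately stored fields; not Accumulo's Key class. */
final class SketchKey {
    /** Hypothetical depth selector, loosely modeled on the PartialKey values named in the issue. */
    enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_COLQUAL, ROW_COLFAM_COLQUAL_COLVIS }

    // Fields in key order: row, column family, column qualifier, column visibility.
    private final byte[][] fields;

    SketchKey(byte[] row, byte[] cf, byte[] cq, byte[] cv) {
        this.fields = new byte[][] { row, cf, cq, cv };
    }

    /** Compares two arrays, checking reference identity first and then bytes from the end. */
    static boolean isEqual(byte[] a, byte[] b) {
        if (a == b) {
            return true; // shared array, e.g. a field reused from the previous key
        }
        if (a.length != b.length) {
            return false;
        }
        for (int i = a.length - 1; i >= 0; i--) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    /** Compares the fields selected by depth, starting with the last one and ending with the row. */
    boolean equals(SketchKey other, Depth depth) {
        for (int i = depth.ordinal(); i >= 0; i--) {
            if (!isEqual(fields[i], other.fields[i])) {
                return false;
            }
        }
        return true;
    }

    /** Helper for building field values from strings. */
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Row and column family arrays are shared between the two keys,
        // mimicking adjacent keys that reuse the previous key's arrays.
        byte[] row = bytes("row1");
        byte[] cf = bytes("fam");
        SketchKey k1 = new SketchKey(row, cf, bytes("qual1"), bytes(""));
        SketchKey k2 = new SketchKey(row, cf, bytes("qual2"), bytes(""));
        System.out.println(k1.equals(k2, Depth.ROW_COLFAM));         // true
        System.out.println(k1.equals(k2, Depth.ROW_COLFAM_COLQUAL)); // false
    }
}
```

In this sketch, adjacent keys that differ only in a trailing field are rejected on the first field examined, while keys that share leading arrays pass those fields with a single reference check. Whether that helps in practice depends on how the keys are produced, which is why streaming keys from an RFile would be a more representative benchmark.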