Date: Mon, 1 Apr 2013 11:37:58 -0700
Subject: Re: Inconsistencies in comparisons using KeyComparator
From: Ted Yu
To: "user@hbase.apache.org"
In-Reply-To: <5159BE12.9060006@mechnicality.com>
References: <51599F8A.60306@mechnicality.com> <5159BE12.9060006@mechnicality.com>

Looking at
http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/sun/misc/Unsafe.java,
it looks like Unsafe is provided by OpenJDK as well.

I guess this issue, though disturbing, wouldn't show up in practice.

On Mon, Apr 1, 2013 at 10:04 AM, Alan Chaney wrote:

>
> On 4/1/2013 9:42 AM, Stack wrote:
>
>> That is an interesting (disturbing) find Alan. Hopefully the fallback is
>> rare. Did you have a technique for making the compare fallback to pure
>> java compare?
>>
>> Thank you,
>> St.Ack
>>
>
> I agree it's disturbing! I based my findings on reading the source code for
> 0.92.1 (the CDH4.1.2 distro).
>
> It seems to me that, from org.apache.hadoop.hbase.KeyValue$KVComparator
> the KeyComparator calls KeyComparator.compareRows which in turn calls
>
> Bytes.compareTo(left, loffset, llength, right, roffset, rlength) which in
> turn calls Bytes.compareTo which calls
> LexicographicalComparerHolder.BEST_COMPARER
>
> which appears to be implemented thus:
>
>   static class LexicographicalComparerHolder {
>     static final String UNSAFE_COMPARER_NAME =
>         LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";
>
>     static final Comparer BEST_COMPARER = getBestComparer();
>     /**
>      * Returns the Unsafe-using Comparer, or falls back to the pure-Java
>      * implementation if unable to do so.
>      */
>     static Comparer getBestComparer() {
>       try {
>         Class theClass = Class.forName(UNSAFE_COMPARER_NAME);
>         ...
>     }
>
>     enum PureJavaComparer implements Comparer {
>       INSTANCE;
>
>       @Override
>       public int compareTo(byte[] buffer1, int offset1, int length1,
>       ...
>     }
>   }
>
> So, it looks to me like Unsafe is the default. However, it's not
> really very easy to debug this, except by invoking the
> KeyValue.KeyComparator and seeing what you get, which is what I did. Either
> I'm doing something very stupid (extremely plausible) or there is a bit of
> an issue here. I was hoping that someone would point out my error!
>
> I've got some unit tests that appear to show the difference.
>
> Thanks
>
> Alan
>
>
>> On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney wrote:
>>
>> Hi
>>>
>>> I need to write some code that sorts row keys identically to HBase.
>>>
>>> I looked at the KeyValue.KeyComparator code, and it seems that, by
>>> default, HBase elects to use the 'Unsafe' comparator as the basis of its
>>> comparison, with a fall-back to the PureJavaComparer should Unsafe not
>>> be available (for example, in tests.)
>>>
>>> However, I'm finding that the sort order from a call to
>>> KeyValue.KeyComparator appears to be inconsistent between the two forms.
>>>
>>> As an example, comparing:
>>>
>>> (first param) to (second param)
>>> 0000000000000000ffffffffffffffffffffffffffffffff616c1b to
>>> 0000000000000000ffffffffffffffffffffffffffffffff61741b
>>>
>>> gives 1 for the default (presumably, Unsafe) call, and -1 using the
>>> PureJavaComparer.
>>>
>>> I would actually expect it to be a -ve number, based on the difference of
>>> 6c to 74 in the 3rd from last byte above.
>>>
>>> Similarly
>>>
>>> 000000000000000000000000000000000000000000000000616c1b to
>>> 0000000000000000000000000000000000000000000000061741b
>>>
>>> gives > 0 instead of < 0. The PureJavaComparer does a byte-by-byte
>>> comparison by
>>>
>>> Is this expected? From the definition of lexicographical compare that I
>>> found, I don't think so. There's no issue of signed comparison here,
>>> because 0x6c and 0x74 are still +ve byte values.
>>>
>>> Regards
>>>
>>> Alan
>>>
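
For reference, the ordering expected in the original question can be checked with a small stand-alone sketch. This is not the HBase implementation: it is a minimal unsigned, byte-by-byte lexicographic compare written only for illustration, and the names LexCompareSketch, compareUnsigned and hex are hypothetical. The two keys are shortened stand-ins for the first example in the thread; they keep the bytes that differ (0x6c versus 0x74) but are otherwise assumptions.

```java
public class LexCompareSketch {

    /**
     * Unsigned, byte-by-byte lexicographic compare.
     * Illustrative only; not copied from HBase.
     */
    public static int compareUnsigned(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int a = left[i] & 0xff;   // mask so that 0xff sorts above 0x00
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;         // first differing byte decides the order
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return left.length - right.length;
    }

    /** Decodes an even-length hex string into a byte array. */
    public static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] first = hex("00ff616c1b");   // shortened, hypothetical key
        byte[] second = hex("00ff61741b");  // differs only in 0x6c vs 0x74
        // Prints -1: the first key sorts before the second.
        System.out.println(Integer.signum(compareUnsigned(first, second)));
    }
}
```

Under this definition the first key sorts before the second, which matches the negative result reported for the pure-Java path. A comparator that returns a positive value for the same pair is not ordering the keys this way.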