Subject: Re: BytesRef equals() method
From: Steven Schlansker
Date: Tue, 21 Jan 2014 10:54:19 -0800
To: java-user@lucene.apache.org

On Jan 21, 2014, at 7:32 AM, Yann-Erwan Perio wrote:

> Hello,
>
> I have been working a bit with BytesRef recently, and I wonder whether
> the content of the equals() method, and more specifically the content
> of the bytesEquals(BytesRef other) method, is the intended one.
>
> I was made aware of this because I used a Map in the
> collector, and the map would sometimes give inconsistent results.
> Checking out the source code, the hashCode() method looks valid to me,
> but the bytesEquals() method looks strange - because prior to
> comparing the real value of the BytesRef, it checks their lengths -
> and AIUI these may differ, even though BytesRef are logically equal.

How can two byte arrays be equal if they have different lengths? In the
same way that two Strings of differing lengths can never be equal, two byte
arrays of different lengths will never be equivalent.

>
> I am not familiar at all with the internals of Lucene (this includes
> the BytesRef mechanics), so I may be completely wrong here.
> FWIW, I
> solved my problem by creating fresh BytesRef from the ones sent by the
> similarity, using the copyBytes method.

copyBytes doesn’t change the length of the BytesRef, so by my reading two
unequal BytesRef instances cannot become equal solely through a copyBytes
call?

> I could also have used the
> string representation of the BytesRef, but this appears to be slower
> than copying the bytes, by a factor of about 2.5.

Not all byte sequences are valid encodings of Strings, so don’t do this
unless you are very sure you are dealing with character data and know the
encoding. It’s also not surprising that this is slower, given that creating a
String involves not only copying all the bytes but also decoding them into
characters.

What differently-sized byte arrays would you expect to compare as equal?

Best,
Steven
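
A minimal sketch of the two points made in the reply, in plain Java with no Lucene dependency. The class ByteSliceDemo and both helper methods are hypothetical names made up for illustration; sliceEquals is not Lucene's bytesEquals() code, only the same idea of checking lengths before comparing content. The UTF-8 charset in the second helper is likewise an assumption, since the thread does not say which encoding is in play.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteSliceDemo {

    // Hypothetical helper (not Lucene's code): compares two
    // (array, offset, length) slices, checking the lengths first.
    public static boolean sliceEquals(byte[] a, int aOff, int aLen,
                                      byte[] b, int bOff, int bLen) {
        if (aLen != bLen) {
            return false; // slices of different lengths can never be equal
        }
        for (int i = 0; i < aLen; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes the bytes as UTF-8 (an assumed encoding) and encodes them
    // back; returns true only if the round trip reproduces the input.
    public static boolean survivesUtf8RoundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] backing = {9, 1, 2, 3, 9};
        byte[] other = {1, 2, 3};

        // Same content in different backing arrays at different offsets.
        System.out.println(sliceEquals(backing, 1, 3, other, 0, 3)); // true
        // Different lengths: rejected before any byte is compared.
        System.out.println(sliceEquals(backing, 1, 2, other, 0, 3)); // false

        byte[] text = "lucene".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, (byte) 0xFE}; // not valid UTF-8

        System.out.println(survivesUtf8RoundTrip(text));   // true
        System.out.println(survivesUtf8RoundTrip(binary)); // false
    }
}
```

The last line prints false because malformed input is replaced during decoding, so a String built from arbitrary bytes is not a safe stand-in for the bytes themselves as a map key.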