From: "Uwe Schindler"
Subject: RE: Distinct terms values? (like in Luke)
Date: Mon, 26 Oct 2009 10:48:44 +0100

I forgot: an alternative is to use the FieldCache parsers. They throw a
RuntimeException as soon as they are given a lower-precision term, and the
FieldCache uninversion relies on this to stop iterating:

    try {
        while (next != null && next.field().equals("trie")) {
            ints.add(FieldCache.NUMERIC_UTILS_INT_PARSER.parseInt(next.text()));
            next = termEnum.next() ? termEnum.term() : null;
        }
    } catch (RuntimeException e) {
        // thrown by the parser at the first lower-precision term: stop here
    }

See the code of FieldCacheImpl, which does exactly that.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Monday, October 26, 2009 10:43 AM
> To: java-user@lucene.apache.org
> Subject: RE: Distinct terms values? (like in Luke)
>
> > @Test
> > public void distinct() throws Exception {
> >     RAMDirectory directory = new RAMDirectory();
> >     IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
> >         true, IndexWriter.MaxFieldLength.UNLIMITED);
> >
> >     for (int l = -2; l <= 2; l++) {
> >         Document doc = new Document();
> >         doc.add(new Field("text", "the big brown", Field.Store.NO,
> >             Field.Index.ANALYZED));
> >         doc.add(new NumericField("trie", Field.Store.NO, true).setIntValue(l));
> >         writer.addDocument(doc);
> >     }
> >
> >     writer.close();
> >
> >     IndexReader reader = IndexReader.open(directory, true);
> >     TermEnum termEnum = reader.terms(new Term("trie", ""));
> >     Term next = termEnum.term();
> >     List<Integer> ints = new ArrayList<Integer>();
> >
> >     while (next != null && next.field().equals("trie")) {
> >         ints.add(NumericUtils.prefixCodedToInt(next.text()));
> >         next = termEnum.next() ? termEnum.term() : null;
> >     }
> >
> >     reader.close();
> >
> >     log.info(ints.toString());
> > }
> >
> > ==> [-2, -1, 0, 1, 2, -16, 0, -256, 0, -4096, 0, -65536, 0, -1048576, 0,
> > -16777216, 0, -268435456, 0]
>
> You can add a check to your while condition that stops the iteration as
> soon as the next lower precision is reached:
>
>     while (next != null && next.field().equals("trie")
>         && next.text().charAt(0) == NumericUtils.SHIFT_START_INT) ...
>
> Use the same constant for float, and SHIFT_START_LONG for long and double.
>
> This should work. Maybe we should add a method to NumericUtils that checks
> this and returns whether or not a term has the highest precision.
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
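The SHIFT_START_INT check works because the shift is stored in the first char of every trie term, so all full-precision (shift 0) terms sort before the lower-precision ones. The following is a stdlib-only sketch that illustrates this without Lucene. It assumes the int prefix coding of Lucene 2.9's NumericUtils (SHIFT_START_INT = 0x60, sign bit flipped, 7 bits per char) and re-implements it here for illustration only. The names intToPrefixCoded, prefixCodedToInt, isFullPrecisionInt and distinctValues are hypothetical helpers, not Lucene API, and Java String ordering stands in for the term dictionary order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Stdlib-only sketch of "stop at the first lower-precision trie term".
// ASSUMPTION: the encoding re-implements Lucene 2.9's int prefix coding
// for illustration; it is not the library code.
public class TrieDistinctSketch {
    // Assumed to mirror NumericUtils.SHIFT_START_INT in Lucene 2.9.
    static final char SHIFT_START_INT = 0x60;

    // Hypothetical encoder: first char holds the shift, the remaining chars
    // hold the sign-flipped, right-shifted bits, 7 bits per char, big-endian.
    static String intToPrefixCoded(int val, int shift) {
        int nChars = (31 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);
        int sortableBits = (val ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    // Hypothetical decoder: the shifted-out low bits come back as zeros,
    // which is why lower-precision terms decode to -16, 0, -256, ...
    static int prefixCodedToInt(String s) {
        int shift = s.charAt(0) - SHIFT_START_INT;
        int sortableBits = 0;
        for (int i = 1; i < s.length(); i++) {
            sortableBits = (sortableBits << 7) | s.charAt(i);
        }
        return (sortableBits << shift) ^ 0x80000000;
    }

    // Hypothetical helper of the kind suggested in the mail: true only for
    // full-precision (shift == 0) int terms.
    static boolean isFullPrecisionInt(String term) {
        return term.length() > 0 && term.charAt(0) == SHIFT_START_INT;
    }

    // Index every value at shifts 0, precisionStep, 2*precisionStep, ...,
    // keep the terms sorted and deduplicated, then collect values until the
    // first term that is not full precision.
    static List<Integer> distinctValues(int[] values, int precisionStep) {
        TreeSet<String> terms = new TreeSet<>();
        for (int v : values) {
            for (int shift = 0; shift < 32; shift += precisionStep) {
                terms.add(intToPrefixCoded(v, shift));
            }
        }
        List<Integer> ints = new ArrayList<>();
        for (String t : terms) {
            if (!isFullPrecisionInt(t)) {
                break; // every remaining term has a larger shift
            }
            ints.add(prefixCodedToInt(t));
        }
        return ints;
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(new int[] {-2, -1, 0, 1, 2}, 4));
    }
}
```

Running main prints [-2, -1, 0, 1, 2], which matches the first five values of the test output above. Without the break, the loop would also emit the lower-precision values such as -16 and 0.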