lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Distinct terms values? (like in Luke)
Date Mon, 26 Oct 2009 09:48:44 GMT
I forgot: an alternative is to use the FieldCache parsers, which
automatically throw a RuntimeException when a term holds a lower-precision
value. FieldCache relies on this to stop iteration during uninversion:

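 try {
   // The parser throws as soon as it sees a lower-precision term.
   while (next != null && next.field().equals("trie")) {
     ints.add(FieldCache.NUMERIC_UTILS_INT_PARSER.parseInt(next.text()));
     next = termEnum.next() ? termEnum.term() : null;
   }
 } catch (RuntimeException e) {
   // Expected: all full-precision terms have been read, so stop iterating.
 }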

See the code of FieldCacheImpl, which does exactly that.
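
To illustrate why stopping at the first lower-precision term is safe, here
is a minimal, self-contained sketch that does not use Lucene at all. The
class TrieTermSketch, its methods, and the marker value 0x60 are
hypothetical stand-ins that only model the idea (one leading char encodes
the shift, followed by 7-bit chars); it is not the NumericUtils or
FieldCache code. Because the marker char comes first, all full-precision
terms sort before any lower-precision term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Simplified model of trie-encoded int terms; NOT Lucene's actual classes. */
final class TrieTermSketch {
    // Assumption for this sketch: the first char of a term is SHIFT_START + shift.
    static final char SHIFT_START = 0x60;

    /** Encodes a value at the given shift: one marker char, then 7-bit chars. */
    static String encode(int value, int shift) {
        int bits = (value ^ 0x80000000) >>> shift; // flip sign bit so ints sort as unsigned
        int nChars = (31 - shift) / 7 + 1;
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START + shift);
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (bits & 0x7f);
            bits >>>= 7;
        }
        return new String(buf);
    }

    /** True if the term carries all 32 bits, i.e. shift == 0. */
    static boolean isFullPrecision(String term) {
        return term.charAt(0) == SHIFT_START;
    }

    /** Decodes a full-precision term; throws on a lower-precision one. */
    static int parseFullPrecision(String term) {
        if (!isFullPrecision(term)) {
            throw new IllegalStateException("lower-precision term");
        }
        int bits = 0;
        for (int i = 1; i < term.length(); i++) {
            bits = (bits << 7) | term.charAt(i);
        }
        return bits ^ 0x80000000;
    }

    /** Collects values from sorted terms, stopping at the first lower-precision term. */
    static List<Integer> distinctValues(Iterable<String> sortedTerms) {
        List<Integer> values = new ArrayList<Integer>();
        try {
            for (String term : sortedTerms) {
                values.add(parseFullPrecision(term));
            }
        } catch (IllegalStateException stop) {
            // Expected: terms sort by marker char, so only lower-precision terms remain.
        }
        return values;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<String>(); // sorted, like a term dictionary
        for (int v = -2; v <= 2; v++) {
            for (int shift = 0; shift < 32; shift += 4) { // precision step of 4
                terms.add(encode(v, shift));
            }
        }
        System.out.println(distinctValues(terms)); // prints [-2, -1, 0, 1, 2]
    }
}
```

The isFullPrecision check in this sketch corresponds to testing the first
char in the while condition, and the catch block corresponds to the
exception-based variant shown above.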

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Monday, October 26, 2009 10:43 AM
> To: java-user@lucene.apache.org
> Subject: RE: Distinct terms values? (like in Luke)
> 
> >     @Test
> >     public void distinct() throws Exception {
> >         RAMDirectory directory = new RAMDirectory();
> >         IndexWriter writer = new IndexWriter(directory, new
> > WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
> >
> >         for (int l = -2; l <= 2; l++) {
> >             Document doc = new Document();
> >             doc.add(new Field("text", "the big brown", Field.Store.NO,
> > Field.Index.ANALYZED));
> >             doc.add(new NumericField("trie", Field.Store.NO,
> > true).setIntValue(l));
> >             writer.addDocument(doc);
> >         }
> >
> >         writer.close();
> >
> >         IndexReader reader = IndexReader.open(directory, true);
> >         TermEnum termEnum = reader.terms(new Term("trie", ""));
> >         Term next = termEnum.term();
> >         List<Integer> ints = new ArrayList<Integer>();
> >
> >         while (next != null && next.field().equals("trie")) {
> >             ints.add(NumericUtils.prefixCodedToInt(next.text()));
> >             next = termEnum.next() ? termEnum.term() : null;
> >         }
> >
> >        reader.close();
> >
> >         log.info(ints.toString());
> >     }
> >
> > ==> [-2, -1, 0, 1, 2, -16, 0, -256, 0, -4096, 0, -65536, 0, -1048576, 0,
> > -16777216, 0, -268435456, 0]
> 
> You can add a check to your while condition that stops the iteration as
> soon as the next lower precision is reached:
> 
> while (next != null && next.field().equals("trie") &&
>     next.text().charAt(0) == NumericUtils.SHIFT_START_INT) ...
> 
> Use the same constant for float, and SHIFT_START_LONG for long and double.
> 
> This should work. Maybe we should add a method to NumericUtils that checks
> this and returns true/false depending on whether the term is of the
> highest precision.
> 
> Uwe
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org




