lucene-java-user mailing list archives

From David Causse <dcau...@spotter.com>
Subject Re: IndexReader.Terms - internals
Date Mon, 11 May 2009 16:21:00 GMT
Hi,
We noticed this behaviour too, so we handle it like this:

Map<Term, Integer> result = new HashMap<Term, Integer>();
// A full scan starts at the first term of the field, otherwise at the prefix.
String prefix = matcher.fullScan() ? "" : matcher.prefix();
// Positions the enumeration at the first term >= (field, prefix).
TermEnum all = reader.terms(new Term(field, prefix));
try {
        do {
                Term t = all.term();
                // Stop at the end of the enum, at another field, or past the prefix.
                if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix))
                        break;
                if (matcher.match(t.text()))
                        result.put(t, all.docFreq());
        } while (all.next());
} finally {
        all.close();
}
return result;

matcher is an application-level object designed to match complex words.
We loop over the TermEnum until we consider that we have reached the end
of the interesting terms.
To summarize, you stop the loop when:
1. there is no more data in the TermEnum,
2. the field is no longer the same (compare with equals(), or intern the
String field if it comes from outside and you compare with ==),
3. you reach terms that no longer match the prefix.
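
The three stop conditions above can be tried without a Lucene index. The following is only a sketch under stated assumptions: a TreeMap stands in for the sorted term dictionary, each key is "field\u0000text" so that entries sort by field and then by text as terms do, and TermScanSketch, key and scan are hypothetical names that are not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class TermScanSketch {
    // Hypothetical separator between field and text inside a dictionary key.
    static final char SEP = '\u0000';

    // Builds a key that sorts by field first, then by text.
    static String key(String field, String text) {
        return field + SEP + text;
    }

    // Collects text -> docFreq for the terms of `field` whose text starts with `prefix`.
    static Map<String, Integer> scan(TreeMap<String, Integer> dict, String field, String prefix) {
        Map<String, Integer> result = new LinkedHashMap<>();
        // Seek to the first entry >= (field, prefix), similar in spirit to
        // reader.terms(new Term(field, prefix)).
        for (Map.Entry<String, Integer> e : dict.tailMap(key(field, prefix), true).entrySet()) {
            String k = e.getKey();
            int sep = k.indexOf(SEP);
            String f = k.substring(0, sep);
            String text = k.substring(sep + 1);
            // Conditions 2 and 3: another field, or past the prefix range.
            if (!f.equals(field) || !text.startsWith(prefix)) {
                break;
            }
            result.put(text, e.getValue());
        }
        // Condition 1 (no more data) is the loop running out of entries.
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> dict = new TreeMap<>();
        dict.put(key("author", "smith"), 2);
        dict.put(key("title", "lucene"), 5);
        dict.put(key("title", "lucid"), 1);
        dict.put(key("title", "solr"), 3);
        dict.put(key("year", "2009"), 7);
        System.out.println(scan(dict, "title", "luc")); // prefix scan
        System.out.println(scan(dict, "title", ""));    // full scan of one field
    }
}
```

An empty prefix gives the full scan of one field, because every text starts with the empty string and only the field check ends the loop.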

If there is a better way to do this, I'd be glad to hear of it.

David.

Ian Vink wrote:
>             IndexReader rdr = IndexReader.Open(myFolder);
>             TermEnum terms = rdr.Terms((new Term(myTermName, "")));
>
> (from .NET land, but it's all the same)
>
> This code works great, I can loop thru the terms nicely, but after it
> returns all the myTermName terms, it goes into all other terms.
>
> Is there a way to limit the rdr.Terms to return only those whose field is
> myTermName
>
>   

