lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Term numbering and range filtering
Date Mon, 10 Nov 2008 21:58:51 GMT
On Monday 10 November 2008 22:21:20, Tim Sturge wrote:
> Hmmm -- I hadn't thought about that so I took a quick look at the
> term vector support.
>
> What I'm really looking for is a compact but performant
> representation of a set of filters on the same (one term field).
> Using term vectors would mean an algorithm similar to:
>
> String myfield;
> String myterm;
> TermFreqVector tv;
> for (int i = 0; i < maxDoc; i++) {
>     tv = reader.getTermFreqVector(i, myfield);
>     // tv is null when the doc has no term vector for this field
>     if (tv != null && tv.indexOf(myterm) != -1) {
>         // include this doc...
>     }
> }
>
> The key thing I am looking to achieve here is performance comparable
> to filters. I suspect getTermFreqVector() is not efficient enough but
> I'll give it a try.
>

It is better to use a TermDocs on myterm for this; have a look at the
code of RangeFilter.

Filters are normally created from a slower query by setting a bit in an 
OpenBitSet at "include this doc". Then they are reused for their speed.
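The pattern described above (walk the postings of one term and set a bit per matching document) can be sketched roughly as follows. This is a minimal, self-contained illustration, not RangeFilter's actual code: java.util.BitSet stands in for OpenBitSet, and the hypothetical postings array stands in for the doc ids that a TermDocs on myterm would enumerate.

```java
import java.util.BitSet;

public class TermFilterSketch {

    /**
     * Builds a filter bit set from a term's postings.
     * Assumption: postings holds the ids of all docs containing the term,
     * i.e. what a TermDocs enumeration would yield in Lucene.
     */
    static BitSet buildFilter(int[] postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : postings) {
            bits.set(doc); // "include this doc"
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9};              // hypothetical doc ids
        BitSet filter = buildFilter(postings, 16);
        // Once built, membership tests are cheap and the set can be reused.
        System.out.println("doc 5 included: " + filter.get(5));
        System.out.println("doc 6 included: " + filter.get(6));
    }
}
```

The work is proportional to the number of documents containing the term, not to maxDoc, which is the main difference from the per-document term vector loop in the quoted message.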

Filter caching could help. If memory becomes a problem
and the filters are sparse enough, try using SortedVIntList
as the underlying data structure in the cache. (Sparse enough means
fewer than 1 in 8 of all docs available in the index reader.)
See also LUCENE-1296 for caching a different data structure from the
one used to collect the filtered docs.
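One way to see where the 1 in 8 figure plausibly comes from: a bit set costs one bit per doc in the index, while a delta-encoded list costs at least one byte per matching doc. The sketch below compares the two sizes. It is an assumption-laden illustration, not SortedVIntList itself: the class name, the method names, and the 7-bits-per-byte delta encoding are all written here for the example.

```java
import java.io.ByteArrayOutputStream;

public class SparseFilterSketch {

    /** Encodes ascending doc ids as variable-length deltas, 7 payload bits per byte. */
    static byte[] encodeDeltas(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int delta = doc - prev;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // high bit: more bytes follow
                delta >>>= 7;
            }
            out.write(delta);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Size in bytes of a plain bit set covering maxDoc documents. */
    static int bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** True when the delta-encoded list is smaller than the bit set. */
    static boolean listIsSmaller(int[] sortedDocs, int maxDoc) {
        return encodeDeltas(sortedDocs).length < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] sparse = new int[10];   // 1 doc in 100
        for (int i = 0; i < sparse.length; i++) sparse[i] = i * 100;
        int[] dense = new int[200];   // 1 doc in 5
        for (int i = 0; i < dense.length; i++) dense[i] = i * 5;
        System.out.println("sparse list smaller: " + listIsSmaller(sparse, maxDoc));
        System.out.println("dense list smaller:  " + listIsSmaller(dense, maxDoc));
    }
}
```

The trade-off is that a list like this only supports sequential iteration, whereas a bit set also allows random access by doc id.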

Regards,
Paul Elschot


