lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Term numbering and range filtering
Date Wed, 19 Nov 2008 00:40:23 GMT
On Wednesday 19 November 2008 00:43:56, Tim Sturge wrote:
> I've finished a query time implementation of a column stride filter,
> which implements DocIdSetIterator. This just builds the filter at
> process start and uses it for each subsequent query. The index itself
> is unchanged.
>
> The results are very impressive. Here are the results on a 45M
> document index:
>
> Firstly without an age constraint as a baseline:
>
> Query "+name:tim"
> startup: 0
> Hits: 15089
> first query: 1004
> 100 queries: 132 (1.32 msec per query)
>
> Now with a cached filter. This is ideal from a speed standpoint but
> there are too many possible start/end combinations to cache all the
> filters.
>
> Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached
> RangeFilter)
> startup: 3
> Hits: 11156
> first query: 1830
> 100 queries: 287 (2.87 msec per query)
>
> Now with an uncached filter. This is awful.
>
> Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
> startup: 3
> Hits: 11156
> first query: 1665
> 100 queries: 51862 (yes, 518 msec per query, 200x slower)
>
> A RangeQuery is slightly better but still bad (and has a different
> result set)
>
> Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
> startup: 0
> Hits: 10147
> first query: 1517
> 100 queries: 27157 (271 msec is 100x slower than the filter)
>
> Now with the prebuilt column stride filter:
>
> Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt
> column stride filter)

With LUCENE-1345, "Allow Filter as clause to BooleanQuery":
https://issues.apache.org/jira/browse/LUCENE-1345
one could even skip the ConstantScoreQuery here.
Unfortunately, LUCENE-1345 is unfinished for now.

> startup: 2811
> Hits: 11156
> first query: 1395
> 100 queries: 441 (back down to 4.41 msec per query)
>
> This is less than 2x slower than the dedicated bitset and more than
> 50x faster than the range boolean query.
>
> Mike, Paul, I'm happy to contribute this (ugly but working) code if
> there is interest. Let me know and I'll open a JIRA issue for it.

In case you think more performance improvements based on this
are possible...
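
For anyone following the thread, here is a minimal sketch of the column stride idea as I read it from the quoted description: one value per document held in an array built once at startup, and a range check done while iterating over document ids. It does not depend on Lucene and is not Tim's code. The class and method names are hypothetical, the one-byte-per-document layout is an assumption, and the doc()/next()/skipTo() methods only imitate the general shape of a DocIdSetIterator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a per-document value column with a range iterator. */
class ColumnStrideRangeSketch {

    /** Iterates over doc ids whose stored value lies in [lower, upper]. */
    static final class RangeIterator {
        // values[docId] holds the field value; assumed to fit in a signed byte.
        private final byte[] values;
        private final int lower;
        private final int upper;
        private int doc = -1; // -1 means "not started yet"

        RangeIterator(byte[] values, int lower, int upper) {
            this.values = values;
            this.lower = lower;
            this.upper = upper;
        }

        /** Current doc id; equals values.length once exhausted. */
        int doc() {
            return doc;
        }

        /** Moves to the next matching doc; false when none is left. */
        boolean next() {
            return skipTo(doc + 1);
        }

        /** Moves to the first matching doc >= target and beyond the current doc. */
        boolean skipTo(int target) {
            for (int d = Math.max(target, doc + 1); d < values.length; d++) {
                int v = values[d];
                if (v >= lower && v <= upper) {
                    doc = d;
                    return true;
                }
            }
            doc = values.length; // exhausted; later calls keep returning false
            return false;
        }
    }

    /** Collects all doc ids in range; only for demonstration. */
    static List<Integer> matchingDocs(byte[] values, int lower, int upper) {
        List<Integer> hits = new ArrayList<Integer>();
        RangeIterator it = new RangeIterator(values, lower, upper);
        while (it.next()) {
            hits.add(it.doc());
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] ages = {17, 18, 25, 35, 36, 40, 30}; // index = doc id
        System.out.println(matchingDocs(ages, 18, 35)); // prints [1, 2, 3, 6]
    }
}
```

The array is shared by every query and only the bounds change, which is why no per-range filter needs to be cached. The cost is a linear scan whenever few documents match, so skipTo() is only cheap when the other clause drives the iteration.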

Regards,
Paul Elschot.
