lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shay Banon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it
Date Fri, 05 Nov 2010 22:50:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928859#action_12928859
] 

Shay Banon commented on LUCENE-1536:
------------------------------------

Hi Mike,

   Wondering what are your thoughts on fixing filters correctly are? I think that the initial
thought of getting filters all the way down to postings enumeration if they support random
access is a great one. A random access doc id set can be added (interface), and if a filter
returns it (can be checked using instanceof), then the that doc set can be passed all the
way to the enumeration (and intersected per doc with the deleted docs).

   I think that any type of solution should support the great feature of Lucene queries, for
example, FilteredQuery should use that, allowing to build complex query expressions without
having the mentioned optimization only applied on the top level search.

   As most filters results do support random access, either because they use OpenBitSet, or
because they are built on top of FieldCache functionality, I think this feature will give
great speed improvements to the query execution time.

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch,
LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message