lucene-dev mailing list archives

From "Sergey Vladimirov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2362) Add support for slow filters with batch processing
Date Fri, 02 Apr 2010 19:36:27 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Vladimirov updated LUCENE-2362:
--------------------------------------

    Attachment: BatchFilter.java
                IndexSearcherImpl.java

Example of a batch (slow) filter interface and an IndexSearcher implementation.

Maybe it is possible to split Filter into several interfaces and let the user choose the concrete
way to implement it, for example:

- Filter (interface)
  -- Fast Filter (the current one)
  -- Slow Filter (a new one, like the one in the attachment)
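A minimal sketch of how the proposed split could look, assuming a fast filter is a cheap per-document check and a slow filter receives all candidate documents at once as a `java.util.BitSet`. The names `Filter`, `FastFilter`, `SlowFilter` and `FastAsSlow` are hypothetical illustrations of the hierarchy above; they are not Lucene 3.0.1 types and not the contents of the attached BatchFilter.java.

```java
import java.util.BitSet;

// Hypothetical filter hierarchy following the proposal above.
// None of these types exist in Lucene 3.0.1; names and signatures are assumptions.

/** Common marker for every kind of filter. */
interface Filter {}

/** "Fast" filter: cheap per-document check, usable inside the next()/advance() loop. */
interface FastFilter extends Filter {
    boolean accept(int docId);
}

/** "Slow" filter: expensive check that should see all candidate documents in one call. */
interface SlowFilter extends Filter {
    /** Returns the subset of candidates that passes; must not modify the argument. */
    BitSet filter(BitSet candidates);
}

/** Adapter so that any fast filter can be used where a slow (batch) filter is expected. */
final class FastAsSlow implements SlowFilter {
    private final FastFilter delegate;

    FastAsSlow(FastFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    public BitSet filter(BitSet candidates) {
        BitSet result = new BitSet(candidates.length());
        // Walk only the set bits, checking each candidate with the per-document filter.
        for (int doc = candidates.nextSetBit(0); doc >= 0; doc = candidates.nextSetBit(doc + 1)) {
            if (delegate.accept(doc)) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

With this split, a searcher could keep the existing leapfrog loop for `FastFilter` and switch to a collect-then-filter strategy for `SlowFilter`, while the adapter keeps existing per-document filters usable in the batch path.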

> Add support for slow filters with batch processing
> --------------------------------------------------
>
>                 Key: LUCENE-2362
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2362
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Sergey Vladimirov
>         Attachments: BatchFilter.java, IndexSearcherImpl.java
>
>
> The internal implementation of IndexSearcher assumes that the Filter and the scorer have roughly equal performance. In our environment, however, we have a Filter implementation that is very expensive compared to the scorer.
> Suppose the scorer selects 2k TermDocs (~250 docs each) and the filter selects 2k as well. Then 250k docs will be checked quickly (and filtered out) by the scorer, and 250k docs will be checked slowly by our filter.
> With the straightforward implementation, a search exceeds our limit of 60 seconds per query, because each next() or advance() call requires N database queries PER CHECKED DOC. A read-ahead technique lets us bring this down to 35 seconds per query, which is still too slow.
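The report does not show the read-ahead code, so the following is only a sketch of the general idea, assuming the backing store can answer "which docs in the range [from, to) pass?" in a single round trip. `DocRangeBackend`, `ReadAheadFilter` and `blockSize` are hypothetical names introduced for illustration, not the reporter's implementation or a Lucene API.

```java
import java.util.BitSet;

// Sketch of a read-ahead wrapper around an expensive per-document filter check.
// Assumption: the backend (e.g. a database) can evaluate a whole doc-id range per call.
// DocRangeBackend is a hypothetical interface, not a Lucene API.

interface DocRangeBackend {
    /** Returns a BitSet, indexed by absolute doc id, of passing docs in [fromDoc, toDoc). */
    BitSet lookup(int fromDoc, int toDoc);
}

final class ReadAheadFilter {
    private final DocRangeBackend backend;
    private final int blockSize;
    private BitSet cached = new BitSet();
    private int cachedFrom = -1; // start of the cached block (inclusive)
    private int cachedTo = -1;   // end of the cached block (exclusive)
    int roundTrips = 0;          // number of backend calls made so far

    ReadAheadFilter(DocRangeBackend backend, int blockSize) {
        this.backend = backend;
        this.blockSize = blockSize;
    }

    /** Checks one doc; on a cache miss, fetches the whole block [doc, doc + blockSize). */
    boolean accept(int doc) {
        if (doc < cachedFrom || doc >= cachedTo) {
            cachedFrom = doc;
            cachedTo = doc + blockSize;
            cached = backend.lookup(cachedFrom, cachedTo);
            roundTrips++;
        }
        return cached.get(doc);
    }
}
```

Each round trip now covers up to `blockSize` docs instead of one, but the scorer still drives iteration doc by doc, so the filter may fetch blocks that contain few of the docs actually checked.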
> The solution is to first collect all documents matched by the scorer and then filter them in one batch with our filter. An example implementation (using a BitSet) is attached. With it, a query currently takes only ~300 milliseconds.
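The two-phase strategy can be sketched in plain Java, assuming the scorer is modelled as an iterator of matching doc ids and the expensive filter takes the whole candidate set in one call. `BatchDocFilter` and `CollectThenFilter` are hypothetical stand-ins; they do not reproduce the attached BatchFilter.java or IndexSearcherImpl.java.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Sketch of the "collect first, filter in one batch" strategy.
// Assumptions: the scorer is modelled as an iterator of matching doc ids, and
// BatchDocFilter is a hypothetical stand-in for the attached batch filter.

interface BatchDocFilter {
    /** Returns the subset of candidates that passes; called once per query. */
    BitSet filter(BitSet candidates);
}

final class CollectThenFilter {
    private CollectThenFilter() {}

    /** Phase 1: drain the cheap scorer into a BitSet. Phase 2: one call to the expensive filter. */
    static BitSet search(PrimitiveIterator.OfInt scorerDocs, BatchDocFilter filter) {
        BitSet candidates = new BitSet();
        while (scorerDocs.hasNext()) {
            candidates.set(scorerDocs.nextInt());
        }
        BitSet accepted = filter.filter(candidates);
        accepted.and(candidates); // guard: never return docs the scorer did not match
        return accepted;
    }
}
```

The trade-off is memory: the candidate set needs one bit per document id up to the highest match, in exchange for a single call into the expensive filter instead of one check per scorer/filter leapfrog step.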

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

