lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1410) PFOR implementation
Date Sun, 12 Oct 2008 19:50:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638888#action_12638888 ]

Michael McCandless commented on LUCENE-1410:
--------------------------------------------

bq. It should really make a difference for stop words and disjunction queries depending on DocIdSetIterator.next().

Yes.

bq. Conjunctions that depend on skipTo(docNum) will probably make it necessary to impose an upper bound on the size of the compressed arrays.

Yes.  Though, I think when optimizing search performance we should
focus entirely on the high-latency queries: TermQuery on very
frequent terms, disjunction queries involving common terms,
phrase/span queries that have many matches, etc.
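
To make the skipTo() point concrete, here is a minimal Java sketch, assuming doc IDs are stored in blocks of at most a fixed size (a hypothetical BLOCK_SIZE of 128) plus a small index holding each block's first doc ID. A skip binary-searches that index and then decodes at most one block, so capping the block size caps the decode cost per skip. BlockedPostings, NO_MORE_DOCS and blocksDecoded are illustrative names, not Lucene API, and the blocks are plain int arrays standing in for PFOR-compressed ones.

```java
import java.util.Arrays;

/**
 * Hypothetical postings layout: doc IDs split into blocks of at most
 * BLOCK_SIZE entries, plus an index of each block's first doc ID.
 * Blocks are plain int arrays here; a real codec would keep them
 * compressed and decompress one block inside decode().
 */
public class BlockedPostings {
    static final int BLOCK_SIZE = 128;                 // assumed upper bound per block
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[][] blocks;
    private final int[] firstDoc;
    private int blockIdx = -1;                         // no block decoded yet
    private int[] current = new int[0];
    private int pos = 0;
    int blocksDecoded = 0;                             // counts decode() calls

    BlockedPostings(int[] sortedDocs) {
        int n = (sortedDocs.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blocks = new int[n][];
        firstDoc = new int[n];
        for (int b = 0; b < n; b++) {
            int from = b * BLOCK_SIZE;
            blocks[b] = Arrays.copyOfRange(sortedDocs, from,
                    Math.min(from + BLOCK_SIZE, sortedDocs.length));
            firstDoc[b] = sortedDocs[from];
        }
    }

    private void decode(int b) {
        blockIdx = b;
        current = blocks[b];                           // stand-in for decompression
        pos = 0;
        blocksDecoded++;
    }

    /** Forward-only: returns the first doc >= target at or after the current position. */
    int skipTo(int target) {
        int from = Math.max(blockIdx, 0);
        if (from >= blocks.length) return NO_MORE_DOCS;
        // Last block at or after the current one whose first doc is <= target
        // (or the current block if there is none).
        int r = Arrays.binarySearch(firstDoc, from, blocks.length, target);
        int b = r >= 0 ? r : Math.max(-r - 2, from);
        if (b != blockIdx) decode(b);
        // Linear scan inside one block: at most BLOCK_SIZE steps.
        while (pos < current.length && current[pos] < target) pos++;
        if (pos < current.length) return current[pos];
        // target is past this block's last doc: the answer is the next block's first doc.
        if (blockIdx + 1 >= blocks.length) {
            blockIdx = blocks.length;
            return NO_MORE_DOCS;
        }
        decode(blockIdx + 1);
        return current[0];
    }
}
```

With 1000 doc IDs and large skips, only the blocks actually landed on get decoded; everything in between is skipped via the first-doc index.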

E.g., if PFOR speeds up high-latency queries by, say, 20% (10 sec -> 8
sec), but makes queries that are already fast (say 30 msec) a bit
slower (say 40 msec), I think that's fine.  It's the high-latency
queries that kill us, because they limit how large a collection
you can put on one box before you're forced to shard your index.

At some point we should make use of concurrency when iterating over
large result sets.  E.g., if the estimated total # hits is > X docs, use
multiple threads, where each thread skips to its own "chunk" and
iterates over it, and then merge the results.  Then we should be able
to cut the latency of the slowest queries and handle more documents on a
single machine.  Computers are very quickly becoming very concurrent.

bq. I'm wondering whether it would make sense to add skip info to the term positions of very large documents. Any ideas on that?

Probably we should -- yet another issue :)


> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch, TestPFor2.java, TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.
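
For readers new to PFOR, a simplified Java sketch of the idea, assuming non-negative inputs such as doc ID gaps: pick a bit width b so that most values fit, bit-pack every value into b bits, and store the few values that do not fit ("exceptions") in full so they can be patched back in after decoding. The maxExceptionRate passed to chooseBits is an arbitrary assumption, and unlike the original PFOR layout (which threads exception offsets as a linked list through the packed slots), exceptions here live in two side arrays. SimplePFor is a hypothetical name; this is not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified PFOR: fixed-width bit packing plus full-width exception patches. */
public class SimplePFor {
    final int bits, size;
    final long[] packed;
    final int[] excIndex, excValue;

    private SimplePFor(int bits, int size, long[] packed, int[] excIndex, int[] excValue) {
        this.bits = bits; this.size = size; this.packed = packed;
        this.excIndex = excIndex; this.excValue = excValue;
    }

    /** Smallest width for which at most maxExceptionRate of the values are exceptions. */
    static int chooseBits(int[] values, double maxExceptionRate) {
        for (int b = 1; b < 31; b++) {
            long limit = 1L << b;
            int exceptions = 0;
            for (int v : values) if (v >= limit) exceptions++;
            if (exceptions <= maxExceptionRate * values.length) return b;
        }
        return 31;                                   // every non-negative int fits in 31 bits
    }

    /** Encodes non-negative values (e.g. doc ID gaps) with the given bit width (1..31). */
    static SimplePFor encode(int[] values, int bits) {
        long[] packed = new long[(int) (((long) values.length * bits + 63) / 64)];
        List<Integer> idx = new ArrayList<>(), val = new ArrayList<>();
        long limit = 1L << bits;
        for (int i = 0; i < values.length; i++) {
            long v = values[i];
            if (v >= limit) {                        // exception: keep full value aside, pack 0
                idx.add(i);
                val.add(values[i]);
                v = 0;
            }
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6), shift = (int) (bitPos & 63);
            packed[word] |= v << shift;
            if (shift + bits > 64) packed[word + 1] |= v >>> (64 - shift); // spills into next word
        }
        return new SimplePFor(bits, values.length, packed,
                idx.stream().mapToInt(Integer::intValue).toArray(),
                val.stream().mapToInt(Integer::intValue).toArray());
    }

    int[] decode() {
        int[] out = new int[size];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < size; i++) {             // unpack every slot at fixed width
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6), shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) v |= packed[word + 1] << (64 - shift);
            out[i] = (int) (v & mask);
        }
        for (int e = 0; e < excIndex.length; e++) out[excIndex[e]] = excValue[e]; // patch
        return out;
    }
}
```

E.g. for the gaps {1, 2, 3, 1000, 0, 5, 7, 2, 1, 4} with a 10% exception budget, chooseBits picks 3 bits and only the 1000 is stored as an exception, so one outlier no longer forces every slot up to 10 bits.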

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



