lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
Date Sun, 18 Jan 2009 14:06:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664965#action_12664965 ]

Mark Miller commented on LUCENE-1483:
-------------------------------------

I think it's pretty costly even for non-id-type fields. In your enum case, there are what,
50 unique values? Even still, you are seeing something like a 40% diff, but with times small
enough not to matter.

My test example has 20,000 unique terms for 600,000 documents (lots of overlap, 2-8 char strings,
1-9, I think), so quite a bit short of a primary key - but it still was WAY faster with the
new method.

Old method, non-optimized, 79 segments - 1.5 million seeks, WAY slow.
Old method, optimized, 1 segment - 20,000 seeks, pretty darn fast.
New method, non-optimized, 79 segments - 40,000 seeks, pretty darn fast.
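
As a rough sanity check on those counts, here is a back-of-the-envelope sketch in Python. It assumes (this is an assumption, not something stated in the thread) that the old method costs about one seek per unique term per segment. The function name and the model are illustrative only; this is not Lucene code, and it does not try to explain why the new method needs only about 40,000 seeks.

```python
# Toy seek-count model; NOT Lucene code. Input figures come from the message above.
UNIQUE_TERMS = 20_000   # unique terms in the test index
SEGMENTS = 79           # segments in the non-optimized index


def seeks_one_per_term_per_segment(unique_terms: int, segments: int) -> int:
    """Assumed model: every unique term costs one seek in every segment."""
    return unique_terms * segments


old_multi = seeks_one_per_term_per_segment(UNIQUE_TERMS, SEGMENTS)
old_optimized = seeks_one_per_term_per_segment(UNIQUE_TERMS, 1)

# Approximate counts as reported in the message.
reported = {
    "old, 79 segments": 1_500_000,
    "old, 1 segment": 20_000,
    "new, 79 segments": 40_000,
}

print(old_multi)      # 1580000, close to the reported ~1.5 million
print(old_optimized)  # 20000, matches the optimized case
print(reported["old, 79 segments"] / reported["new, 79 segments"])  # 37.5
```

Under that assumption the two old-method figures line up with terms x segments, while the new-method figure sits at roughly twice the unique term count regardless of the segment count.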


bq.    While there is a big difference between searching a single segment vs multiple segments
for these things, we already knew about that - that's why you optimize.

{quote}Right, but for realtime search you don't have the luxury of
optimizing. This patch makes warming time after reopen much faster
for a many-segment index for apps that use FieldCache with mostly unique String
fields.{quote}

Right, I got you - I know we can't optimize. I was just realizing that explaining why 100
segments was so slow does not explain why the new method on 100 segments is so fast. I
still don't think I fully understand why that is. I don't think getting to use only the unique
terms at each segment saves enough seeks to account for what I am seeing. Especially in this
test case, the terms should be pretty evenly distributed across segments...


> Change IndexSearcher multisegment searches to search each individual segment using a
single HitCollector
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing for individual
segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



