lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1043) Speedup merging of stored fields when field mapping "matches"
Date Fri, 02 Nov 2007 19:39:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539682 ]

Michael McCandless commented on LUCENE-1043:
--------------------------------------------

{quote}
Future optimizations could include bulk copying multiple documents at once (all ranges between
deleted docs). The speedup would probably be greatest for small docs, but I'm not sure if
it would be worth it or not.
{quote}

Ooh, I like that idea!  I'll explore that.
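
To make the "ranges between deleted docs" idea concrete, here is a minimal sketch that finds each maximal run of live documents, so that each run could be handed to a single bulk copy. The class and method names are hypothetical, and `java.util.BitSet` only stands in for whatever structure the index uses to track deletions; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Lucene: finds runs of live (non-deleted) docs.
final class LiveDocRuns {

    /**
     * Returns {start, endExclusive} pairs, one per maximal run of non-deleted
     * docs in [0, maxDoc). A set bit in 'deleted' marks a deleted doc.
     */
    static List<int[]> liveRuns(BitSet deleted, int maxDoc) {
        List<int[]> runs = new ArrayList<>();
        int doc = deleted.nextClearBit(0);       // first live doc
        while (doc < maxDoc) {
            int end = deleted.nextSetBit(doc);   // next deleted doc, or -1 if none
            if (end < 0 || end > maxDoc) {
                end = maxDoc;                    // run extends to the end of the segment
            }
            runs.add(new int[] {doc, end});
            doc = deleted.nextClearBit(end);     // skip over the deleted docs
        }
        return runs;
    }
}
```

With docs 2 and 5 deleted out of 8, this yields three runs: [0,2), [3,5) and [6,8). Whether copying per run beats copying per doc would depend on how long the runs typically are, which is the "small docs" question raised above.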

{quote}
More controversial: maybe even expand the number of docs that can be bulk copied by not bothering
removing deleted docs if it's some very small number (unless it's an optimize). This is probably
not worth it.
{quote}

That's a neat idea too, but I agree it's likely not worth it.

Another idea: we can *almost* just concatenate the posting lists
(frq/prx) for each term, because they are "delta coded" (we write the
delta between docIDs).  The only catch is you have to "stitch up" the
boundary: you have to read the first docID from the start of the next
segment, shift it by that segment's doc base, write it as a delta
against the last docID of the previous segment, then you can copy the
remaining bytes.  I think this could be a big win, especially when
merging larger segments.
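
A rough sketch of the stitching step, under simplifying assumptions: deltas are held in plain `int[]` arrays rather than the real frq/prx byte encoding, neither segment has deleted docs (so docIDs are only shifted by a doc base, never renumbered), and the last docID of the first list is recovered by summing its deltas. All names here are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of stitching two delta-coded docID lists.
final class DeltaStitch {

    /** Delta-encodes increasing docIDs; the first delta is relative to 0. */
    static int[] encode(int[] docIds) {
        int[] deltas = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return deltas;
    }

    /** Inverse of encode: turns deltas back into absolute docIDs. */
    static int[] decode(int[] deltas) {
        int[] docIds = new int[deltas.length];
        int doc = 0;
        for (int i = 0; i < deltas.length; i++) {
            doc += deltas[i];
            docIds[i] = doc;
        }
        return docIds;
    }

    /**
     * Concatenates two delta-coded lists. Only the first delta of 'second'
     * is rewritten; every other delta is copied unchanged.
     * 'secondDocBase' is the number of docs preceding the second segment.
     */
    static int[] stitch(int[] first, int[] second, int secondDocBase) {
        if (second.length == 0) {
            return first.clone();
        }
        // Simplification: sum the deltas to find the last docID of 'first'.
        int lastDocOfFirst = 0;
        for (int d : first) {
            lastDocOfFirst += d;
        }
        int[] out = Arrays.copyOf(first, first.length + second.length);
        // second[0] is the first docID local to its own segment (delta from 0).
        out[first.length] = secondDocBase + second[0] - lastDocOfFirst;
        // The remaining deltas are position-independent, so copy them as-is.
        System.arraycopy(second, 1, out, first.length + 1, second.length - 1);
        return out;
    }
}
```

For example, stitching docIDs {1, 4, 9} with a second segment's {0, 3, 7} at doc base 10 decodes to {1, 4, 9, 10, 13, 17}. A real implementation would also have to know the last docID of the previous segment's list without scanning it, which this sketch sidesteps.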


> Speedup merging of stored fields when field mapping "matches"
> -------------------------------------------------------------
>
>                 Key: LUCENE-1043
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1043
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1043.patch
>
>
> Robert Engels suggested the following idea, here:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54217
> When merging in the stored fields from a segment, if the field name ->
> number mapping is identical then we can simply bulk copy the entire
> entry for the document rather than re-interpreting and then re-writing
> the actual stored fields.
> I've pulled the code from the above thread and got it working on the
> current trunk.
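
As an illustration of the description above (not the attached patch), the following sketch shows the two pieces involved: checking that a segment's field name -> number mapping agrees with the merged mapping, and copying a document's stored-fields entry as raw bytes. The names are hypothetical. The in-memory `byte[]` plus an offsets array with one extra trailing entry is an assumed stand-in for the stored-fields data and index files, and treating "identical" as "every field number in the segment maps to the same name in the merged mapping" is my reading, not something the issue states.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Hypothetical illustration; the list index plays the role of the field number.
final class StoredFieldsBulkCopy {

    /** True if every field number in the segment names the same field in the merged mapping. */
    static boolean mappingMatches(List<String> segmentFields, List<String> mergedFields) {
        if (segmentFields.size() > mergedFields.size()) {
            return false;
        }
        for (int n = 0; n < segmentFields.size(); n++) {
            if (!segmentFields.get(n).equals(mergedFields.get(n))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Copies the raw entry for 'doc' without interpreting it.
     * Assumes offsets has numDocs + 1 entries, so that
     * [offsets[doc], offsets[doc + 1]) delimits the entry within 'data'.
     */
    static void copyRawDoc(byte[] data, long[] offsets, int doc, ByteArrayOutputStream out) {
        int start = (int) offsets[doc];
        int end = (int) offsets[doc + 1];
        out.write(data, start, end - start);
    }
}
```

When `mappingMatches` returns false, the field numbers embedded in each entry would point at the wrong names after the merge, so the merger would have to fall back to reading and re-writing each stored field.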

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

