lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: possible segment merge improvement?
Date Thu, 01 Nov 2007 14:10:27 GMT
On 11/1/07, Michael McCandless <lucene@mikemccandless.com> wrote:
> "robert engels" <rengels@ix.netcom.com> wrote:
>
> > Why not check the fields dictionary for the segments being merged,
> > and if the same, just copy the binary data directly?
>
> +1
>
> While Lucene does not have a global field schema/semantics, unlike eg
> KinoSearch, I think for many apps the fields are in fact static.
>
> In KinoSearch, merging of stored fields & term vectors is always a
> fast concatenation of the entry for that document, whereas Lucene must
> re-interpret/re-number all fields on the doc, in general.  In fact I
> think that KinoSearch stores field names directly in the index (ie,
> not numbers).
>
> If we make this change to Lucene then for those apps that effectively
> have a static field schema (because all docs always have matching
> fields), we can get the same performance that KinoSearch always gets
> during its merging of stored fields & term vectors.

Does "all docs have matching fields" mean that the fields must be
present (as well as identically typed) on each doc, or could they
still be sparse?  If they can be sparse, how do you avoid
renumbering???

-Yonik
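
[Editor's sketch] The check robert proposes can be illustrated roughly as follows. This is not Lucene's actual merge code: the class and method names are hypothetical, and a segment's field dictionary is modeled as a plain list in which the index is the field number and the value is the field name. Under that assumed model, two segments whose lists are equal could have their stored-field bytes copied as-is, even if individual docs are sparse, because each doc would refer only to segment-level field numbers. Otherwise a remap table is needed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene's API.
final class FieldNumbering {

    // Assumed model: index in the list = field number, value = field name.
    // Equal lists mean every field name maps to the same number in both segments.
    static boolean sameNumbering(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Builds remap[oldNumber] = newNumber for a source segment whose numbering
    // differs from the merged dictionary. This is the slow path that a bulk
    // copy would avoid.
    static int[] remap(List<String> source, List<String> merged) {
        int[] map = new int[source.size()];
        for (int i = 0; i < source.size(); i++) {
            int n = merged.indexOf(source.get(i));
            if (n < 0) {
                throw new IllegalArgumentException(
                    "field missing from merged dictionary: " + source.get(i));
            }
            map[i] = n;
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> segA = List.of("id", "title", "body");
        List<String> segB = List.of("id", "title", "body");
        List<String> segC = List.of("id", "body", "title");

        // Same names, same numbers: candidate for a direct binary copy.
        System.out.println(sameNumbering(segA, segB));
        // Same names, different numbers: each doc's fields must be renumbered.
        System.out.println(sameNumbering(segA, segC));
        System.out.println(Arrays.toString(remap(segC, segA)));
    }
}
```

Whether a doc carries all fields or only some does not enter into sameNumbering at all in this model; only the per-segment name-to-number mapping does. Whether that matches the on-disk format closely enough for a raw copy is the open question in this thread.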
