lucene-dev mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: potential indexing performance improvement for compound index - cut IO - have more files though
Date Sun, 17 Dec 2006 07:31:55 GMT
Doug Cutting wrote:
> > Therefore, a "semi compound" segment file can be defined, that would be
> > made of 4 files (instead of 1):
> > - File 0: .fdx .tis .tvx
> > - File 1: .fdt .tii .tvd
> > - File 2: .frq .tvf
> > - File 3: .fnm .prx .fN
>
> I think this is a promising direction.  Perhaps instead of adding a
> third index format, we can significantly improve the non-compound format
> without too much effort.  For example, simply writing all the norms into
> a single file could have a large impact on total file handles and would
> be a rather simple change.  We could start with that, then see if there
> are further incremental improvements to be had.

We can start with that. At the very least it would fix the number of files
per segment at 11; currently the count depends on the number of fields with
norms.
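
To make the arithmetic concrete, here is a minimal sketch of the file count.
It assumes the fixed per-segment files are exactly the ten extensions named
in the quoted proposal (everything except .fN) and ignores any other files a
segment may carry; the class and method names are made up for illustration
and are not Lucene APIs.

```java
public class SegmentFileCount {
    // Fixed per-segment extensions listed in the quoted proposal, norms excluded.
    // Assumption: this list is complete for the purpose of the count.
    static final String[] FIXED_EXTENSIONS = {
        "fnm", "fdx", "fdt", "tis", "tii", "frq", "prx", "tvx", "tvd", "tvf"
    };

    // Current non-compound layout: one .fN norms file per field with norms.
    public static int withPerFieldNorms(int fieldsWithNorms) {
        return FIXED_EXTENSIONS.length + fieldsWithNorms;
    }

    // Proposed layout: all norms written to a single file.
    public static int withSingleNormsFile() {
        return FIXED_EXTENSIONS.length + 1;
    }

    public static void main(String[] args) {
        for (int fields : new int[] {1, 10, 50}) {
            System.out.println(fields + " fields with norms: "
                + withPerFieldNorms(fields) + " files now, "
                + withSingleNormsFile() + " with a single norms file");
        }
    }
}
```

Under these assumptions the count grows linearly with the number of fields
with norms in the current layout and stays at 11 with a single norms file.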

One advantage of keeping a plain non-compound format is educational /
debugging: it is often helpful to actually see the files being created on
disk. (Although just concatenating all norms into a single file is simple
enough in that regard.)
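
As a rough sketch of what "just concatenating" could look like: norms take
one byte per document per field, so if every field's norms are written back
to back, the norms for a given field sit at a fixed offset. The layout below
(no header, fields in slot order) and all names are assumptions for
illustration only, not an actual or proposed Lucene file format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

public class SingleNormsFile {
    // Concatenate per-field norms (one byte per document each) into one buffer.
    // Hypothetical layout: field 0's bytes, then field 1's, and so on.
    public static byte[] concat(List<byte[]> perFieldNorms, int maxDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] norms : perFieldNorms) {
            if (norms.length != maxDoc) {
                throw new IllegalArgumentException("expected " + maxDoc + " norm bytes per field");
            }
            out.write(norms, 0, norms.length);
        }
        return out.toByteArray();
    }

    // Read back the norms of one field: its slice starts at fieldSlot * maxDoc.
    public static byte[] normsForField(byte[] combined, int fieldSlot, int maxDoc) {
        int start = fieldSlot * maxDoc;
        return Arrays.copyOfRange(combined, start, start + maxDoc);
    }
}
```

With a layout like this the combined file stays easy to inspect by hand,
since each field's norms occupy one contiguous run of maxDoc bytes.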





