lucene-dev mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: potential indexing performance improvement for compound index - cut IO - have more files though
Date Thu, 21 Dec 2006 07:43:58 GMT
Doron Cohen wrote:
> Doug Cutting wrote:
> > > Therefore, a "semi compound" segment file can be defined, that would be
> > > made of 4 files (instead of 1):
> > > - File 0: .fdx .tis .tvx
> > > - File 1: .fdt .tii .tvd
> > > - File 2: .frq .tvf
> > > - File 3: .fnm .prx .fN
> >
> > I think this is a promising direction.  Perhaps instead of adding a
> > third index format, we can significantly improve the non-compound format
> > without too much effort.  For example, simply writing all the norms into
> > a single file could have a large impact on total file handles and would
> > be a rather simple change.  We could start with that, then see if there
> > are further incremental improvements to be had.
>
> We can start with that - at least it would set the number of segment files
> to a fixed number - 11 - currently it depends on the number of fields with
> norms.
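
For reference, the arithmetic behind the figure of 11 can be sketched as follows. This is only an illustration: the extension list is taken from the semi-compound proposal quoted above, and the class and method names are hypothetical, not part of Lucene.

```java
import java.util.List;

class SegmentFileCount {
    // Per-segment extensions named in the proposal above, excluding the
    // per-field norms files (.fN). Assumed to be the complete fixed set.
    static final List<String> FIXED_EXTENSIONS = List.of(
        "fnm", "fdx", "fdt", "tis", "tii", "frq", "prx", "tvx", "tvd", "tvf");

    // Current non-compound format: one norms file per field with norms.
    static int filesWithPerFieldNorms(int fieldsWithNorms) {
        return FIXED_EXTENSIONS.size() + fieldsWithNorms;
    }

    // Proposed change: all norms in one file, so the count no longer
    // depends on the number of fields.
    static int filesWithSingleNormsFile() {
        return FIXED_EXTENSIONS.size() + 1;
    }

    public static void main(String[] args) {
        System.out.println(filesWithPerFieldNorms(20));
        System.out.println(filesWithSingleNormsFile());
    }
}
```

With 20 fields carrying norms, the first call gives 30 files per segment, while the second stays at 11 regardless of the field count.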

Okay, started with this step - see issue 756
http://issues.apache.org/jira/browse/LUCENE-756
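
To make the idea concrete, here is a minimal sketch of concatenating per-field norms into one block of bytes and reading a single field back. It assumes one norm byte per document and a fixed field order, with no header. That layout, and the names SingleNormsFile, concat and readNorms, are assumptions made for illustration and are not necessarily the format used in LUCENE-756.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

class SingleNormsFile {
    // Concatenates each field's norms (one byte per document) back to back,
    // in field order. Hypothetical layout: no header, no per-field index.
    static byte[] concat(List<byte[]> normsPerField, int maxDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] norms : normsPerField) {
            if (norms.length != maxDoc) {
                throw new IllegalArgumentException("expected " + maxDoc + " norms per field");
            }
            out.write(norms, 0, norms.length);
        }
        return out.toByteArray();
    }

    // Returns the norms of one field; its slice starts at fieldIndex * maxDoc.
    static byte[] readNorms(byte[] file, int fieldIndex, int maxDoc) {
        int start = fieldIndex * maxDoc;
        if (fieldIndex < 0 || start + maxDoc > file.length) {
            throw new IllegalArgumentException("no norms for field " + fieldIndex);
        }
        return Arrays.copyOfRange(file, start, start + maxDoc);
    }
}
```

Because every field contributes exactly maxDoc bytes, a reader can seek straight to a field's norms through a single file handle instead of opening one file per field.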

>
> One advantage of keeping a plain non-compound format is educational /
> debugging - it is often helpful to actually see the files being created on
> disk. (Although, just concatenating all norms to a single file is simple
> enough in that regard.)

