lucene-java-user mailing list archives

From "Rob Staveley (Tom)" <rstave...@seseit.com>
Subject RE: Seeing what's occupying all the space in the index
Date Fri, 26 May 2006 18:11:55 GMT
Interesting. I am explicitly turning on the compound file format when I
start my application, but I am suspicious about my optimizing thread. It
*ought* to be optimising every 30 minutes, using thread synchronisation to
prevent the writer from trying to write while optimisation takes place, but
it is possible that I'm screwing up there (I'll add some diagnostics to
check that optimisation and index writing are mutually exclusive). When I
stopped my daemon and manually optimised, it took 11 minutes to optimise the
index. Is your understanding that .fdt, .frq and .prx files are working
files pre-optimisation and then when optimize() is called they should all
get absorbed into the .cfs? Manual optimisation only clawed back 1G, but I
didn't look to see if .fdt, .frq and .prx files were absorbed into the .cfs
files in the process. I'll investigate that now.
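One way to add the diagnostics mentioned above is a small counter-based detector wrapped around the two code paths, independent of whatever synchronisation the application already uses. This is only a sketch: the class name `IndexOverlapDetector` and its methods are hypothetical, and the `Runnable` bodies stand in for the real work (for example calls to `IndexWriter.addDocument(...)` and `IndexWriter.optimize()`).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical diagnostic helper: it does not enforce mutual exclusion,
// it only reports when a write and an optimise are in progress at the same time.
class IndexOverlapDetector {
    private final AtomicInteger writers = new AtomicInteger();    // writes in progress
    private final AtomicInteger optimisers = new AtomicInteger(); // optimise passes in progress
    private final AtomicInteger overlaps = new AtomicInteger();   // overlaps seen so far

    // Wrap each batch of index writes in this call.
    void duringWrite(Runnable write) {
        writers.incrementAndGet();
        try {
            if (optimisers.get() > 0) {
                report("write started while an optimise was running");
            }
            write.run();
        } finally {
            writers.decrementAndGet();
        }
    }

    // Wrap each optimise pass in this call.
    void duringOptimise(Runnable optimise) {
        optimisers.incrementAndGet();
        try {
            if (writers.get() > 0) {
                report("optimise started while a write was running");
            }
            long start = System.currentTimeMillis();
            optimise.run();
            System.out.println("optimise took " + (System.currentTimeMillis() - start) + " ms");
        } finally {
            optimisers.decrementAndGet();
        }
    }

    int overlapCount() {
        return overlaps.get();
    }

    private void report(String what) {
        overlaps.incrementAndGet();
        System.err.println("OVERLAP: " + what + " on thread " + Thread.currentThread().getName());
    }
}
```

Each side increments its own counter before checking the other, so if the two sections ever run at the same time at least one of them should log the overlap. A non-zero `overlapCount()` after a few optimise cycles would suggest the existing thread synchronisation is not doing what it is meant to.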

> Can you try a smaller sample in a clean directory and see what size it is
> (so that it doesn't take as long to index)?

I'll try tee-ing off the message feed and indexing the copy into a new, clean
index, since I'm working with a live message feed.
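A rough sketch of what tee-ing off the feed could look like, assuming messages arrive one at a time as strings. Everything here is hypothetical (the `MessageSink` interface, the `tee` helper and the `maxSample` cap are invented for illustration); the real sinks would hand each message to the code that indexes into the live index and into the clean test index.

```java
import java.util.concurrent.atomic.AtomicLong;

class FeedTee {
    // Hypothetical callback for "do something with one message", e.g. index it.
    interface MessageSink {
        void accept(String message);
    }

    // Returns a sink that forwards every message to the primary sink and
    // copies only the first maxSample messages to the sample sink.
    static MessageSink tee(final MessageSink primary, final MessageSink sample, final long maxSample) {
        final AtomicLong seen = new AtomicLong();
        return new MessageSink() {
            public void accept(String message) {
                primary.accept(message); // the live index always gets the message
                if (seen.getAndIncrement() < maxSample) {
                    try {
                        sample.accept(message);
                    } catch (RuntimeException e) {
                        // A failure in the test index should not disturb the live feed.
                        System.err.println("sample sink failed: " + e);
                    }
                }
            }
        };
    }
}
```

Capping the sample keeps the clean index small, so its size on disk and its file listing can be checked quickly.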

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@syr.edu] 
Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index

It seems odd to me that, if you are using the CFS format, you would have
the .fdt, .frq and .prx files in addition to the .cfs files.  My
understanding is all files (except deletable and segments) get put inside of
the CFS file.  Looking at my indices, I only have the CFS file.  Are you
optimizing your indices after you are done indexing?  Are you turning off
compound file format?
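To see which file types are taking up the space, one option is to total the file sizes in the index directory by extension. The sketch below uses only java.io and does not touch Lucene; the class name `IndexSizeByExtension` is made up for illustration, and the index directory path is passed as the first argument.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Sums the sizes of the files in an index directory, grouped by extension.
class IndexSizeByExtension {

    // Returns extension (e.g. "cfs", "fdt") -> total bytes.
    // Files without an extension (e.g. "segments") are grouped under "(none)".
    static Map<String, Long> sizeByExtension(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + indexDir);
        }
        Map<String, Long> totals = new TreeMap<String, Long>();
        for (File f : files) {
            if (!f.isFile()) {
                continue; // skip subdirectories
            }
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            String ext = (dot >= 0 && dot < name.length() - 1) ? name.substring(dot + 1) : "(none)";
            Long previous = totals.get(ext);
            totals.put(ext, (previous == null ? 0L : previous) + f.length());
        }
        return totals;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> e : sizeByExtension(dir).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue() + " bytes");
        }
    }
}
```

Running it before and after an optimise would show whether the .fdt, .frq and .prx totals shrink while the .cfs total grows.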

Can you try a smaller sample in a clean directory and see what size it is
(so that it doesn't take as long to index)?
