lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How can I merge .cfx and .cfs into a single cfs file?
Date Wed, 05 May 2010 12:24:03 GMT
Lucene considers an index with a single .cfx and a single .cfs as optimized.

Also, note that how Lucene stores files in the index is an impl detail
-- it can change from release to release -- so relying on any of these
details is dangerous.
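
Because of that, a size-based validation check is probably more robust if it
sums every file in the index directory (as Uwe suggests in the quoted message
below) rather than watching a single .cfs file. Here is a minimal sketch in
plain Java that uses no Lucene API at all; the class name IndexSizeCheck, the
method names, and the 2.0 growth threshold are made up for illustration and
would need tuning for a real index:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: validates an index by total directory size,
// independent of how many .cfs/.cfx files Lucene happens to write.
public class IndexSizeCheck {

    /** Sums the sizes of all regular files directly inside the index directory. */
    public static long totalSize(Path indexDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }

    /** True when currentBytes exceeds previousBytes by more than maxGrowthFactor. */
    public static boolean grewSuspiciously(long previousBytes, long currentBytes,
                                           double maxGrowthFactor) {
        if (previousBytes <= 0) {
            return false; // no baseline yet, nothing to compare against
        }
        return currentBytes > previousBytes * maxGrowthFactor;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: IndexSizeCheck <indexDir> <previousTotalBytes>");
            return;
        }
        Path indexDir = Paths.get(args[0]);
        long previous = Long.parseLong(args[1]); // total recorded after the last run
        long current = totalSize(indexDir);
        // 2.0 is an arbitrary example threshold, not a recommended value.
        if (grewSuspiciously(previous, current, 2.0)) {
            System.err.println("WARNING: index grew from " + previous
                    + " to " + current + " bytes");
        }
        System.out.println(current); // record this as the baseline for the next run
    }
}
```

The check only looks at the total, so it keeps working whether a given run
leaves one .cfs, a .cfs plus a .cfx, or some other set of files.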

That said, with recent Lucene versions, if you really want to force
these two files to be consolidated, you can write a custom MergePolicy
whose findMergesForOptimize returns a merge for this case (the
default MergePolicy returns null because it considers this case already
optimized).

Or... if you index with two separate IndexWriter sessions, and then
call optimize, that should also merge down to one file.

Mike

2010/5/5 张志田 <zhitian.zhang@dianping.com>:
> Uwe, thank you very much.
>
> By what mechanism does Lucene merge these two kinds of files? Sometimes I found there
> was only one .cfs file, but at other times there may be one .cfs and one .cfx. I understand the
> .cfx is used to store the term vectors etc., but why does the index result not seem to be consistent?
>
> Thanks,
> Garry
> ----- Original Message -----
> From: "Uwe Goetzke" <uwe.goetzke@healy-hudson.com>
> To: <java-user@lucene.apache.org>
> Sent: Wednesday, May 05, 2010 3:57 PM
> Subject: Re: How can I merge .cfx and .cfs into a single cfs file?
>
>
> Index all into a directory and determine the size of all files in it.
>
> From http://lucene.apache.org/java/3_0_1/fileformats.html
> Starting with Lucene 2.3, doc store files (stored field values and term vectors) can
> be shared in a single set of files for more than one segment. When compound file is enabled,
> these shared files will be added into a single compound file (same format as above) but with
> the extension .cfx.
>
> This is in addition to:
> Compound File  .cfs  An optional "virtual" file consisting of all the other index files
> for systems that frequently run out of file handles.
>
> Uwe
>
>
> -----Original Message-----
> From: 张志田 [mailto:zhitian.zhang@dianping.com]
> Sent: Wednesday, 5 May 2010 08:24
> To: java-user@lucene.apache.org
> Subject: How can I merge .cfx and .cfs into a single cfs file?
>
> Hi all,
>
> I have an indexing task that indexes thousands of records with Lucene 3.0.1. My confusion
> is that Lucene always creates a .cfx and a .cfs file in the file system, sometimes more, while
> I thought it should create a single .cfs file if I optimize the index. Is this by design?
> If yes, is there any way or configuration to merge all of the index files into a single
> one?
>
> By the way, I have logic to validate the index data: if the size of the .cfs file increases
> dramatically compared to the file generated last time, something may be wrong and a warning
> message is raised. This is the reason I want to generate a single .cfs file. Are there any
> other suggestions for index validation?
>
> Can anybody give me a hand?
>
> Thanks in advance.
>
> Garry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
