Thank you very much for the detailed information, everyone!
I will try to use the information to make my code better.
I have split the optimization code out into a command-line app that runs the
optimize on another box. It's messy, but effective in keeping downtime to a
minimum. This will get the large number of segment files under control for
now. Too bad it takes a week or more. Hopefully I will not have to reindex
it anytime soon.
I think the best way around this in the future is a transaction/agent-based
approach. That way, I can keep a read-only copy for searching.
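
As a rough sketch of the read-only-copy idea (Python for brevity, not
Lucene.Net code): copy the index files into a fresh snapshot directory, then
repoint a small marker file that the search side reads. The names
publish_snapshot, current_snapshot and current.txt are hypothetical, and the
sketch assumes the copy is taken while no writer is in the middle of a commit.

```python
import os
import shutil
import tempfile


def publish_snapshot(live_index_dir, serving_root):
    """Copy the index files to a new snapshot and repoint the marker file."""
    os.makedirs(serving_root, exist_ok=True)
    snapshot_dir = tempfile.mkdtemp(prefix="snapshot-", dir=serving_root)

    for name in os.listdir(live_index_dir):
        src = os.path.join(live_index_dir, name)
        # Skip subdirectories and lock files; only index files are copied.
        if os.path.isfile(src) and not name.endswith(".lock"):
            shutil.copy2(src, os.path.join(snapshot_dir, name))

    # Write the marker to a temp file first, then swap it in with a single
    # rename so readers never see a half-written marker.
    marker = os.path.join(serving_root, "current.txt")
    tmp_marker = marker + ".tmp"
    with open(tmp_marker, "w", encoding="utf-8") as fh:
        fh.write(os.path.basename(snapshot_dir))
    os.replace(tmp_marker, marker)
    return snapshot_dir


def current_snapshot(serving_root):
    """Return the directory the search side should open next."""
    marker = os.path.join(serving_root, "current.txt")
    with open(marker, encoding="utf-8") as fh:
        return os.path.join(serving_root, fh.read().strip())
```

The search service would call current_snapshot() whenever it reopens its
searcher, so the optimize can run against the live copy without touching
what is being searched.
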
My app currently uses two services, one for writes and one for reads.
I suspect that this may be what is causing the corruption.
Does anyone have experience with this type of setup, and has anyone seen or
confirmed that it can corrupt a Lucene index?
I have heard that having more than one service attached to the index at a
time causes the problem I am seeing.
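
Lucene itself relies on a write lock to keep a second writer out of an
index. The general idea can be sketched as below (Python for brevity; this
is not Lucene's implementation, and exclusive_writer and IndexLockedError
are hypothetical names). It assumes every process that modifies the index
uses the same lock path, and that the filesystem honors exclusive file
creation, which may not hold on every network share.

```python
import os
from contextlib import contextmanager


class IndexLockedError(RuntimeError):
    """Raised when another process already holds the writer lock."""


@contextmanager
def exclusive_writer(lock_path):
    """Hold an exclusive lock file for the duration of a write session."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise IndexLockedError("index is locked: " + lock_path) from None
    try:
        # Record the owner's PID to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A service would wrap all of its index modifications in
"with exclusive_writer(path):". A read-only search service does not need the
lock, but anything that adds, deletes, or optimizes does.
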
Thanks for the links to the old Luke distros, and thanks for all the quick
responses!
Hugh
Andrzej Bialecki wrote:
>
> lowfreq wrote:
>> I have a Lucene index that is very large in size.
>> It was created using a pre-2.1 version of Lucene.Net (2.0.0.4).
>>
>> The index is currently almost 20 GB, and has almost 7000 segment files.
>> The problem I am having is that I need to optimize it, and can't do this
>> without the search functionality of my app being down for a week.
>>
>> I used the Luke tool from getopt.org and it worked flawlessly, optimizing
>> the index in just over 2 hours. The problem is that my search cannot use
>> the result: it either fails with Unknown Format Version errors or simply
>> finds nothing.
>
> You should be careful when using Lucene Java to modify Lucene.Net
> indexes. I know for a fact that deflated data in Lucene Java is
> incompatible with the deflater implementation in .Net, so it's easy to
> create an incompatible index even when you use a supposedly compatible
> version of Lucene Java. Perhaps versions around 2.0 still worked ok, but
> no guarantees.
>
>
>>
>> I understand that versions of Lucene that are newer than what the index
>> was built and is searched with can cause problems.
>>
>> What can I do to make this work? I have tried older versions of Luke, 0.7
>> was the oldest I could lay hands on, but even it uses a newer version of
>> Lucene.
>
> Here are links to older versions of Luke:
>
> http://www.getopt.org/luke/luke-0.1.zip
> http://www.getopt.org/luke/luke-0.2.zip
> http://www.getopt.org/luke/luke-0.3.zip
> http://www.getopt.org/luke/luke-0.4.zip
> http://www.getopt.org/luke/luke-0.5/luke-0.5.jar
> http://www.getopt.org/luke/luke-0.5/luke-src-0.5.zip
> http://www.getopt.org/luke/luke-0.6/lukeall-0.6.jar
> http://www.getopt.org/luke/luke-0.6/luke-src-0.6.zip
>
>
>>
>> My index version shows as 633103800023469045. The version the index is
>> written as after optimizing with Luke 0.7 is 633103800023469057.
>
> This is just a timestamp, so it doesn't say what version of Lucene
> created the index. If you open the index with Luke, the Overview tab
> has a line that shows the index format version.
>
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>
--
View this message in context: http://www.nabble.com/Optimization-and-Corruption-Issues-tp25697034p25705907.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.