lucene-java-user mailing list archives

From Nikhil Goel <nikhil.g...@gmail.com>
Subject Re: Lucene index integrity during a system crash
Date Sat, 16 Jul 2005 21:58:07 GMT
Hey Jian,

That's a very good thread to start. We faced a similar situation in
our production system, where the Lucene index actually got corrupted
because writing the index is not atomic.

Your observation is correct, and the only problem that could happen is
zombie segments in your index, since they never got listed in the
segments file before the system crash. But I am giving one warning
here: we have seen a case where a segment file (an .fdx file) was
listed in the "segments" file and reported a size of 200, but in fact
there was nothing in the file, so reads ran past EOF. After a lot of
inspection, we still couldn't figure out why that happened. I tried to
post that query to this newsgroup, but unfortunately I got no reply,
and we had to stop indexing for a while.
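
The write-then-rename commit that Jian describes in the quoted message
below can be sketched roughly as follows. This is a minimal
illustration under stated assumptions, not Lucene's actual code: the
class name AtomicCommitSketch and the method commit are hypothetical,
and it uses the java.nio.file API, which is newer than the Lucene
version discussed in this thread. Forcing the bytes to disk before the
rename is a common precaution; whether a missing flush has anything to
do with the empty .fdx file described above is not established here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a "write to <name>.new, then rename" commit.
public class AtomicCommitSketch {

    // Writes content to "<name>.new", flushes it to disk, then renames
    // it over "<name>" so readers see either the old or the new file.
    static void commit(Path dir, String name, byte[] content) throws IOException {
        Path tmp = dir.resolve(name + ".new");
        Path target = dir.resolve(name);

        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(content);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to push data and metadata to the storage device
            // before the new file becomes visible under its final name.
            ch.force(true);
        }

        // ATOMIC_MOVE requests an atomic rename. Whether an existing
        // target is replaced is implementation specific; on POSIX
        // file systems the rename replaces it.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        commit(dir, "segments", "seg_1".getBytes(StandardCharsets.UTF_8));
        commit(dir, "segments", "seg_1 seg_2".getBytes(StandardCharsets.UTF_8));
        byte[] current = Files.readAllBytes(dir.resolve("segments"));
        System.out.println(new String(current, StandardCharsets.UTF_8));
    }
}
```

If the process dies before the rename, only "segments.new" is left
behind and the old "segments" file is still the one readers open.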

The approach we follow now is to write the index to a database inside
a transaction: we commit the transaction only when the segments file
and the delete file have been updated; otherwise we roll back. This
solution has been working well for us. Performance is slow, but that
is better than losing the entire index.

I would be glad if someone could give a better explanation of the
corruption. I have seen lots of posts on this group about it, but no
one really responds to this important question.

Please let me know if you have something more to add to my explanation.
Thanks.
Nikhil


On 7/16/05, jian chen <chenjian1227@gmail.com> wrote:
> Hi, Otis,
> 
> Thanks for your email. As this is very important for using Lucene in
> our production system, I looked at the code to try to understand it.
> Here is my reasoning for why the index won't be corrupted during a
> system crash.
> 
> In the IndexWriter.java mergeSegments(...) method, there are two lines:
> segmentInfos.write(directory);     // commit before deleting
> deleteSegments(segmentsToDelete);  // delete unused segments
> 
> The segmentInfos.write(...) call writes the new segments file as
> "segments.new". Once the write is complete, it renames "segments.new"
> to "segments".
> 
> I guess the rename operation is atomic as guaranteed by the operating
> system. Otherwise, the "segments" file will be left in an inconsistent
> state during the system crash.
> 
> It also appears to me that the "segments" file is the single point of
> switching from the old set of index segments to the new one. In case
> of a system failure, the old "segments" file will be used anyway, so
> there is no corruption.
> 
> Is this understanding correct and thorough?
> 
> Thanks a lot,
> 
> Jian
> 
> On 7/16/05, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > The only corruption that I've seen mentioned on this list so far was
> > corruption of the segments file, and even that is something people
> > have been able to edit manually with a hex editor.
> >
> > Otis
> >
> >
> > --- jian chen <chenjian1227@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I know Lucene does not have transaction support at this stage.
> > > However, I want to know what will happen if there is an operating
> > > system crash during the indexing process. Will the Lucene index
> > > get corrupted?
> > >
> > > Thanks,
> > >
> > > Jian
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
