lucene-general mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Segments file disappears, index no longer functions.
Date Wed, 21 Jan 2009 10:42:37 GMT

Does Lucene.Net allow you to set an infoStream on the writer, so that
it gives details about when it's merging, committing, deleting, etc.?
If so, can you capture & post that?

Do you know when the segments file disappears?  Is it while an  
IndexWriter is open, or, on closing the IndexWriter?
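One way to pin down when the file vanishes is to poll the index directory and log, with timestamps, every time a `segments*` file appears or disappears, then compare those timestamps with the indexer's own log. The sketch below is a hypothetical standalone Python helper, not part of Lucene or Lucene.Net. It assumes only that the index keeps its commit information in files whose names start with `segments` (`segments_N` plus `segments.gen` in the 2.x file format, or a plain `segments` in older versions).

```python
import os
import time
from datetime import datetime


def filter_segments(names):
    """Keep only the index's segments files (segments_N, segments.gen, or plain 'segments')."""
    return {name for name in names if name.startswith("segments")}


def diff_snapshots(before, after):
    """Return (appeared, disappeared) file names between two snapshots."""
    return sorted(after - before), sorted(before - after)


def watch_segments(index_dir, interval=0.5, iterations=None):
    """Poll index_dir and print a timestamped line for every segments* change.

    Runs forever when iterations is None; otherwise stops after that many polls.
    """
    previous = filter_segments(os.listdir(index_dir))
    polls = 0
    while iterations is None or polls < iterations:
        time.sleep(interval)
        current = filter_segments(os.listdir(index_dir))
        appeared, disappeared = diff_snapshots(previous, current)
        stamp = datetime.now().isoformat(timespec="milliseconds")
        for name in appeared:
            print(f"{stamp} appeared:    {name}")
        for name in disappeared:
            print(f"{stamp} disappeared: {name}")
        if not current:
            # This is the state in which readers and writers fail to open.
            print(f"{stamp} WARNING: no segments file in {index_dir}")
        previous = current
        polls += 1
```

Because this polls, a file that exists for less than `interval` seconds can be missed; a shorter interval narrows that window at the cost of more directory listings.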

Are you using IndexReader to do deletions?

Are you using a custom deletion policy?

Mike

adakos wrote:

>
> Hello everyone!
>
> I have implemented Lucene.Net, and have had it functioning very well
> for quite some time.
>
> I have an indexing server application that indexes a very large file
> system, and a searching application that the users use to search the
> index created by the server application.
>
> Our index is currently ~4 GB in size, and we have roughly half a
> million documents that we are indexing/updating regularly.
>
> As of late, we have been having a problem with the segments file
> disappearing.
>
> I originally thought it was the indexing server crashing or
> encountering some kind of error, but this wasn't the case. I ran the
> program in debug mode and found that the indexing server itself didn't
> fault at any point, yet the indexing program also ran into the problem
> of not being able to find the segments file.
>
> Unfortunately, this problem occurs every time I run the indexing
> program. So running the indexer is what causes the segments file to be
> deleted or to disappear, but there doesn't appear to be any reason why.
>
> I have optimised the index and run the program again, and that still
> doesn't help.
>
> All the index writers/readers have appropriately coded .Close() calls
> in a try/catch/finally.
>
> Like I said, the indexer was running perfectly fine for a very long
> period of time. The only thing I can see that's changed since we
> started using it is the index size getting bigger.
>
> It's obviously quite a critical problem, because our users can only
> search the outdated index, and I haven't been able to find anything on
> this issue anywhere.
>
> I am hoping someone might be able to figure out what's going on.
>
> The error is basically that when an index writer/reader/searcher
> attempts to open, it reports that it can't find the segments file.
>
> Is there any known issue where this occurs? I know I am using the .Net
> implementation, but I would assume that Lucene would be quite universal
> across different platforms. I have noticed that there doesn't appear
> to be much support for the .Net version, or at least I have not been
> able to find any.
>
> If it helps, below are the methods used for indexing:
>
> Full Index Update
>
> Directories are searched recursively.
> For each file we check whether it is already in the index (comparing
> size, modified time, etc.).
> If it is, we ignore the file.
> If it isn't, we delete the copy currently in the index, then add the
> updated file.
>
> Email Parsing and Checksumming
>
> I also have an email parser and a checksummer that extract any email
> addresses from the document and calculate a checksum of the text, to
> try to avoid duplicate documents. If there is already a document with
> the same email and checksum, the document is still stored but is
> marked as a duplicate.
>
> File System Watcher
>
> Once the full-text indexing is finished, the indexer begins to process
> files in a queue generated by a file system watcher. The file system
> watcher runs constantly, so indexing is done live.
>
>
> -- 
> View this message in context: http://www.nabble.com/Segments-file-disappears%2C-index-no-longer-functions.-tp21579880p21579880.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
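The "Full Index Update" step in the quoted message (walk the directories, compare size and modified time, skip unchanged files, delete the stale copy and re-add changed ones) can be sketched in Python as below. This is an illustration of that decision logic only: `FileStamp`, `scan`, and `plan_updates` are hypothetical names, and the `indexed` mapping stands in for whatever stored fields the real index keeps; no Lucene API is used.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStamp:
    """Size and modification time, the attributes compared in the quoted message."""
    size: int
    mtime: float


def scan(root):
    """Recursively collect a FileStamp for every file under root."""
    stamps = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            stamps[path] = FileStamp(st.st_size, st.st_mtime)
    return stamps


def plan_updates(on_disk, indexed):
    """Decide which paths to delete from and add to the index.

    on_disk and indexed both map path -> FileStamp; `indexed` is a
    hypothetical stand-in for the stamps stored in the real index.
    Returns (to_add, to_delete), each sorted by path.
    """
    to_add, to_delete = [], []
    for path, stamp in sorted(on_disk.items()):
        old = indexed.get(path)
        if old == stamp:
            continue  # unchanged: ignore the file
        if old is not None:
            to_delete.append(path)  # stale copy: delete it before re-adding
        to_add.append(path)
    return to_add, to_delete
```

In Java Lucene 2.1 and later, the delete-then-add pair can be issued through the writer itself with `IndexWriter.updateDocument(Term, Document)`, so no separate IndexReader has to be opened just for deletions; Lucene.Net ports of those versions should offer an equivalent, but that is worth confirming against the version in use.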

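The email parsing and checksumming step in the quoted message keys duplicates on the pair (email addresses, checksum of the text) and still stores the document, only flagging it. A minimal Python sketch of that rule follows. The regular expression is a deliberately loose assumption (not RFC 5322 complete), MD5 is just one possible checksum, and the in-memory `DuplicateTracker` is hypothetical; the real indexer would presumably look the key up in the index instead.

```python
import hashlib
import re

# Loose email pattern (assumption): good enough for a sketch, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return the distinct, lower-cased email addresses found in text, sorted."""
    return sorted({m.lower() for m in EMAIL_RE.findall(text)})


def checksum(text):
    """MD5 hex digest of the UTF-8 encoded text (any stable hash would do)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class DuplicateTracker:
    """Flags a document as a duplicate when its (emails, checksum) key was seen before.

    Keys are kept in memory here (hypothetical); they are not persisted.
    """

    def __init__(self):
        self._seen = set()

    def classify(self, text):
        emails = tuple(extract_emails(text))
        digest = checksum(text)
        key = (emails, digest)
        duplicate = key in self._seen
        self._seen.add(key)
        # The document is always stored; only the duplicate flag differs.
        return {"emails": list(emails), "checksum": digest, "duplicate": duplicate}
```

For example, classifying the same text twice returns `duplicate` as `False` the first time and `True` the second, while a text with a different address is not flagged.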

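The file system watcher described in the quoted message feeds a queue that the indexer drains continuously. The Python sketch below shows one way to structure that hand-off, assuming (this is not stated in the message) that all index changes should go through a single consumer thread and that repeated change events for a file already waiting in the queue can be coalesced. `IndexQueue` and the `index_file` callback are hypothetical names.

```python
import queue
import threading


class IndexQueue:
    """Coalesces file-change events and hands them to one indexing thread."""

    def __init__(self, index_file):
        self._q = queue.Queue()
        self._pending = set()          # paths queued but not yet picked up
        self._lock = threading.Lock()
        self._index_file = index_file  # hypothetical callback that (re)indexes one path

    def on_change(self, path):
        """Called by the watcher, from any thread, when a file changes."""
        with self._lock:
            if path in self._pending:
                return  # already queued: drop the duplicate event
            self._pending.add(path)
        self._q.put(path)

    def stop(self):
        """Ask run() to exit once the events queued so far are processed."""
        self._q.put(None)  # sentinel

    def run(self):
        """Drain the queue on the calling thread, one file at a time."""
        while True:
            path = self._q.get()
            if path is None:
                break
            with self._lock:
                # Un-mark before indexing so a change that arrives mid-index is queued again.
                self._pending.discard(path)
            self._index_file(path)
```

Typically `run` would be started once on a dedicated `threading.Thread`, with the watcher calling `on_change`; keeping every add and delete on that one thread means only one writer is ever modifying the index at a time.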