lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiaoyang Gu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
Date Wed, 24 Nov 2010 17:29:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935410#action_12935410
] 

Xiaoyang Gu commented on LUCENE-2729:
-------------------------------------

regarding "I see a patch against RealtimeIndexDataLoader as well. Does it have anything to
do with the 'load manager stopped' errors I encountered?"

This class hasn't been changed since august.

fatal exception during indexing would kill the indexing thread (load manager thread). After
any such exception, you will see that error message.
At that point, we cannot recover and it makes no sense to continue indexing.

In Zoie's mbean, there's a HEALTH level value. When that value turns non-0, the system needs
some serious attention.

You can use JMX to monitor the system health remotely.

> Index corruption after 'read past EOF' under heavy update load and snapshot export
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2729
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with
zoie (using a zoie snapshot from 2010-08-06: zoie-2.0.0-snapshot-20100806.jar).
>            Reporter: Nico Krijnen
>         Attachments: 2010-11-02 IndexWriter infoStream log.zip, backup_force_failure2.log.zip,
eof-extra-logging-4-analysis.txt, eof-extra-logging-4.log.zip, LUCENE-2729-test1.patch, read-past-eof-debugging.zip
>
>
> We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM
system. We use the hot-backup feature of zoie to make scheduled backups of the index. This
works fine for small indexes and when there are not a lot of changes to the index when the
backup is made.
> On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being
changed a lot (lots of document additions and/or deletions), we almost always get a 'read
past EOF' at some point, followed by lots of 'Lock obtain timed out'.
> At that point we get lots of 0 kb files in the index, data gets lots, and the index is
unusable.
> When we stop our server, remove the 0kb files and restart our server, the index is operational
again, but data has been lost.
> I'm not sure if this is a zoie or a lucene issue, so i'm posting it to both. Hopefully
someone has some ideas where to look to fix this.
> Some more details...
> Stack trace of the read past EOF and following Lock obtain timed out:
> {code}
> 78307 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
> java.io.IOException: read past EOF
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
>     at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
>     at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> 579336 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
>     Problem copying segments: Lock obtain timed out: 
>     org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
>     org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
>     at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> {code}
> We get exactly the same behavour on both OS X and on Windows. On both zoie is using a
SimpleFSDirectory.
> We also use a SingleInstanceLockFactory (since our process is the only one working with
the index), but we get the same behaviour with a NativeFSLock.
> The snapshot backup is being made by calling:
> *proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*
> Same issue in zoie JIRA:
> http://snaprojects.jira.com/browse/ZOIE-51

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message