lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nico Krijnen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export
Date Thu, 11 Nov 2010 14:43:19 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931024#action_12931024
] 

Nico Krijnen commented on LUCENE-2729:
--------------------------------------

We finished another run today and got some interesting results with from the additional logging.
The log times are searchable in the log file...

- 09:40:21,996: we are creating a snapshot for our backup, segments filename used for snapshot:
'segments_2j'
On the next line we can see that at that moment the segments_2j is 2353 bytes. We log the
filenames in the index + the (filesizes).

- 11:03:27,653: segments_2j is written to disk
_How can this be while this is the name of the segments filename used to create our Snapshot?_
On the next line we see that it has indeed been written to disk, but has a 0 byte file size.

- 11:08:02,731: First "read past EOF" exception
_It looks like segments filenames are reused and at a point in time well after our Snapshot
is created, the name 'segments_2j' is reused?
This somehow causes the segments file to get a size of 0 bytes. Perhaps it is being deleted
and written at the same time, or because of the name clash there's no data to be written to
the segments file?_

- 11:16:03,300: our backup is finished.

Here is an excerpt of some of the more interesting log entries:

{quote}
09:39:30,719 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Start asset index
backup

09:40:21,996 [http-8080-2] DEBUG proj.zoie.impl.indexing.internal.ZoieIndexDeletionPolicy
- Ctor Snapshot. segments filename: *segments_2j*, commit filenames: [_5f.tvd, _4c.tvf, _5f.tvf,
_4d.fdx, _4p.tis, _4c.tvd, _4p.frq, _5e.tvx, _39.tis, _3u.frq, _4d.fdt, _4p.tii, _5f.fnm,
_4c.tvx, _39.tii, _55.tis, _2q.fnm, _4p.fdx, _2q.tvf, _5e.tvf, _5e.tvd, _2q.tvd, _5f.tvx,
_4p.fdt, _55.tii, _55.nrm, _55.frq, _2q.tvx, _3u.tvx, _5f.fdx, _4c.frq, _5f.fdt, _4c.fdx,
_2q.nrm, _4c.fdt, _4p.tvf, _4c.nrm, _3v.tvf, _4c.tii, _3u.prx, _4p.tvd, _4c.tis, _3u.fdt,
_5e.fnm, _3v.tvd, _3u.fdx, _55.tvx, _39.fdt, _39.fnm, _5e.prx, _3u.tis, *segments_2j*, _3v.prx,
_2q.fdx, _4c.prx, _4d.prx, _4p.tvx, _39.fdx, _3u.tii, _2q.fdt, _4d.frq, _55.fdt, _2q.prx,
_5e.tii, _3u.fnm, _55.fdx, _5e.nrm, _5f.tii, _5e.tis, _4d.nrm, _4d.tii, _5e.fdt, _4d.tis,
_5f.frq, _5e.fdx, _39.tvd, _39.tvf, _55.fnm, _3v.frq, _2q.frq, _2q.tii, _3v.tii, _5f.prx,
_4p.nrm, _5f.nrm, _2q.tis, _3v.tvx, _4c.fnm, _3v.nrm, _55.tvf, _55.tvd, _3u.nrm, _5f.tis,
_3v.tis, _3v.fdt, _4d.fnm, _4d.tvf, _3v.fdx, _4d.tvd, _4d.tvx, _4p.prx, _3v.fnm, _5e.frq,
_39.prx, _39.tvx, _39.frq, _4p.fnm, _39.nrm, _3u.tvf, _55.prx, _3u.tvd]

11:03:27,653 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager -
Writing *segments_2j*, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276),
_2q.frq(4685726), _2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394),
_2q.tvf(599185081), _2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579),
_39.nrm(256), _39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042),
_39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926),
_3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136),
_3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122),
_3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147),
_3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896),
_4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779),
_4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140),
_4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970),
_4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56),
_4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420),
_55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985),
_55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0),
_5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823),
_5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811),
_63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470),
_63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484),
_64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845),
_64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369489408), _65.fdx(0), _65.fnm(1749),
index.directory(5), segments.gen(20), segments_2x(2354)
11:03:27,706 [http-8080-2] TRACE com.ds.acm.engine.search.zoieimpl.core.DirectoryManager -
Writing _3v.prx, directory contents: _2q.fdt(969134416), _2q.fdx(36652), _2q.fnm(276), _2q.frq(4685726),
_2q.nrm(9166), _2q.prx(393230403), _2q.tii(7447), _2q.tis(746299), _2q.tvd(8394), _2q.tvf(599185081),
_2q.tvx(73300), _39.fdt(2061261675), _39.fdx(1012), _39.fnm(276), _39.frq(17754579), _39.nrm(256),
_39.prx(121067407), _39.tii(137511), _39.tis(11726653), _39.tvd(185), _39.tvf(233037042),
_39.tvx(2020), _3u.fdt(1595882722), _3u.fdx(63692), _3u.fnm(330), _3u.frq(8001869), _3u.nrm(15926),
_3u.prx(647374863), _3u.tii(11319), _3u.tis(1168399), _3u.tvd(14209), _3u.tvf(986370136),
_3u.tvx(127380), _3v.fdt(2057147455), _3v.fdx(476), _3v.fnm(384), _3v.frq(15201111), _3v.nrm(122),
_3v.prx(109724024), _3v.tii(132491), _3v.tis(11457688), _3v.tvd(114), _3v.tvf(211902147),
_3v.tvx(948), _4c.fdt(2691565961), _4c.fdx(39572), _4c.fnm(276), _4c.frq(18724620), _4c.nrm(9896),
_4c.prx(590255960), _4c.tii(141243), _4c.tis(12185869), _4c.tvd(9894), _4c.tvf(932649779),
_4c.tvx(79140), _4d.fdt(2398908136), _4d.fdx(548), _4d.fnm(354), _4d.frq(24581614), _4d.nrm(140),
_4d.prx(158243133), _4d.tii(141948), _4d.tis(12259425), _4d.tvd(140), _4d.tvf(303769970),
_4d.tvx(1092), _4p.fdt(1081212027), _4p.fdx(212), _4p.fnm(354), _4p.frq(8294102), _4p.nrm(56),
_4p.prx(60513257), _4p.tii(134898), _4p.tis(11376730), _4p.tvd(56), _4p.tvf(116715012), _4p.tvx(420),
_55.fdt(787581180), _55.fdx(156), _55.fnm(354), _55.frq(5376741), _55.nrm(42), _55.prx(39539985),
_55.tii(133483), _55.tis(11351989), _55.tvd(42), _55.tvf(75958725), _55.tvx(308), _5e.prx(0),
_5z.fdt(853614415), _5z.fdx(23700), _5z.fnm(1696), _5z.frq(9041984), _5z.nrm(5928), _5z.prx(32617823),
_5z.tii(146115), _5z.tis(12387143), _5z.tvd(4142), _5z.tvf(74409815), _5z.tvx(47396), _63.fdt(212690811),
_63.fdx(8188), _63.fnm(1652), _63.frq(2492889), _63.nrm(2050), _63.prx(5672093), _63.tii(8470),
_63.tis(689993), _63.tvd(1442), _63.tvf(15371474), _63.tvx(16372), _64.fdt(4118409126), _64.fdx(1484),
_64.fnm(384), _64.frq(35294399), _64.nrm(374), _64.prx(230791431), _64.tii(143860), _64.tis(12491845),
_64.tvd(295), _64.tvf(444939185), _64.tvx(2964), _65.fdt(1369554944), _65.fdx(0), _65.fnm(1749),
index.directory(5), segments.gen(20), *segments_2j(0)*, segments_2x(2354)

IFD [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: load commit
"*segments_2j*"
IW 1110 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a]: init: hit exception
on init; releasing write lock
11:08:02,731 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@7636d59a] ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader
- Problem copying segments: *read past EOF*, start=0, end=0, bufferStart=0, bufferPosition=0,
bufferSize=1024

11:16:03,300 [http-8080-2] DEBUG com.ds.acm.logic.tools.BackupManager - Asset index backup
complete
{quote}

Looking in the logs there are *31 occurences of 'delete "segments_2j"' and 32 for 'wrote segments
file "segments_2j"'*. Very weird, since the ZoieIndexDeletionPolicy should prevent the segments_2j
from being deleted...

So just guessing here, but the problem seems to be that segement filenames are being 'rotated'
and getting reused?
The ZoieIndexDeletionPolicy definitely does not seem to take this into account, it uses the
segments filename as a unique id to track the references to a specific commit. But looking
at it, segment filenames are not unique over a certain period of time, and that time period
gets shorter when lots of modifications are being made to the index.
Reading the code it should still prevent any segments_2j commit from being deleted... but
we do see deletes for that file... hmmm

And then something else still seems te be going wrong, somehow the segments file does become
0 bytes. It is not because it was written with no data, we added a check for that (look for
'closing 0kb IndexOutput') but that never happens to a segments_* file.

Maybe some logic needs to be added to prevent certain segment_... numbers from being re-used
if the commit is still 'referenced' by an IndexDeletionPolicy...?

> Index corruption after 'read past EOF' under heavy update load and snapshot export
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2729
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2729
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Happens on both OS X 10.6 and Windows 2008 Server. Integrated with
zoie (using a zoie snapshot from 2010-08-06: zoie-2.0.0-snapshot-20100806.jar).
>            Reporter: Nico Krijnen
>         Attachments: 2010-11-02 IndexWriter infoStream log.zip, LUCENE-2729-test1.patch
>
>
> We have a system running lucene and zoie. We use lucene as a content store for a CMS/DAM
system. We use the hot-backup feature of zoie to make scheduled backups of the index. This
works fine for small indexes and when there are not a lot of changes to the index when the
backup is made.
> On large indexes (about 5 GB to 19 GB), when a backup is made while the index is being
changed a lot (lots of document additions and/or deletions), we almost always get a 'read
past EOF' at some point, followed by lots of 'Lock obtain timed out'.
> At that point we get lots of 0 kb files in the index, data gets lots, and the index is
unusable.
> When we stop our server, remove the 0kb files and restart our server, the index is operational
again, but data has been lost.
> I'm not sure if this is a zoie or a lucene issue, so i'm posting it to both. Hopefully
someone has some ideas where to look to fix this.
> Some more details...
> Stack trace of the read past EOF and following Lock obtain timed out:
> {code}
> 78307 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
> java.io.IOException: read past EOF
>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>     at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
>     at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
>     at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:166)
>     at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
>     at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
>     at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
>     at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> 579336 [proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader@31ca5085] 
>     ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
>     Problem copying segments: Lock obtain timed out: 
>     org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
> org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
>     org.apache.lucene.store.SingleInstanceLock@5ad0b895: write.lock
>     at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:957)
>     at proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
>     at proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
>     at proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
>     at proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
>     at proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
>     at proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
> {code}
> We get exactly the same behavour on both OS X and on Windows. On both zoie is using a
SimpleFSDirectory.
> We also use a SingleInstanceLockFactory (since our process is the only one working with
the index), but we get the same behaviour with a NativeFSLock.
> The snapshot backup is being made by calling:
> *proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*
> Same issue in zoie JIRA:
> http://snaprojects.jira.com/browse/ZOIE-51

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message