lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2975) MMapDirectory on chunk size boundaries broken
Date Fri, 18 Mar 2011 16:18:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008492#comment-13008492 ]

Uwe Schindler commented on LUCENE-2975:
---------------------------------------

Here is the output from CheckIndex (logged through log4j):

{noformat}
2011-03-18 15:45:02,633 INFO de.pangaea.metadataportal.harvester.Checker - Checking index "pangaea"...
2011-03-18 15:45:02,953 INFO org.apache.lucene.index.CheckIndex - Segments file=segments_g8o numSegments=8 version=FORMAT_3_1 [Lucene 3.1]
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -   1 of 8: name=_xdx docCount=644683
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -     compound=false
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:45:02,956 INFO org.apache.lucene.index.CheckIndex -     numFiles=12
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     size (MB)=5,150.848
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {optimize=true, mergeFactor=3, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xdx_1f.del]
2011-03-18 15:45:04,320 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [2786 deleted docs]
2011-03-18 15:45:05,203 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17616 fields]
2011-03-18 15:45:06,608 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:18,216 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [54488825 terms; 524900381 terms/docs pairs; 628086112 tokens]
2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......ERROR [field data are in wrong format: java.util.zip.DataFormatException: unknown compression method]
2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - org.apache.lucene.index.CorruptIndexException: field data are in wrong format: java.util.zip.DataFormatException: unknown compression method
2011-03-18 15:50:19,316 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:605)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:377)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:259)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:934)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.IndexReader.document(IndexReader.java:844)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:702)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:517)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - Caused by: java.util.zip.DataFormatException: unknown compression method
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflateBytes(Native Method)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflate(Inflater.java:238)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at java.util.zip.Inflater.inflate(Inflater.java:256)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:106)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:602)
2011-03-18 15:50:19,319 INFO org.apache.lucene.index.CheckIndex - 	... 8 more
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [492976 total vector count; avg 0.768 term/freq vector fields per doc]
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - FAILED
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex -     WARNING: fixIndex() would remove reference to this segment; full exception:
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - java.lang.RuntimeException: Stored Field test failed
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:529)
2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - 	at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - 	at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -   2 of 8: name=_xgc docCount=2467
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex -     numFiles=2
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     size (MB)=12.495
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {optimize=false, mergeFactor=10, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xgc_2.del]
2011-03-18 15:50:42,830 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [7 deleted docs]
2011-03-18 15:50:42,838 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:42,847 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:46,513 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [260092 terms; 1081044 terms/docs pairs; 1378854 tokens]
2011-03-18 15:50:46,656 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [73761 total field count; avg 29.984 fields per doc]
2011-03-18 15:50:46,790 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [660 total vector count; avg 0.268 term/freq vector fields per doc]
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -   3 of 8: name=_xgb docCount=128
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     numFiles=2
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex -     size (MB)=3.425
2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex -     has deletions [delFileName=_xgb_1.del]
2011-03-18 15:50:46,843 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK [93 deleted docs]
2011-03-18 15:50:46,850 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:46,854 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:47,063 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [36032 terms; 396116 terms/docs pairs; 115684 tokens]
2011-03-18 15:50:47,069 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [4892 total field count; avg 139.771 fields per doc]
2011-03-18 15:50:47,073 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [35 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -   4 of 8: name=_xgd docCount=269
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     size (MB)=5.458
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,077 INFO org.apache.lucene.index.CheckIndex -     no deletions
2011-03-18 15:50:47,112 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
2011-03-18 15:50:47,119 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:47,123 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:47,705 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [43919 terms; 652617 terms/docs pairs; 816813 tokens]
2011-03-18 15:50:47,725 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [37116 total field count; avg 137.978 fields per doc]
2011-03-18 15:50:47,754 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [269 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -   5 of 8: name=_xge docCount=13
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.923
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex -     no deletions
2011-03-18 15:50:47,788 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
2011-03-18 15:50:47,794 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:47,798 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:47,920 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [26350 terms; 45640 terms/docs pairs; 103403 tokens]
2011-03-18 15:50:47,922 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [2278 total field count; avg 175.231 fields per doc]
2011-03-18 15:50:47,925 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex -   6 of 8: name=_xgf docCount=13
2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.387
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex -     no deletions
2011-03-18 15:50:47,962 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
2011-03-18 15:50:47,968 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:47,972 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:48,005 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [6928 terms; 12186 terms/docs pairs; 14593 tokens]
2011-03-18 15:50:48,006 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [736 total field count; avg 56.615 fields per doc]
2011-03-18 15:50:48,007 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -   7 of 8: name=_xgg docCount=4
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex -     size (MB)=0.319
2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex -     no deletions
2011-03-18 15:50:48,044 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
2011-03-18 15:50:48,051 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:48,054 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:48,086 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [5194 terms; 7559 terms/docs pairs; 8971 tokens]
2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [436 total field count; avg 109 fields per doc]
2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [4 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -   8 of 8: name=_xgh docCount=29
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     compound=true
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     hasProx=true
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     numFiles=1
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     size (MB)=1.082
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex -     no deletions
2011-03-18 15:50:48,130 INFO org.apache.lucene.index.CheckIndex -     test: open reader.........OK
2011-03-18 15:50:48,137 INFO org.apache.lucene.index.CheckIndex -     test: fields..............OK [17734 fields]
2011-03-18 15:50:48,141 INFO org.apache.lucene.index.CheckIndex -     test: field norms.........OK [235 fields]
2011-03-18 15:50:48,268 INFO org.apache.lucene.index.CheckIndex -     test: terms, freq, prox...OK [25231 terms; 64396 terms/docs pairs; 104657 tokens]
2011-03-18 15:50:48,271 INFO org.apache.lucene.index.CheckIndex -     test: stored fields.......OK [4172 total field count; avg 143.862 fields per doc]
2011-03-18 15:50:48,274 INFO org.apache.lucene.index.CheckIndex -     test: term vectors........OK [29 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,275 INFO org.apache.lucene.index.CheckIndex - WARNING: 1 broken segments (containing 641897 documents) detected
2011-03-18 15:50:48,275 WARN de.pangaea.metadataportal.harvester.Checker - Finished checking of index "pangaea": Index is corrupt.
{noformat}
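The root error in segment 1 is a java.util.zip.DataFormatException thrown by java.util.zip.Inflater from inside CompressionTools.decompress. A small JDK-only illustration of that symptom follows; InflateDemo and its methods are hypothetical names, and this is not Lucene's CompressionTools code. A payload that really was deflated round-trips cleanly, while bytes that were never compressed are rejected as soon as Inflater parses the zlib header. For example, the header bytes 0x00 0x00 pass zlib's header checksum but declare compression method 0 instead of 8 (deflate), which stock zlib should report as "unknown compression method", the same message seen in the log. The exact message text comes from the native zlib library, so treat it as typical rather than guaranteed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical JDK-only demo, NOT Lucene's CompressionTools: a real zlib stream
// round-trips, while bytes that were never deflated make Inflater throw.
final class InflateDemo {

    /** Deflates data into a zlib-wrapped stream (Deflater's default format). */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    /** Inflates a zlib stream; throws DataFormatException if the bytes are not one. */
    static byte[] decompress(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf); // a bad zlib header throws here
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Guard added for this sketch: truncated input would otherwise loop forever.
                    throw new DataFormatException("truncated or dictionary-based stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "stored field value".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decompress(compress(original)), StandardCharsets.UTF_8));
        try {
            // Uncompressed bytes mistaken for a zlib stream, as when a flag bit is misread.
            decompress(new byte[] {0x00, 0x00, 0x00, 0x00});
        } catch (DataFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The needsInput()/needsDictionary() guard is a choice made for this sketch so that truncated input fails fast instead of spinning in the loop.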

> MMapDirectory on chunk size boundaries broken
> ---------------------------------------------
>
>                 Key: LUCENE-2975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2975
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Priority: Blocker
>
> When testing the 3.1-RC1 built by Yonik on the PANGAEA (www.pangaea.de) production system, I found that on a large segment (about 5 GiB) some stored fields suddenly produce a strange deflate decompression error (CompressionTools), although the stored fields are no longer pre-3.0 compressed. It seems that the header of the stored field is read incorrectly at the buffer boundary in MultiMMapDir, so FieldsReader incorrectly detects a deflate-compressed field (CompressionTools).
> The error is reproducible with CheckIndex on MMapDirectory, but not with NIODir or SimpleDir. The FDT file of that segment is 2.6 GiB and on Solaris the chunk size is Integer.MAX_VALUE, so the MultiMMap IndexInput for that file consists of 2 mapped chunks.
> Robert and I have the index ready as a tar file; we will run tests on our local machines and hopefully solve the bug, which was possibly introduced by Robert's recent changes to MMap.
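The failure mode described in the issue, a multi-byte header read wrongly where one mapped buffer ends and the next begins, can be illustrated with a small JDK-only sketch. ChunkedInput is a hypothetical class: it is not Lucene's MultiMMapIndexInput and not the eventual fix for LUCENE-2975. It splits one logical file into fixed-size ByteBuffer chunks (heap buffers stand in for FileChannel.map so it runs without a multi-GiB file) and assumes that correct behaviour means switching to the next chunk whenever the current one is exhausted, both for single bytes and for bulk copies. The static chunkCount helper redoes the arithmetic from the issue: with a chunk size of Integer.MAX_VALUE (2,147,483,647 bytes), a 2.6 GiB FDT file (roughly 2,791,728,742 bytes; the exact size is not given) needs two chunks, so every offset from 2,147,483,647 onward lives in the second buffer.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch, NOT Lucene's MultiMMapIndexInput: one logical file stored
// as fixed-size ByteBuffer chunks (heap buffers stand in for FileChannel.map).
final class ChunkedInput {
    private final ByteBuffer[] chunks;
    private final int chunkSize; // every chunk except possibly the last has this size
    private final long length;
    private int curChunk;        // index of the chunk that holds the file pointer
    private ByteBuffer cur;      // chunks[curChunk], positioned at the file pointer

    /** Number of chunks needed for a file: ceiling division by the chunk size. */
    static long chunkCount(long fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    ChunkedInput(byte[] data, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize=" + chunkSize);
        this.chunkSize = chunkSize;
        this.length = data.length;
        chunks = new ByteBuffer[(int) Math.max(1, chunkCount(data.length, chunkSize))];
        for (int i = 0; i < chunks.length; i++) {
            int start = i * chunkSize;
            int len = Math.min(chunkSize, data.length - start);
            chunks[i] = ByteBuffer.wrap(data, start, len).slice();
        }
        seek(0);
    }

    /** Positions the input at an absolute offset, which may sit exactly on a chunk boundary. */
    void seek(long pos) {
        if (pos < 0 || pos > length) throw new IllegalArgumentException("pos=" + pos);
        int idx = (int) (pos / chunkSize);
        int off = (int) (pos % chunkSize);
        if (idx == chunks.length) { // pos == length and length is a multiple of chunkSize
            idx--;
            off = chunkSize;
        }
        curChunk = idx;
        cur = chunks[idx];
        cur.position(off);
    }

    /** Advances to the next chunk once the current one is exhausted. */
    private void nextChunk() {
        if (curChunk + 1 >= chunks.length) throw new BufferUnderflowException();
        cur = chunks[++curChunk];
        cur.position(0);
    }

    byte readByte() {
        if (!cur.hasRemaining()) nextChunk();
        return cur.get();
    }

    /** Bulk copy that is split wherever it crosses a chunk boundary. */
    void readBytes(byte[] b, int offset, int len) {
        while (len > 0) {
            if (!cur.hasRemaining()) nextChunk();
            int n = Math.min(len, cur.remaining());
            cur.get(b, offset, n);
            offset += n;
            len -= n;
        }
    }

    /** Big-endian int assembled from single bytes, so a straddling value is still read correctly. */
    int readInt() {
        return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
                | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
    }

    long getFilePointer() {
        return (long) curChunk * chunkSize + cur.position();
    }
}
```

With a 10-byte file and a chunk size of 4, a readInt at offset 2 straddles the first boundary exactly the way a 4-byte value starting at offset 2,147,483,645 would straddle the two chunks of the 2.6 GiB FDT file.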

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


