Date: Fri, 18 Mar 2011 16:18:29 +0000 (UTC)
From: "Uwe Schindler (JIRA)"
To: dev@lucene.apache.org
Message-ID: <146779437.11951.1300465109503.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1772225733.11944.1300464749605.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Commented: (LUCENE-2975) MMapDirectory on chunk size boundaries broken

[
https://issues.apache.org/jira/browse/LUCENE-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008492#comment-13008492 ]

Uwe Schindler commented on LUCENE-2975:
---------------------------------------

Here is the output from CheckIndex (wrapped by log4j):

{noformat}
2011-03-18 15:45:02,633 INFO de.pangaea.metadataportal.harvester.Checker - Checking index "pangaea"...
2011-03-18 15:45:02,953 INFO org.apache.lucene.index.CheckIndex - Segments file=segments_g8o numSegments=8 version=FORMAT_3_1 [Lucene 3.1]
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - 1 of 8: name=_xdx docCount=644683
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - compound=false
2011-03-18 15:45:02,955 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:45:02,956 INFO org.apache.lucene.index.CheckIndex - numFiles=12
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - size (MB)=5,150.848
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - diagnostics = {optimize=true, mergeFactor=3, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:45:02,957 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xdx_1f.del]
2011-03-18 15:45:04,320 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [2786 deleted docs]
2011-03-18 15:45:05,203 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17616 fields]
2011-03-18 15:45:06,608 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:18,216 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [54488825 terms; 524900381 terms/docs pairs; 628086112 tokens]
2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......ERROR [field data are in wrong format: java.util.zip.DataFormatException: unknown compression method]
2011-03-18 15:50:19,315 INFO org.apache.lucene.index.CheckIndex - org.apache.lucene.index.CorruptIndexException: field data are in wrong format: java.util.zip.DataFormatException: unknown compression method
2011-03-18 15:50:19,316 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:605)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:377)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:259)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:934)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.IndexReader.document(IndexReader.java:844)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:702)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:517)
2011-03-18 15:50:19,317 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - Caused by: java.util.zip.DataFormatException: unknown compression method
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflateBytes(Native Method)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflate(Inflater.java:238)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at java.util.zip.Inflater.inflate(Inflater.java:256)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.document.CompressionTools.decompress(CompressionTools.java:106)
2011-03-18 15:50:19,318 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:602)
2011-03-18 15:50:19,319 INFO org.apache.lucene.index.CheckIndex - ... 8 more
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [492976 total vector count; avg 0.768 term/freq vector fields per doc]
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - FAILED
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - WARNING: fixIndex() would remove reference to this segment; full exception:
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - java.lang.RuntimeException: Stored Field test failed
2011-03-18 15:50:41,360 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:529)
2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:298)
2011-03-18 15:50:41,361 INFO org.apache.lucene.index.CheckIndex - at de.pangaea.metadataportal.harvester.Checker.main(Checker.java:72)
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - 2 of 8: name=_xgc docCount=2467
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:42,759 INFO org.apache.lucene.index.CheckIndex - numFiles=2
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - size (MB)=12.495
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - diagnostics = {optimize=false, mergeFactor=10, os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=merge, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:42,760 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xgc_2.del]
2011-03-18 15:50:42,830 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [7 deleted docs]
2011-03-18 15:50:42,838 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:42,847 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:46,513 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [260092 terms; 1081044 terms/docs pairs; 1378854 tokens]
2011-03-18 15:50:46,656 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [73761 total field count; avg 29.984 fields per doc]
2011-03-18 15:50:46,790 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [660 total vector count; avg 0.268 term/freq vector fields per doc]
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - 3 of 8: name=_xgb docCount=128
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - numFiles=2
2011-03-18 15:50:46,805 INFO org.apache.lucene.index.CheckIndex - size (MB)=3.425
2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:46,806 INFO org.apache.lucene.index.CheckIndex - has deletions [delFileName=_xgb_1.del]
2011-03-18 15:50:46,843 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK [93 deleted docs]
2011-03-18 15:50:46,850 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:46,854 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:47,063 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [36032 terms; 396116 terms/docs pairs; 115684 tokens]
2011-03-18 15:50:47,069 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [4892 total field count; avg 139.771 fields per doc]
2011-03-18 15:50:47,073 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [35 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - 4 of 8: name=_xgd docCount=269
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - numFiles=1
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - size (MB)=5.458
2011-03-18 15:50:47,076 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,077 INFO org.apache.lucene.index.CheckIndex - no deletions
2011-03-18 15:50:47,112 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK
2011-03-18 15:50:47,119 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:47,123 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:47,705 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [43919 terms; 652617 terms/docs pairs; 816813 tokens]
2011-03-18 15:50:47,725 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [37116 total field count; avg 137.978 fields per doc]
2011-03-18 15:50:47,754 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [269 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - 5 of 8: name=_xge docCount=13
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:47,757 INFO org.apache.lucene.index.CheckIndex - numFiles=1
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.923
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,758 INFO org.apache.lucene.index.CheckIndex - no deletions
2011-03-18 15:50:47,788 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK
2011-03-18 15:50:47,794 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:47,798 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:47,920 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [26350 terms; 45640 terms/docs pairs; 103403 tokens]
2011-03-18 15:50:47,922 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [2278 total field count; avg 175.231 fields per doc]
2011-03-18 15:50:47,925 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex - 6 of 8: name=_xgf docCount=13
2011-03-18 15:50:47,926 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - numFiles=1
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.387
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:47,927 INFO org.apache.lucene.index.CheckIndex - no deletions
2011-03-18 15:50:47,962 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK
2011-03-18 15:50:47,968 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:47,972 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:48,005 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [6928 terms; 12186 terms/docs pairs; 14593 tokens]
2011-03-18 15:50:48,006 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [736 total field count; avg 56.615 fields per doc]
2011-03-18 15:50:48,007 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [13 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - 7 of 8: name=_xgg docCount=4
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - numFiles=1
2011-03-18 15:50:48,008 INFO org.apache.lucene.index.CheckIndex - size (MB)=0.319
2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:48,009 INFO org.apache.lucene.index.CheckIndex - no deletions
2011-03-18 15:50:48,044 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK
2011-03-18 15:50:48,051 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:48,054 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:48,086 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [5194 terms; 7559 terms/docs pairs; 8971 tokens]
2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [436 total field count; avg 109 fields per doc]
2011-03-18 15:50:48,093 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [4 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - 8 of 8: name=_xgh docCount=29
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - compound=true
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - hasProx=true
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - numFiles=1
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - size (MB)=1.082
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - diagnostics = {os.version=5.10, os=SunOS, lucene.version=3.1.0 1081525 - 2011-03-14 15:32:45, source=flush, os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
2011-03-18 15:50:48,095 INFO org.apache.lucene.index.CheckIndex - no deletions
2011-03-18 15:50:48,130 INFO org.apache.lucene.index.CheckIndex - test: open reader.........OK
2011-03-18 15:50:48,137 INFO org.apache.lucene.index.CheckIndex - test: fields..............OK [17734 fields]
2011-03-18 15:50:48,141 INFO org.apache.lucene.index.CheckIndex - test: field norms.........OK [235 fields]
2011-03-18 15:50:48,268 INFO org.apache.lucene.index.CheckIndex - test: terms, freq, prox...OK [25231 terms; 64396 terms/docs pairs; 104657 tokens]
2011-03-18 15:50:48,271 INFO org.apache.lucene.index.CheckIndex - test: stored fields.......OK [4172 total field count; avg 143.862 fields per doc]
2011-03-18 15:50:48,274 INFO org.apache.lucene.index.CheckIndex - test: term vectors........OK [29 total vector count; avg 1 term/freq vector fields per doc]
2011-03-18 15:50:48,275 INFO org.apache.lucene.index.CheckIndex - WARNING: 1 broken segments (containing 641897 documents) detected
2011-03-18 15:50:48,275 WARN de.pangaea.metadataportal.harvester.Checker - Finished checking of index "pangaea": Index is corrupt.
{noformat}

> MMapDirectory on chunk size boundaries broken
> ---------------------------------------------
>
>                 Key: LUCENE-2975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2975
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Priority: Blocker
>
> When testing the 3.1-RC1 built by Yonik on the PANGAEA (www.pangaea.de) production system, I found that on a large segment (about 5 GiB) some stored fields suddenly produce a strange deflate decompression error (CompressionTools), although the stored fields are no longer pre-3.0 compressed. It seems that the header of the stored field is read incorrectly at the buffer boundary in MultiMMapDir, so FieldsReader incorrectly detects a deflate-compressed field (CompressionTools).
> The error is reproducible with CheckIndex using MMapDirectory, but not with NIODir or SimpleDir. The FDT file of that segment is 2.6 GiB; on Solaris the chunk size is Integer.MAX_VALUE, so the file is mapped as 2 chunks by a MultiMMap IndexInput.
> Robert and I have the index available as a tar file; we will run tests on our local machines and hopefully find the bug, which may have been introduced by Robert's recent changes to MMap.
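To make the suspected failure mode concrete (a stored-field header read incorrectly where the .fdt file crosses from one mapped chunk into the next), here is a small, self-contained sketch of the position arithmetic a multi-chunk reader needs. It is not Lucene's MultiMMapIndexInput: the class ChunkedInput and its methods are hypothetical, and the chunks wrap an in-memory byte array instead of memory-mapped regions. It shows splitting an absolute position into (chunk index = pos / chunkSize, offset = pos % chunkSize) and switching to the next chunk when the current one is exhausted, which is the step where a boundary bug would surface.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene's MMapDirectory code: one logical file exposed
 * as several fixed-size ByteBuffer chunks, similar in spirit to a multi-chunk
 * mmap IndexInput for files larger than the maximum chunk size. Here the chunks
 * wrap an in-memory array instead of memory-mapped regions.
 */
final class ChunkedInput {
    private final ByteBuffer[] chunks;
    private final int chunkSize;
    private final long length;
    private int curIndex;      // index of the chunk currently being read
    private ByteBuffer curBuf; // always chunks[curIndex]

    /** Number of chunks needed for a file of the given length (at least one). */
    static int numChunks(long length, int chunkSize) {
        return (int) Math.max(1L, (length + chunkSize - 1) / chunkSize);
    }

    ChunkedInput(byte[] data, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        this.chunkSize = chunkSize;
        this.length = data.length;
        this.chunks = new ByteBuffer[numChunks(length, chunkSize)];
        for (int i = 0; i < chunks.length; i++) {
            long start = (long) i * chunkSize;
            int len = (int) Math.min(chunkSize, length - start);
            // slice() gives each chunk its own positions 0..len, like a separate mapping
            chunks[i] = ByteBuffer.wrap(data, (int) start, len).slice();
        }
        seek(0);
    }

    /** Splits an absolute position into (chunk index, offset inside that chunk). */
    void seek(long pos) {
        if (pos < 0 || pos > length) throw new IllegalArgumentException("position out of range: " + pos);
        int idx = (int) (pos / chunkSize);
        int off = (int) (pos % chunkSize);
        if (idx == chunks.length) { // pos == length and length is a multiple of chunkSize
            idx = chunks.length - 1;
            off = chunks[idx].limit();
        }
        curIndex = idx;
        curBuf = chunks[idx];
        curBuf.position(off);
    }

    long getFilePointer() {
        return (long) curIndex * chunkSize + curBuf.position();
    }

    /** Moves to the start of the next chunk; this is the step that must not be skipped at a boundary. */
    private void nextChunk() {
        if (curIndex + 1 >= chunks.length) throw new IllegalStateException("read past EOF");
        curIndex++;
        curBuf = chunks[curIndex];
        curBuf.position(0);
    }

    byte readByte() {
        while (!curBuf.hasRemaining()) nextChunk();
        return curBuf.get();
    }

    /** Copies len bytes, splitting the copy wherever it crosses a chunk boundary. */
    void readBytes(byte[] dst, int off, int len) {
        while (len > 0) {
            if (!curBuf.hasRemaining()) nextChunk();
            int n = Math.min(len, curBuf.remaining());
            curBuf.get(dst, off, n);
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ChunkedInput in = new ChunkedInput(data, 8); // chunks of 8, 8 and 4 bytes
        in.seek(6);
        byte[] buf = new byte[5];
        in.readBytes(buf, 0, 5);                     // bytes 6..10 span the first boundary
        System.out.println(Arrays.toString(buf));    // [6, 7, 8, 9, 10]
        System.out.println(in.getFilePointer());     // 11
        // ~2.6 GiB file with a chunk size of Integer.MAX_VALUE -> 2 chunks
        System.out.println(numChunks((long) (2.6 * (1L << 30)), Integer.MAX_VALUE)); // 2
    }
}
```

With a chunk size of Integer.MAX_VALUE, a file of roughly 2.6 GiB needs two chunks, so only reads that straddle offset 2^31 - 1 take the boundary path. That is why such a bug can stay hidden until a segment gets this large; a tiny chunk size, as in the main method, reaches the same code path with far less data.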