lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Re: Re: A likely bug of TermsPosition.nextPosition
Date Thu, 07 Apr 2011 14:47:55 GMT
I committed another change to 3.x's CheckIndex (to also invoke
.nextPosition); can you run that and see if it can detect this?

Is there any way I can get this index?

One question: how large are your payloads, typically?

Does the exception still occur on the index fully rebuilt with 3.1?

Mike

http://blog.mikemccandless.com

2011/4/6 袁武 [GMail] <yuanwu.mail@gmail.com>:
> Dear Mike:
>
> I have run the CheckIndex of branch_3x, and the result report is listed
> below:
>
> [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/
 -fix
> NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...',
so assertions are enabled
> Opening index @ /data/Index/URL/Generic/
> Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
>   1 of 1: name=_bym0 docCount=1344278
>     compound=false
>     hasProx=true
>     numFiles=11
>     size (MB)=15,979.593
>     diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.    
                                                        fc10.x86_64, os=Linux, mergeDocStores=true,
lucene.version=3.0-dev, source=merge                                                     
       , os.arch=amd64, java.version=1.6.0_21, java.vendor=Sun Microsystems Inc.}
>     no deletions
>     test: open reader.........OK
>     test: fields..............OK [11 fields]
>     test: field norms.........OK [2 fields]
>     test: terms, freq, prox...OK [1263500 terms; 601715072 terms/docs pairs; 1513780631
tokens]
>     test: stored fields.......OK [8063151 total field count; avg 5.998 fields per doc]
>     test: term vectors........OK [1344278 total vector count; avg 1 term/freq vector
fields per doc]
> No problems were detected with this index.
>
> 2011-04-06
> ________________________________
> 袁武 [GMail]
> ________________________________
> 发件人: Michael McCandless
> 发送时间: 2011-04-02  21:11:05
> 收件人: 袁武 [GMail]
> 抄送: java-user
> 主题: Re: Re: Re: A likely bug of TermsPosition.nextPosition
> So your test case still hits the exception in 3.1?
> If you fully rebuild you index in 3.1, does the exception still occur?
> Is there any way I could get access to this index?
> Do other terms besides "\1" have the problem?
> I just committed a change to the 3.x branch
> (https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x) --
> are you able to checkout this branch, build it, and run its CheckIndex
> on your index?  I'm also attaching the patch... and I could also send
> you the patched Lucene core JAR if you want.
> I'd like to see if that change to CheckIndex detects the problem with
> your index.
> Mike
> http://blog.mikemccandless.com
> 2011/4/2 袁武 [GMail] <yuanwu.mail@gmail.com>:
>> Dear Mike:
>>
>> The index was constructed using Lucene 2.9. After the problem occured, I
>> switched to recently release  Lucene 3.1, leaving the index untouched.
>>
>> Thanks to your help
>>
>> 2011-04-02
>> ________________________________
>> 袁武 [GMail]
>> ________________________________
>> 发件人: Michael McCandless
>> 发送时间: 2011-04-02  00:13:35
>> 收件人: 袁武 [GMail]
>> 抄送: java-user
>> 主题: Re: Re: A likely bug of TermsPosition.nextPosition
>> Hmm so it's not index corruption.  Curious.
>> Which Lucene version are you using?  Looks like it's 2.9, but not
>> 2.9.4?  Can you try 2.9.4 and see if you still hit the problem?
>> Can you post a small test case showing the problem, on your index?
>> Mike
>> http://blog.mikemccandless.com
>> 2011/4/1 袁武 [GMail] <yuanwu.mail@gmail.com>:
>>> Hi, Dear Mike:
>>>
>>> belows list the report of checkIndex. OS is Fedora Linux.
>>>
>>> [oracle@server bin]$ java -classpath ./ org.apache.lucene.index.CheckIndex /data/Index/URL/Generic/
-fix
>>>
>>> NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...',
so assertions are enabled
>>> Opening index @ /GeoGrid/data/Index/URL/Generic/
>>> Segments file=segments_ayup numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene
2.9]
>>>   1 of 1: name=_bym0 docCount=1344278
>>>     compound=false
>>>     hasProx=true
>>>     numFiles=11
>>>     size (MB)=15,979.593
>>>     diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.27.41-170.2.117.fc10.x86_64,
os=Linux, mergeDocStores=true, lucene.version=3.0-dev, source=merge, os.arch=amd64, java.version=1.6.0_21,
java.vendor=Sun Microsystems Inc.}
>>>     no deletions
>>>     test: open reader.........OK
>>>     test: fields..............OK [11 fields]
>>>     test: field norms.........OK [2 fields]
>>>     test: terms, freq, prox...
>>> OK [1263500 terms; 601715072 terms/docs pairs; 1513780631 tokens]
>>>     test: stored fields.......OK [8063151 total field count; avg 5.998 fields
per doc]
>>>     test: term vectors........OK [1344278 total vector count; avg 1 term/freq
vector fields per doc]
>>> No problems were detected with this index.
>>>
>>>
>>>
>>> 2011-04-01
>>> ________________________________
>>> 袁武 [GMail]
>>> ________________________________
>>> 发件人: Michael McCandless
>>> 发送时间: 2011-04-01  17:58:08
>>> 收件人: java-user
>>> 抄送: 袁武 [GMail]
>>> 主题: Re: A likely bug of TermsPosition.nextPosition
>>> Hmm this could be from a corrupted index.
>>> What version of Lucene?  What OS/filesystem?
>>> Can you run CheckIndex and post the output?
>>> Mike
>>> http://blog.mikemccandless.com
>>> 2011/3/31 袁武 [GMail] <yuanwu.mail@gmail.com>:
>>>> Hi, dear experts:
>>>>
>>>> When IndexReader.termsPositions is used to access specific terms, the call
to TermsPosition.nextPosition success if TermsPosition.next is used. But if TermsPosition.skipTo
is used instead of TermsPosition.next, a java.lang.IllegalArgumentException will be thrown,
as bellows.
>>>>
>>>>  java.lang.IllegalArgumentException: Negative position
>>>>    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
>>>>    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
>>>>    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:213)
>>>>    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>>>>    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:92)
>>>>    at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:181)
>>>>    at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:75)
>>>>    at org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:130)
>>>>    at org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:168)
>>>>    at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:69)
>>>>
>>>> In my further study, I found that if docid execeed 1044278, the exception
occurs everytime, for the small ones,  the exception never occur. BTW, the total number of
documents is about 1344278, and are never deleted.
>>>>
>>>> Thanks.
>>>>
>>>> 2011-04-01
>>>>
>>>>
>>>>
>>>> Yuan Wu [GMail]
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message