lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Caused by: java.io.IOException: read past EOF on Slave
Date Tue, 30 Sep 2008 12:32:02 GMT

I'm glad to hear this!

But: what was the root cause of the problems?  Why didn't your  
previous Input/Output implementations, which worked with Lucene 2.3,  
work win 2.4?  It's kinda spooky.

Mike

Marcelo Ochoa wrote:

> Michael:
>  I have OJVMDirectory working with 2.4rc2 code base.
>  I have refactored Output and Input streams classes according to
> latest implementation of Buffered base classes and works OK.
>  All of my tests suites runs OK also the bug in the 11g JITs compiler
> is not reproducible.
>  I'll test this new implementation of Input/Output stream classes if
> they are compatible with 2.3 code base to release a new binary version
> of Oracle Lucene integration with 2.3 distribution and once 2.4
> release is in production I'll release another binary version too.
>  Best regards, Marcelo.
>
> On Mon, Sep 29, 2008 at 6:58 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Marcelo,
>>
>> Do you have any sense whether this is an issue with your  
>> integration (eg
>> your Directory implementation that stores data in BLOB columns) vs  
>> something
>> with Lucene 2.4?
>>
>> It seems odd to me that there would be a bug in your Directory
>> implementation that 2.3 didn't tickle but 2.4 did.
>>
>> Can you dump the index, as stored in BLOB columns, out into the  
>> filesystem,
>> and run CheckIndex on it?  (Or maybe run CheckIndex within Oracle).
>>
>> Mike
>>
>> Marcelo Ochoa wrote:
>>
>>> Mike:
>>> Actually there is more issues at first glance with OJVMDirectory
>>> integration.
>>> Note this, I am creating an index with two simple documents:
>>> INFO: Performing: SELECT /*+ DYNAMIC_SAMPLING(0) RULE NOCACHE(T1) */
>>> T1.rowid,F1,extractValue(F2,'/emp/name/text()')
>>> "name",extractValue(F2,'/emp/@id') "id" FROM LUCENE.T1 for update  
>>> nowait
>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
>>> FINE: Document<stored/uncompressed,indexed<rowid:AAARLCAAEAAAm2QAAA>
>>> indexed,tokenized<F1:001> indexed,tokenized<name:ravi>
>>> indexed,tokenized<id:01>>
>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.TableIndexer index
>>> FINE: Document<stored/uncompressed,indexed<rowid:AAARLCAAEAAAm2QAAB>
>>> indexed,tokenized<F1:003> indexed,tokenized<name:murthy>
>>> indexed,tokenized<id:03>>
>>> IW 10 [Root Thread]:   flush: segment=_0 docStoreSegment=_0
>>> docStoreOffset=0 flushDocs=true flushDeletes=true  
>>> flushDocStores=false
>>> numDocs=2
>>> numBufDelTerms=0
>>> IW 10 [Root Thread]:   index before flush
>>> IW 10 [Root Thread]: DW: flush postings as segment _0 numDocs=2
>>> IW 10 [Root Thread]: DW:   oldRAMSize=111616 newFlushedSize=166
>>> docs/MB=12,633.446 new/old=0.149%
>>> IFD [Root Thread]: now checkpoint "segments_1" [1 segments ;  
>>> isCommit =
>>> false]
>>> IW 10 [Root Thread]: LMP: findMerges: 1 segments
>>> IW 10 [Root Thread]: LMP:   level -1.0 to 2.2741578: 1 segments
>>> IW 10 [Root Thread]: CMS: now merge
>>> IW 10 [Root Thread]: CMS:   index: _0:C2->_0
>>> IW 10 [Root Thread]: CMS:   no more merges pending; now return
>>> IW 10 [Root Thread]: now flush at close
>>> IW 10 [Root Thread]:   flush: segment=null docStoreSegment=_0
>>> docStoreOffset=2 flushDocs=false flushDeletes=true  
>>> flushDocStores=true
>>> numDocs=0 numBufDelTerms=0
>>> IW 10 [Root Thread]:   index before flush _0:C2->_0
>>> IW 10 [Root Thread]:   flush shared docStore segment _0
>>> IW 10 [Root Thread]: DW: closeDocStore: 2 files to flush to  
>>> segment _0
>>> numDocs=2
>>> IW 10 [Root Thread]: CMS: now merge
>>> IW 10 [Root Thread]: CMS:   index: _0:C2->_0
>>> IW 10 [Root Thread]: CMS:   no more merges pending; now return
>>> IW 10 [Root Thread]: now call final commit()
>>> IW 10 [Root Thread]: startCommit(): start sizeInBytes=0
>>> IW 10 [Root Thread]: startCommit index=_0:C2->_0 changeCount=2
>>> IW 10 [Root Thread]: now sync _0.fnm
>>> IW 10 [Root Thread]: now sync _0.frq
>>> IW 10 [Root Thread]: now sync _0.prx
>>> IW 10 [Root Thread]: now sync _0.tis
>>> IW 10 [Root Thread]: now sync _0.tii
>>> IW 10 [Root Thread]: now sync _0.nrm
>>> IW 10 [Root Thread]: now sync _0.fdx
>>> IW 10 [Root Thread]: now sync _0.fdt
>>> IW 10 [Root Thread]: done all syncs
>>> IW 10 [Root Thread]: commit: pendingCommit != null
>>> IFD [Root Thread]: now checkpoint "segments_2" [1 segments ;  
>>> isCommit =
>>> true]
>>> IFD [Root Thread]: deleteCommits: now decRef commit "segments_1"
>>> IFD [Root Thread]: delete "segments_1"
>>> IW 10 [Root Thread]: commit: done
>>> IW 10 [Root Thread]: at close: _0:C2->_0
>>> Sep 26, 2008 3:44:16 PM org.apache.lucene.indexer.LuceneDomainIndex
>>> ODCIIndexCreate
>>> FINER: RETURN 0
>>>
>>> Index created.
>>>
>>> And when I am trying to read the index I got:
>>> INFO: Analyzer:  
>>> org.apache.lucene.analysis.WhitespaceAnalyzer@f2164127
>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>> ODCIStart
>>> INFO: qryStr: DESC(name:ravi)
>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>> ODCIStart
>>> INFO: storing cachingFilter: -1378376940 and searcher: 781713581
>>> qryStr: DESC(name:ravi)
>>> Sep 26, 2008 3:44:48 PM org.apache.lucene.indexer.LuceneDomainIndex
>>> getSort
>>> INFO: using sort: <score>,<doc>
>>> Exception in thread "Root Thread"  
>>> java.lang.IndexOutOfBoundsException:
>>> Index: 6, Size: 4
>>>      at java.util.ArrayList.RangeCheck(ArrayList.java)
>>>      at java.util.ArrayList.get(ArrayList.java)
>>>      at  
>>> org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java)
>>>      at  
>>> org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java)
>>>      at org.apache.lucene.index.TermBuffer.read(TermBuffer.java)
>>>      at
>>> org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java)
>>>      at
>>> org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java)
>>>      at  
>>> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>      at  
>>> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>      at  
>>> org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java)
>>>      at
>>> org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java)
>>>      at org.apache.lucene.search.Similarity.idf(Similarity.java)
>>>      at
>>> org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java)
>>>      at  
>>> org.apache.lucene.search.TermQuery.createWeight(TermQuery.java)
>>>      at org.apache.lucene.search.Query.weight(Query.java)
>>>      at org.apache.lucene.search.Hits.<init>(Hits.java:85)
>>>      at org.apache.lucene.search.Searcher.search(Searcher.java)
>>>      at
>>> org 
>>> .apache 
>>> .lucene.indexer.LuceneDomainIndex.ODCIStart(LuceneDomainIndex.java)
>>>
>>> Which definetly means that something is not well saved at OJVM
>>> directory BLOB storage :(
>>> This are my files:
>>> SQL> select file_size,name from it1$t;
>>>
>>> FILE_SIZE NAME
>>> ---------- ------------------------------
>>>      10 parameters
>>>       1 updateCount
>>>      28 segments_1
>>>      20 segments.gen
>>>       8 _0.frq
>>>       8 _0.prx
>>>     103 _0.tis
>>>      35 _0.tii
>>>      12 _0.nrm
>>>      22 _0.fnm
>>>      48 _0.fdt
>>>      20 _0.fdx
>>>      62 segments_2
>>> I'll add some debugging information at my classes which save/load
>>> buffers to see how many calls and which arguments are used.
>>> Marcelo.
>>>
>>> On Fri, Sep 26, 2008 at 1:41 PM, Michael McCandless
>>> <lucene@mikemccandless.com> wrote:
>>>>
>>>> This one looks spooky!
>>>>
>>>> Is it easily repeated?  If you could print out which 2 terms you  
>>>> had
>>>> tried
>>>> to delete, and then zip up the index just before deleting those  
>>>> docs
>>>> (after
>>>> closing the writer) and send to me, I can try to understand  
>>>> what's wrong
>>>> with the index.  It looks as if the *.tis file for one of the  
>>>> segments is
>>>> truncated.
>>>>
>>>> If you capture the series of add/update/delete documents, can you  
>>>> get a
>>>> standalone Java test to show this?
>>>>
>>>> Does this test create an entirely new index?
>>>>
>>>> We did change the index format in 2.4 to use "true" UTF8 encoding  
>>>> for all
>>>> text content; not sure that this applies here (to  
>>>> BufferedIndexReader
>>>> it's
>>>> all bytes) but it may.
>>>>
>>>> BufferedIndexReader in general can do random IO, especially when  
>>>> reading
>>>> the
>>>> term dict file (*.tis), when you
>>>>
>>>> Mike
>>>>
>>>> Marcelo Ochoa wrote:
>>>>
>>>>> Michael:
>>>>> I just start testing 2.4rc2 running inside OJVM.
>>>>> I found a similar stack trace during indexing:
>>>>> IW 3 [Root Thread]:   flush: segment=_3 docStoreSegment=_3
>>>>> docStoreOffset=0 flushDocs=true flushDeletes=true  
>>>>> flushDocStores=false
>>>>> numDocs=2 numBufDelTerms=2
>>>>> IW 3 [Root Thread]:   index before flush _1:C2->_1 _2:C2->_2
>>>>> IW 3 [Root Thread]: DW: flush postings as segment _3 numDocs=2
>>>>> IW 3 [Root Thread]: DW:   oldRAMSize=111616 newFlushedSize=264
>>>>> docs/MB=7,943.758 new/old=0.237%
>>>>> IW 3 [Root Thread]: DW: apply 2 buffered deleted terms and 0  
>>>>> deleted
>>>>> docIDs and 0 deleted queries on 3 segments.
>>>>> IW 3 [Root Thread]: hit exception flushing deletes
>>>>> Exception in thread "Root Thread" java.io.IOException: read past  
>>>>> EOF
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java)
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene 
>>>>> .store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene 
>>>>> .store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
>>>>>    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java)
>>>>>    at
>>>>> org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java)
>>>>>    at
>>>>> org 
>>>>> .apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java)
>>>>>    at  
>>>>> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>>    at  
>>>>> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java)
>>>>>    at
>>>>> org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java)
>>>>>    at  
>>>>> org.apache.lucene.index.IndexReader.termDocs(IndexReader.java)
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java)
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java: 
>>>>> 918)
>>>>>    at
>>>>> org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java)
>>>>>    at  
>>>>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java)
>>>>>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java)
>>>>>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java)
>>>>>    at
>>>>>
>>>>> org 
>>>>> .apache 
>>>>> .lucene.indexer.LuceneDomainIndex.sync(LuceneDomainIndex.java: 
>>>>> 1308)
>>>>>
>>>>> I'll reinstall with a full debug info to see all line numbers in
>>>>> Lucene java code.
>>>>> Is there a list of semantic changes at BufferedIndeInput code?
>>>>> I mean it do sequential or random writes for example.
>>>>> But anyway, I just compiled with latest code and ran my test  
>>>>> suites,
>>>>> I'll investigate the problem a bit more.
>>>>> Best regards, Marcelo.
>>>>>
>>>>> On Fri, Sep 26, 2008 at 7:32 AM, Michael McCandless
>>>>> <lucene@mikemccandless.com> wrote:
>>>>>>
>>>>>> Can you describe the sequence of steps that your replication  
>>>>>> process
>>>>>> goes
>>>>>> through?
>>>>>>
>>>>>> Also, which filesystem is the index being accessed through?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> rahul_k123 wrote:
>>>>>>
>>>>>>>
>>>>>>> First of all, thanks to all the people who helped me in  
>>>>>>> getting the
>>>>>>> lucene
>>>>>>> replication setup working and right now its live in our  
>>>>>>> production :-)
>>>>>>>
>>>>>>> Everything working fine, except that i am seeing some  
>>>>>>> exceptions on
>>>>>>> slaves.
>>>>>>>
>>>>>>> The following is the one which is occuring more often on slaves
>>>>>>>
>>>>>>> at
>>>>>>>
>>>>>>> java.util.concurrent.Executors 
>>>>>>> $RunnableAdapter.call(Executors.java:441)
>>>>>>>  at
>>>>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:

>>>>>>> 303)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> java.util.concurrent.ThreadPoolExecutor 
>>>>>>> $Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> java.util.concurrent.ThreadPoolExecutor 
>>>>>>> $Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>  at java.lang.Thread.run(Thread.java:619)
>>>>>>> Caused by: com.IndexingException: [SYSTEM_ERROR] Cannot access
 
>>>>>>> index
>>>>>>> [data_dir/index]: [read past EOF]
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> com 
>>>>>>> .lucene 
>>>>>>> .LuceneSearchService.getSearchResults(LuceneSearchService.java:

>>>>>>> 964)
>>>>>>>  ... 12 more
>>>>>>> Caused by: java.io.IOException: read past EOF
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>>>>>>>  at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:

>>>>>>> 66)
>>>>>>>  at  
>>>>>>> org.apache.lucene.store.IndexInput.readLong(IndexInput.java:89)
>>>>>>>  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:

>>>>>>> 147)
>>>>>>>  at
>>>>>>> org 
>>>>>>> .apache.lucene.index.SegmentReader.document(SegmentReader.java:

>>>>>>> 659)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>  at
>>>>>>> org.apache.lucene.index.IndexReader.document(IndexReader.java:

>>>>>>> 525)
>>>>>>>
>>>>>>> and the second one is
>>>>>>>
>>>>>>> at
>>>>>>>
>>>>>>> java.util.concurrent.Executors 
>>>>>>> $RunnableAdapter.call(Executors.java:441)
>>>>>>>  at
>>>>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:

>>>>>>> 303)
>>>>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> java.util.concurrent.ThreadPoolExecutor 
>>>>>>> $Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> java.util.concurrent.ThreadPoolExecutor 
>>>>>>> $Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>  at java.lang.Thread.run(Thread.java:619)
>>>>>>> Caused by: java.lang.IllegalArgumentException: attempt to  
>>>>>>> access a
>>>>>>> deleted
>>>>>>> document
>>>>>>>  at
>>>>>>> org 
>>>>>>> .apache.lucene.index.SegmentReader.document(SegmentReader.java:

>>>>>>> 657)
>>>>>>>  at
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> org 
>>>>>>> .apache 
>>>>>>> .lucene 
>>>>>>> .index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>  at
>>>>>>> org.apache.lucene.index.IndexReader.document(IndexReader.java:

>>>>>>> 525)
>>>>>>> This is on master index .
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Any help is appreciated
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>>
>>>>>>>
>>>>>>> http://www.nabble.com/Caused-by%3A-java.io.IOException%3A-read-past-EOF-on-Slave-tp19682684p19682684.html
>>>>>>> Sent from the Lucene - Java Users mailing list archive at  
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>> help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo F. Ochoa
>>>>> http://marceloochoa.blogspot.com/
>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>> ______________
>>>>> Do you Know DBPrism? Look @ DB Prism's Web Site
>>>>> http://www.dbprism.com.ar/index.html
>>>>> More info?
>>>>> Chapter 17 of the book "Programming the Oracle Database using  
>>>>> Java &
>>>>> Web Services"
>>>>> http://www.amazon.com/gp/product/1555583296/
>>>>> Chapter 21 of the book "Professional XML Databases" - Wrox Press
>>>>> http://www.amazon.com/gp/product/1861003587/
>>>>> Chapter 8 of the book "Oracle & Open Source" - O'Reilly
>>>>> http://www.oreilly.com/catalog/oracleopen/
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Marcelo F. Ochoa
>>> http://marceloochoa.blogspot.com/
>>> http://marcelo.ochoa.googlepages.com/home
>>> ______________
>>> Want to integrate Lucene and Oracle?
>>>
>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>> Is Oracle 11g REST ready?
>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
>
> -- 
> Marcelo F. Ochoa
> http://marceloochoa.blogspot.com/
> http://marcelo.ochoa.googlepages.com/home
> ______________
> Want to integrate Lucene and Oracle?
> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
> Is Oracle 11g REST ready?
> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message