lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-550) InstantiatedIndex - faster but memory consuming index
Date Fri, 17 Aug 2007 20:24:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520659 ]

Karl Wettin commented on LUCENE-550:
------------------------------------

I just found a bug that I cannot explain.

While scoring one specific phrase query against one specific corpus of mine, the scorer
calls TermPositions.nextPosition() more than TermPositions.freq() times. I have never seen this
error before, and it does not happen when running against a Directory. TestIndicesEquals does,
however, pass, so it must be me failing to reset the currentTermPosition counter, or something
along those lines.
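
To illustrate the suspicion above, here is a minimal Java sketch of the contract: after each next() or skipTo(), a caller may read at most freq() positions, so the per-document position cursor has to start over whenever the enumerator moves to a new document. The class SketchTermPositions, its fields, and the array layout are hypothetical and for illustration only; this is not the InstantiatedTermPositions code from the patch, and whether a missed reset is the actual cause is still unconfirmed.

```java
/** Hypothetical in-memory postings for a single term (illustration only). */
final class SketchTermPositions {
    private final int[] docs;         // document numbers, ascending
    private final int[][] positions;  // positions[i] holds the term positions in docs[i]
    private int docIndex = -1;        // index of the current document, -1 before the first next()
    private int positionIndex = 0;    // next position to hand out within the current document

    SketchTermPositions(int[] docs, int[][] positions) {
        this.docs = docs;
        this.positions = positions;
    }

    /** Moves to the next document and restarts the position cursor. */
    boolean next() {
        if (docIndex + 1 >= docs.length) {
            docIndex = docs.length; // exhausted
            return false;
        }
        docIndex++;
        positionIndex = 0; // the reset in question: without it, reads continue from the old cursor
        return true;
    }

    /** Moves to the first following document whose number is >= target; goes through next(), so it resets too. */
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (docs[docIndex] < target);
        return true;
    }

    int doc() {
        return docs[docIndex];
    }

    /** Number of positions available for the current document. */
    int freq() {
        return positions[docIndex].length;
    }

    /** Returns the next position; valid at most freq() times per document. */
    int nextPosition() {
        return positions[docIndex][positionIndex++];
    }
}
```

If the marked reset is removed, reading positions for a second document continues from the previous document's cursor and can run past the end of the array, which would look like the ArrayIndexOutOfBoundsException reported in this comment.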

I have been debugging the scorer code for hours and hours in order to understand what the
difference between II (InstantiatedIndex) and Directory is, but I can't figure it out. I am
completely lost in this (read: any) scorer code.

It sure is a show stopper if it sometimes does not work, so I'll try to find the bug. This
is the first time I've seen it though. I mean, I do use phrase queries in other places in
conjunction with this store, and that makes it even more strange.

I have tried to come up with an isolated test case, but I can't. I can however pass the corpus
and code that produces this error to some specific person, but I'm afraid I can't post it here.


There is also a minor TermFreqVector bug that throws an NPE, fixed in the next patch.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 12
	at org.apache.lucene.store.instantiated.InstantiatedTermPositions.nextPosition(InstantiatedTermPositions.java:70)
	at org.apache.lucene.search.PhrasePositions.nextPosition(PhrasePositions.java:76)
	at org.apache.lucene.search.PhrasePositions.firstPosition(PhrasePositions.java:65)
	at org.apache.lucene.search.ExactPhraseScorer.phraseFreq(ExactPhraseScorer.java:34)
	at org.apache.lucene.search.PhraseScorer.doNext(PhraseScorer.java:94)
	at org.apache.lucene.search.PhraseScorer.next(PhraseScorer.java:81)
	at org.apache.lucene.search.DisjunctionSumScorer.initScorerDocQueue(DisjunctionSumScorer.java:105)
	at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:144)
	at org.apache.lucene.search.BooleanScorer2.next(BooleanScorer2.java:360)
	at org.apache.lucene.search.DisjunctionSumScorer.initScorerDocQueue(DisjunctionSumScorer.java:105)
	at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:144)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
	at org.apache.lucene.search.Searcher.search(Searcher.java:118)
	at org.apache.lucene.search.Searcher.search(Searcher.java:97)
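
One way to narrow down which side breaks the contract might be a checking decorator like the sketch below. It counts nextPosition() calls per document and fails with the document number and the reported freq() as soon as a caller asks for one position too many. PositionSource and CheckedPositionSource are hypothetical names, and the interface is a cut-down stand-in rather than Lucene's TermPositions, which has more methods.

```java
/** Hypothetical, cut-down view of a positions enumerator; not the Lucene interface. */
interface PositionSource {
    boolean next();
    boolean skipTo(int target);
    int doc();
    int freq();
    int nextPosition();
}

/** Decorator that fails fast, with context, when more than freq() positions are requested. */
final class CheckedPositionSource implements PositionSource {
    private final PositionSource delegate;
    private int positionsRead; // nextPosition() calls made for the current document

    CheckedPositionSource(PositionSource delegate) {
        this.delegate = delegate;
    }

    public boolean next() {
        positionsRead = 0; // new document, new budget of freq() reads
        return delegate.next();
    }

    public boolean skipTo(int target) {
        positionsRead = 0; // skipping also lands on a new document
        return delegate.skipTo(target);
    }

    public int doc() {
        return delegate.doc();
    }

    public int freq() {
        return delegate.freq();
    }

    public int nextPosition() {
        int freq = delegate.freq();
        if (positionsRead >= freq) {
            throw new IllegalStateException("nextPosition() call #" + (positionsRead + 1)
                    + " for doc " + delegate.doc() + " but freq() is " + freq);
        }
        positionsRead++;
        return delegate.nextPosition();
    }
}
```

If this guard fires, the caller really did over-read relative to freq(); if the ArrayIndexOutOfBoundsException still comes from the wrapped object without the guard firing, the wrapped object's own cursor is the more likely suspect.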

> InstantiatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>            Reporter: Karl Wettin
>            Assignee: Grant Ingersoll
>         Attachments: HitCollectionBench.jpg, lucene-550.jpg, LUCENE-550_20070804_no_core_changes.txt,
LUCENE-550_20070808_no_core_changes.txt, test-reports.zip, trunk.diff.bz2, trunk.diff.bz2,
trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2,
trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2
>
>
> A non-file-centric, all-in-memory index. Consumes some 2x the memory of a RAMDirectory
(in a term-saturated index) but is between 3x and 60x faster depending on application and how one
counts. The average query is about 8x faster. IndexWriter and IndexModifier have been realized
in InterfaceIndexWriter and InterfaceIndexModifier. 
> InstantiatedIndex is wrapped in a new top-layer index facade (class Index) that comes
with factory methods for writers, readers and searchers for uniform index handling. There
are decorators with notification handling that can be used for automatically synchronizing
searchers on updates, etc. 
> Index also comes with FS/RAMDirectory implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



