lucene-java-user mailing list archives

From Jerven Bolleman <jerven.bolle...@isb-sib.ch>
Subject Unsupported operation in TermDocs.next() when migrating from 2.4 to 2.9
Date Tue, 29 Jun 2010 09:24:33 GMT
Hi All,

I finally have some time to upgrade our Lucene from the 2.4 series 
to the 2.9 series. Everything compiles fine, but at runtime I am now 
getting a new UnsupportedOperationException.


java.lang.UnsupportedOperationException
	at org.apache.lucene.index.AbstractAllTermDocs.seek(AbstractAllTermDocs.java:42)
	at org.apache.lucene.index.DirectoryReader$MultiTermDocs.termDocs(DirectoryReader.java:1186)
	at org.apache.lucene.index.DirectoryReader$MultiTermDocs.next(DirectoryReader.java:1118)
	at org.expasy.core.index.SubQueryFilter.fastForLargeResultSets(SubQueryFilter.java:129)

I have copied in the code that triggers this below, followed by an 
explanation of what it tries to achieve.

private void fastForLargeResultSets(String foreignField, BitSet bits,
		TermDocs docs, TermDocs foreignDocs, IndexReader foreignReader,
		BitSet queryResults)
	throws IOException
{
	int start = queryResults.nextSetBit(0);
	TermEnum foreignEnum = foreignReader.terms(new Term(foreignField, ""));
	while (foreignEnum.next())
	{
		Term term = foreignEnum.term();
		if (term == null || !term.field().equals(foreignField))
			break;
		if (!term.text().equals("not_null"))
		{
			foreignDocs.skipTo(start);
			foreignDocs.seek(term);
			// The next() call below is the source of the exception in my code.
			while (foreignDocs.next())
			{
				int doc = foreignDocs.doc();
				if (queryResults.get(doc))
				{
					foreignDocs.skipTo(doc);
					if (term != null && term.text() != null)
						buffer.add(term.text());
				}
				// Use a buffer to avoid jumping around on disk too much.
				if (buffer.size() >= BUFFERSIZE)
				{
					emptyBuffer(buffer, bits, docs);
				}
			}
		}
	}

	if (!buffer.isEmpty())
	{
		emptyBuffer(buffer, bits, docs);
	}
}

The purpose of this code is to fill a bitset that is used as a filter. 
The filter finds the documents in index a that have a linking key value 
pointing to them from index b.

While resource intensive, this code path was quite fast when you have 
multimillion documents in index b pointing to multimillion documents in 
index a.

In other words, it creates a "join" between two queries on different indexes.

For a live example, see
http://www.uniprot.org/uniprot/?query=citation%3A%28author%3Afink%29
This is a search for "fink" in the field "author" of the "citation" index.
For each document in the "citation" index that matches the term "fink" in 
the field "author", retrieve the terms that contain a uniquely 
identifying key value for documents in the "uniprot" index. Then generate 
a bitset to use in filtering the documents in the "uniprot" index (done in 
the emptyBuffer method).
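
To make the intended join easier to follow, here is a minimal sketch in 
plain Java with no Lucene classes at all. It is only a model of the idea, 
not our real code: the names JoinFilterSketch, join, flush, demo and 
BUFFER_SIZE are made up for illustration, the sorted map stands in for the 
term enumeration of the foreign index, and the keyToDocA map stands in for 
whatever lookup emptyBuffer performs against the target index.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java model of the two-index join; all names here are hypothetical.
final class JoinFilterSketch {
    // Hypothetical flush threshold, playing the role of BUFFERSIZE.
    static final int BUFFER_SIZE = 2;

    // foreignPostings: key term -> ids of documents in index b holding that key.
    // queryResults:    documents in index b that matched the query.
    // keyToDocA:       key term -> id of the document in index a it identifies.
    static BitSet join(SortedMap<String, int[]> foreignPostings,
                       BitSet queryResults,
                       Map<String, Integer> keyToDocA,
                       int sizeA) {
        BitSet bits = new BitSet(sizeA);
        List<String> buffer = new ArrayList<String>();
        for (Map.Entry<String, int[]> entry : foreignPostings.entrySet()) {
            String key = entry.getKey();
            if ("not_null".equals(key)) {
                continue; // marker term, not a real key
            }
            for (int doc : entry.getValue()) {
                if (queryResults.get(doc)) {
                    buffer.add(key); // one matching document is enough
                    break;
                }
            }
            if (buffer.size() >= BUFFER_SIZE) {
                flush(buffer, bits, keyToDocA);
            }
        }
        flush(buffer, bits, keyToDocA); // handle whatever is left over
        return bits;
    }

    // Resolve the buffered keys in one batch and mark the target documents.
    static void flush(List<String> buffer, BitSet bits, Map<String, Integer> keyToDocA) {
        for (String key : buffer) {
            Integer target = keyToDocA.get(key);
            if (target != null) {
                bits.set(target);
            }
        }
        buffer.clear();
    }

    // Tiny made-up data set: four "citation" documents, three "uniprot" documents.
    static BitSet demo() {
        SortedMap<String, int[]> postings = new TreeMap<String, int[]>();
        postings.put("P1", new int[] {0, 2});
        postings.put("P2", new int[] {1});
        postings.put("P3", new int[] {3});
        postings.put("not_null", new int[] {0, 1, 2, 3});

        BitSet queryResults = new BitSet(4);
        queryResults.set(0);
        queryResults.set(3);

        Map<String, Integer> keyToDocA = new HashMap<String, Integer>();
        keyToDocA.put("P1", 0);
        keyToDocA.put("P2", 1);
        keyToDocA.put("P3", 2);

        return join(postings, queryResults, keyToDocA, 3);
    }
}
```

With the made-up data in demo(), documents 0 and 3 of the "citation" side 
match the query, so keys P1 and P3 are collected and the resulting filter 
has bits 0 and 2 set for the "uniprot" side.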

Is this a bug? And does anyone have ideas for an effective (maybe 
superior) workaround?

Regards and thanks for a great project!

Jerven

