lucene-java-user mailing list archives

From "Bauer, Herbert S. (Scott)" <Bauer.Sc...@mayo.edu>
Subject Exception when attempting to query using ToParentBlockJoinQuery in Lucene 5.1
Date Thu, 04 Jun 2015 21:51:21 GMT
I’m working with Lucene 5.1 and trying to make use of the relational structure provided by
the block join index and query mechanisms. I’m querying with the following code:

IndexReader reader = DirectoryReader.open(index);
ToParentBlockJoinIndexSearcher searcher = new ToParentBlockJoinIndexSearcher(reader);
ToParentBlockJoinCollector collector = new ToParentBlockJoinCollector(Sort.RELEVANCE, 2, true, true);

BitDocIdSetFilter codingScheme = new BitDocIdSetCachingWrapperFilter(
    new QueryWrapperFilter(
        new QueryParser("codingSchemeName", new StandardAnalyzer(new CharArraySet(0, true)))
            .parse(scheme.getCodingSchemeName())));

Query query = new QueryParser(null, new StandardAnalyzer(new CharArraySet(0, true)))
    .createBooleanQuery("propertyValue", term.getTerm(), Occur.MUST);

ToParentBlockJoinQuery termJoinQuery = new ToParentBlockJoinQuery(
    query,
    codingScheme,
    ScoreMode.Avg);

searcher.search(termJoinQuery, collector);


I expect this to return the parent values, but it fails on the final line with the following stack trace:


Exception in thread "main" java.lang.IllegalStateException: child query must only match non-parent docs, but parent docID=2147483647 matched childScorer=class org.apache.lucene.search.TermScorer
    at org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:330)
    at org.apache.lucene.search.join.ToParentBlockJoinIndexSearcher.search(ToParentBlockJoinIndexSearcher.java:63)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:428)
    at org.lexevs.lucene.prototype.LuceneQueryTrial.luceneToParentJoinQuery(LuceneQueryTrial.java:78)
    at org.lexevs.lucene.prototype.LuceneQueryTrial.main(LuceneQueryTrial.java:327)


I build indexes of up to about 36 GB using code similar to the following:


List<Document> list = new ArrayList<Document>();
//need a static
int staticCount = count;
ParentDocObject parent = builder.generateParentDoc(cs.getCodingSchemeName(),
    cs.getVersion(), cs.getURI(), "description");

if (cs.codingSchemeName.equals(CodingScheme.THESSCHEME.codingSchemeName)) {
    //One per coding Scheme
    int numberOfProperties = 12;
    if (!thesExactMatchDone) {
        ChildDocObject child1 = builder.generateChildDocWithSalt(parent, SearchTerms.BLOOD.getTerm());
        Document doc1 = builder.mapToDocumentExactMatch(child1);
        list.add(doc1);
        count++;
        numberOfProperties--;
        ChildDocObject child = builder.generateChildDocWithSalt(parent, SearchTerms.CHAR.term);
        Document doc = builder.mapToDocumentExactMatch(child);
        count++;
        list.add(doc);
        numberOfProperties--;
        thesExactMatchDone = true;
    }
    while (numberOfProperties > 0) {
        if (count % 547 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGenerator(
                    builder.randomNumberGenerator(), SearchTerms.BLOOD.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 233 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGenerator(
                    builder.randomNumberGenerator(), SearchTerms.CHAR.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 71 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGenerator(
                    builder.randomNumberGenerator(), SearchTerms.ARTICLE.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 2237 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGenerator(
                    builder.randomNumberGenerator(), SearchTerms.LUNG_CANCER.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 5077 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGenerator(
                    builder.randomNumberGenerator(), SearchTerms.LIVER_CARCINOMA.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 2371 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGeneratorStartsWith(
                    builder.randomNumberGenerator(), SearchTerms.BLOOD.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 79 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGeneratorStartsWith(
                    builder.randomNumberGenerator(), SearchTerms.ARTICLE.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 3581 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGeneratorStartsWith(
                    builder.randomNumberGenerator(), SearchTerms.LUNG_CANCER.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 23 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                builder.randomTextGeneratorStartsWith(
                    builder.randomNumberGenerator(), SearchTerms.CHAR.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else {
            ChildDocObject child = builder.generateChildDoc(parent);
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        }
    }
}

// The parent document goes last, after all of its children.
Document par = builder.mapToDocument(parent);
list.add(par);
writer.addDocuments(list);
} // end of enclosing block (opening not shown)
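
The ordering used in the indexing code (all children first, the parent last, then one addDocuments call for the whole list) can be summarized with a small stand-alone sketch. It uses plain strings in place of Lucene Document objects, and the class name, buildBlock, and the labels are hypothetical names chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLayoutSketch {

    // Builds one block: childCount child entries followed by exactly one parent entry.
    static List<String> buildBlock(String parentName, int childCount) {
        List<String> block = new ArrayList<>();
        for (int i = 0; i < childCount; i++) {
            block.add("child:" + parentName + ":" + i);
        }
        // The parent is always the last entry of its block.
        block.add("parent:" + parentName);
        return block;
    }

    public static void main(String[] args) {
        // 12 children, matching numberOfProperties in the indexing code.
        List<String> block = buildBlock("Thesaurus", 12);
        System.out.println(block.size());                 // prints 13
        System.out.println(block.get(block.size() - 1));  // prints parent:Thesaurus
    }
}
```

In the real indexing code each entry is a Lucene Document and the finished list is passed to writer.addDocuments(list).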


This indexing approach works well until I scale it up by running several instances of it.
When the nextChildDoc retrieved reaches id 5874902, the following line in ToParentBlockJoinQuery


        parentDoc = parentBits.nextSetBit(nextChildDoc);


gives parentDoc the value 2147483647, which is not a document id in my index, if I
understand Lucene and Luke correctly, since my index has only 42716877 documents.
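
One observation that may help narrow this down: 2147483647 equals Integer.MAX_VALUE, which is also the value of Lucene's DocIdSetIterator.NO_MORE_DOCS constant. The sketch below is a plain-JDK model (java.util.BitSet rather than Lucene's own bit set; the class and method names are hypothetical) of how a next-set-bit lookup could yield that value when no parent bit follows a child doc. It is an illustration under those assumptions, not a confirmed diagnosis of this index:

```java
import java.util.BitSet;

public class ParentBitsSketch {

    // Assumption: mirrors the sentinel value of Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Returns the first parent doc id at or after 'from', or the sentinel if there is none.
    // java.util.BitSet returns -1 when no bit is set, so it is mapped to the sentinel here.
    static int nextParent(BitSet parentBits, int from) {
        int next = parentBits.nextSetBit(from);
        return next < 0 ? NO_MORE_DOCS : next;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet(8);
        // Docs 0-2 are children of the parent at doc 3, which is marked as a parent.
        parentBits.set(3);
        // Docs 4-6 are children whose parent (doc 7) is NOT marked in parentBits.

        System.out.println(nextParent(parentBits, 1)); // prints 3
        System.out.println(nextParent(parentBits, 5)); // prints 2147483647
    }
}
```

In this model the sentinel appears whenever a child doc is visited and the bit set holds no parent at or after it, so the value is a marker for "no parent found" rather than a real document id.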

Can someone shed some light on this exception?


Thanks,

Scott Bauer




