lucene-java-user mailing list archives

From Ivan Hrytsyuk <ihryts...@softserveinc.com>
Subject [Lucene-3.6] OutOfMemory for SpanNearQuery with many unique terms in index
Date Wed, 25 Jul 2012 17:29:10 GMT
Environment:
- Lucene-3.6 with Solr-3.6 
- JBoss-5.1.0 
- jdk1.6.0_26_x32
- 1500 MB heap

Our Requirements:
We need to implement the following feature:

     1. Each document has multiple offer windows (paired startDate and
        endDate values) which are not consecutive.
     2. When we query for documents, we have to make sure that we only
        return those documents that have at least one offer window that
        applies to today’s date.


For example, a document would have offer windows like the following:
startDate = 2012-03-17
endDate = 2012-03-25
startDate = 2012-05-01
endDate = 2012-05-10

A user searching on 2012-04-01 would not get a hit on this document, but
a user searching on 2012-05-02 would.
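
To make the intended matching rule concrete, here is a minimal plain-Java sketch of the semantics, independent of Lucene and Solr. The class and method names (OfferWindowCheck, OfferWindow, isOfferedOn) are hypothetical and exist only for illustration, and the sketch assumes java.time (Java 8 or later), which is not available on the JDK 1.6 listed in the environment above.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

public class OfferWindowCheck {

    /** One offer window (hypothetical helper type); both bounds are inclusive. */
    static final class OfferWindow {
        final LocalDate startDate;
        final LocalDate endDate;

        OfferWindow(String startDate, String endDate) {
            this.startDate = LocalDate.parse(startDate);
            this.endDate = LocalDate.parse(endDate);
        }

        /** True when startDate <= day <= endDate for this single window. */
        boolean contains(LocalDate day) {
            return !day.isBefore(startDate) && !day.isAfter(endDate);
        }
    }

    /** A document matches when at least one of its windows contains the day. */
    static boolean isOfferedOn(List<OfferWindow> windows, LocalDate day) {
        for (OfferWindow window : windows) {
            if (window.contains(day)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The two offer windows from the example document above.
        List<OfferWindow> doc = Arrays.asList(
                new OfferWindow("2012-03-17", "2012-03-25"),
                new OfferWindow("2012-05-01", "2012-05-10"));

        System.out.println(isOfferedOn(doc, LocalDate.parse("2012-04-01"))); // false
        System.out.println(isOfferedOn(doc, LocalDate.parse("2012-05-02"))); // true
    }
}
```

The pairing matters: testing "some startDate <= date" and "some endDate >= date" independently would wrongly accept 2012-04-01, because 2012-03-17 precedes it and 2012-05-10 follows it even though no single window contains it.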

Implementation:
We created a custom query parser in Solr with the following code:
"
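// Date to test against the offer windows, taken from the request URL.
final String date = solrParams.get("dateParameterInURL");

// startDate <= date (open lower bound, inclusive upper bound)
final SchemaField startField = schema.getField("startDate");
final MultiTermQuery startRangeQuery = (MultiTermQuery) startField.getType()
		.getRangeQuery(extendedDismaxQParser, startField, null, date, true, true);

// endDate >= date (inclusive lower bound, open upper bound)
final SchemaField endField = schema.getField("endDate");
final MultiTermQuery endRangeQuery = (MultiTermQuery) endField.getType()
		.getRangeQuery(extendedDismaxQParser, endField, date, null, true, true);

// Wrap both range queries so they can be used as span clauses.
final SpanMultiTermQueryWrapper<MultiTermQuery> startRangeSpanQuery =
		new SpanMultiTermQueryWrapper<MultiTermQuery>(startRangeQuery);
final SpanMultiTermQueryWrapper<MultiTermQuery> endRangeSpanQuery =
		new SpanMultiTermQueryWrapper<MultiTermQuery>(endRangeQuery);

// Mask the endDate spans as startDate so both clauses report the same field.
final FieldMaskingSpanQuery maskingSpanQuery =
		new FieldMaskingSpanQuery(endRangeSpanQuery, startRangeSpanQuery.getField());

// Slop -1, unordered: intended to require both clauses at the same position,
// i.e. within the same startDate/endDate pair.
final Query spanNearQuery = new SpanNearQuery(
		new SpanQuery[]{ startRangeSpanQuery, maskingSpanQuery }, -1, false);
return spanNearQuery;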
"

Problem:
Our test index contains 80,000 documents. Each document has 5
startDate/endDate pairs.
When all documents have the same startDate/endDate pairs, everything
works like a charm.
But when every startDate/endDate field value (800,000 fields in total)
is unique, we get an OutOfMemoryError with the following stack trace:
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:267)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:40)
	at org.apache.lucene.store.DataInput.readVInt(DataInput.java:107)
	at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:217)
	at org.apache.lucene.index.SegmentTermPositions.readDeltaPosition(SegmentTermPositions.java:76)
	at org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:72)
	at org.apache.lucene.search.spans.TermSpans.next(TermSpans.java:57)
	at org.apache.lucene.search.spans.SpanOrQuery$1.initSpanQueue(SpanOrQuery.java:177)
	at org.apache.lucene.search.spans.SpanOrQuery$1.next(SpanOrQuery.java:188)
	at org.apache.lucene.search.spans.NearSpansUnordered$SpansCell.next(NearSpansUnordered.java:84)
	at org.apache.lucene.search.spans.NearSpansUnordered.initList(NearSpansUnordered.java:277)
	at org.apache.lucene.search.spans.NearSpansUnordered.next(NearSpansUnordered.java:155)
	at org.apache.lucene.search.spans.SpanScorer.<init>(SpanScorer.java:46)
	at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:79)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:364)
	at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:863)
	at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:635)
	at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:769)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1209)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1176)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:375)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:394)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

Can anyone explain why this happens and how to fix it?

Thank you in advance, Ivan
