lucene-java-user mailing list archives

From Christopher Tignor <ctig...@thinkmap.com>
Subject Getting Payload data from BooleanQuery results
Date Thu, 24 Sep 2009 16:49:21 GMT
Hello,

I have indexed documents with two fields, "ARTICLE" for an article of text
and "PUB_DATE" for the article's publication date.

Given a specific single word, I want to search my index for all documents
published within the last two weeks that contain this word, and have them
sorted by date:

TermQuery tq = new TermQuery(new Term("ARTICLE",mySearchWord));
Calendar cal = Calendar.getInstance();
// Date of last two weeks
cal.add(Calendar.DATE, -14);
ConstantScoreRangeQuery csrq = new ConstantScoreRangeQuery(
    "PUB_DATE",
    DateTools.dateToString(cal.getTime(), DateTools.Resolution.HOUR),
    null, true, true);
BooleanQuery bq = new BooleanQuery();
bq.add(tq, BooleanClause.Occur.MUST);
bq.add(csrq, BooleanClause.Occur.MUST);
TopFieldDocs docs = searcher.search(bq, null, 10, new Sort("PUB_DATE"));
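For reference, here is a minimal sketch of what the lower bound of that range looks like. It uses only the JDK and assumes that DateTools.Resolution.HOUR yields a "yyyyMMddHH" string in GMT, which is my understanding of DateTools; the class PubDateBound and the helper toHourString are hypothetical names used only for illustration. Because the fields run from most to least significant, plain string comparison matches chronological order, which is what the range query relies on.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class PubDateBound {
    // Hypothetical helper: formats a date as yyyyMMddHH in GMT, the layout
    // DateTools.Resolution.HOUR is assumed to produce.
    static String toHourString(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // Fixed example instant (the send time of this message) so the
        // output is reproducible.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(2009, Calendar.SEPTEMBER, 24, 16, 49, 21);
        String upper = toHourString(cal.getTime());

        // Same two-week offset as in the query above.
        cal.add(Calendar.DATE, -14);
        String lower = toHourString(cal.getTime());

        System.out.println(lower + " .. " + upper); // 2009091016 .. 2009092416
        // String order agrees with chronological order.
        System.out.println(lower.compareTo(upper) < 0); // true
    }
}
```

With the real index, "now" would of course come from Calendar.getInstance() rather than a fixed date.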

My goal now is to go through the returned documents and obtain the Term
instances (each term position) within each document, then retrieve the
payload data associated with each Term instance.

The trouble I am having is in getting access to the TermPositions following
such a query.
If I only needed to query on a single term (without my date restriction), I
could easily do (and have done) this:

SpanTermQuery query = new SpanTermQuery(new Term("ARTICLE",mySearchWord));
TermSpans spans = (TermSpans) query.getSpans(indexReader);
TermPositions tp = spans.getPositions();

and then iterate over each position calling

tp.getPayload(dataBuffer,0);

for example.

But alas, I cannot seem to get access to any TermPositions from my above
BooleanQuery.
I have looked into the contributed SpanExtractorClass, but
ConstantScoreRangeQuery seems to be unsupported there,
and I am at a loss as to how best to use Spans here.

Any help appreciated,

C>T>

-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
