lucene-java-user mailing list archives

From Bratislav Stojanovic <bratislav1...@gmail.com>
Subject Re: Getting documents from suggestions
Date Fri, 22 Mar 2013 11:49:22 GMT
OK, I've played with all these solutions, and only one gave me satisfying
results. Using build() with a TermFreqPayload iterator gave me horrible
performance: it takes more than 5 minutes to iterate through all Terms in
the index and filter them based on the doc id. I'm not sure whether this
nested loop can be optimized further, but my index is barely 30MB and has
around 300K terms.

It turns out that Jack Krupansky's answer was the way to go. I build an
AnalyzingSuggester using LuceneDictionary, which is really fast, and then
filter the suggestions further by issuing a query to the index. Here's the
code in case anyone is interested:

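// Generate the AnalyzingSuggester, reusing the existing analyzer
this.as = new AnalyzingSuggester(analyzer);

File suggsFile = new File(suggsPath);
// Load a previously stored suggester if one exists on disk
// (new FileInputStream would throw FileNotFoundException on the first run)
if (suggsFile.exists()) {
    as.load(new FileInputStream(suggsFile));
}
// Nothing loaded: build from the terms of the "contents" field and persist it
if (as.sizeInBytes() == 0) {
    logger.info("Building analyzing suggester...");
    as.build(new LuceneDictionary(reader, "contents"));
    as.store(new FileOutputStream(suggsFile));
}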
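The load-or-build logic above is a general caching pattern: try the stored artifact first, and only pay for the expensive build when nothing usable was loaded. A minimal plain-Java sketch of the same idea, assuming a text file of terms as a stand-in for the serialized suggester (the class `CachedBuild` and method `loadOrBuild` are hypothetical, not Lucene API):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Supplier;

public class CachedBuild {
    // Return the cached lines if the cache file exists and is non-empty;
    // otherwise run the (expensive) builder once and persist its result.
    public static List<String> loadOrBuild(Path cache, Supplier<List<String>> builder) {
        try {
            if (Files.exists(cache)) {
                List<String> loaded = Files.readAllLines(cache, StandardCharsets.UTF_8);
                if (!loaded.isEmpty()) {
                    return loaded; // plays the role of a non-zero sizeInBytes() after load()
                }
            }
            List<String> built = builder.get();
            Files.write(cache, built, StandardCharsets.UTF_8); // plays the role of store()
            return built;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path cache = Paths.get(System.getProperty("java.io.tmpdir"),
                "suggest-cache-" + System.nanoTime() + ".txt");
        int[] builds = {0};
        Supplier<List<String>> builder = () -> {
            builds[0]++;
            return List.of("lucene", "lucid", "luke");
        };
        loadOrBuild(cache, builder);                      // first call: builds and stores
        List<String> again = loadOrBuild(cache, builder); // second call: reads from disk
        System.out.println(again + " builds=" + builds[0]);
        // prints: [lucene, lucid, luke] builds=1
    }
}
```

The design choice is the same as in the snippet: treat an empty load as "no cache" so a missing or truncated file simply triggers a rebuild.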

--------------------------------------------------------

// Now, in the servlet, check each suggestion with a query.
// onlyMorePopular (second param) must be false: AnalyzingSuggester
// throws IllegalArgumentException when it is true.
List<LookupResult> suggs = as.lookup(q, false, 10);
logger.info("Found " + suggs.size() + " suggestions");
List<LookupResult> filtered = new ArrayList<LookupResult>();
for (LookupResult sug : suggs) {
    // Keep the suggestion only if this user has a document containing it
    if (searchSugg(sug.key.toString(), uid)) {
        filtered.add(sug);
    }
}
logger.info("Found " + filtered.size() + " filtered suggestions");
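To make the two-step flow concrete, here is a small in-memory model of "suggest first, then filter per user". It is only an assumption-laden sketch: a sorted `TreeSet` stands in for the suggester (so it ranks alphabetically rather than by weight), and a hypothetical term-to-owners map stands in for the per-suggestion `searchSugg` query. None of these names are Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class SuggestThenFilter {
    // Stand-in for the suggester: every indexed term, kept sorted for prefix lookups
    private final NavigableSet<String> terms = new TreeSet<>();
    // Stand-in for the per-suggestion query: which users own a document containing each term
    private final Map<String, Set<Long>> owners;

    public SuggestThenFilter(Map<String, Set<Long>> owners) {
        this.owners = owners;
        terms.addAll(owners.keySet());
    }

    // Step 1: up to num terms starting with prefix, ignoring users entirely
    List<String> lookup(String prefix, int num) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix) || out.size() == num) break;
            out.add(t);
        }
        return out;
    }

    // Step 2: keep only suggestions that occur in at least one of this user's documents
    public List<String> suggest(String prefix, long uid, int num) {
        List<String> filtered = new ArrayList<>();
        for (String s : lookup(prefix, num)) {
            if (owners.getOrDefault(s, Set.of()).contains(uid)) {
                filtered.add(s);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        SuggestThenFilter st = new SuggestThenFilter(Map.of(
                "lucene", Set.of(1L, 2L),
                "lucid", Set.of(2L),
                "luke", Set.of(1L),
                "solr", Set.of(1L)));
        System.out.println(st.suggest("lu", 1L, 10)); // [lucene, luke]
        System.out.println(st.suggest("lu", 2L, 10)); // [lucene, lucid]
    }
}
```

One consequence of filtering after the lookup: a user can get fewer than `num` suggestions even when more matching terms exist further down the ranking. Asking the lookup for more candidates and trimming the filtered list afterwards is one possible mitigation, at the cost of more per-suggestion queries.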

-----------------------------------------------------------------

public boolean searchSugg(String q, long uid) {
...
    if (q == null) {
        logger.warn("Query is null");
        return false;
    }
    // Trim first so a whitespace-only query is also rejected
    String qStr = q.trim();
    if (qStr.isEmpty()) {
        logger.warn("Query is empty");
        return false;
    }
    Date start = new Date();
    // The suggestion must occur in "contents" AND the document must belong to this user
    BooleanQuery query = new BooleanQuery();
    query.add(new BooleanClause(new TermQuery(new Term("contents", qStr)),
            BooleanClause.Occur.MUST));
    // Assumes "userid" is indexed as a numeric long field: encode uid the same
    // way (full precision, shift 0) so the TermQuery matches it exactly
    BytesRef ref = new BytesRef();
    NumericUtils.longToPrefixCoded(uid, 0, ref);
    query.add(new BooleanClause(new TermQuery(new Term("userid", ref)),
            BooleanClause.Occur.MUST));
    logger.info("Searching for: " + query.toString("contents"));

    // Only existence matters, so ask for at most one top hit
    TopDocs results = searcher.search(query, 1);
    int numTotalHits = results.totalHits;
    logger.info(numTotalHits + " total matching documents");
    Date end = new Date();
    long qTime = end.getTime() - start.getTime();
    logger.info("Search took " + qTime + " ms");

    return numTotalHits > 0;
...
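With two MUST clauses, the query in searchSugg matches exactly the documents that appear in both clauses' posting lists, and since only `numTotalHits > 0` is checked, it is really an existence test. A conceptual model of that check as a merge over two sorted doc-ID arrays (the class `Conjunction` and the sample doc IDs are hypothetical; Lucene's own conjunction skips ahead with `advance()` rather than walking linearly, and `searcher.search(query, 1)` still counts every match to fill `totalHits`, whereas this model stops at the first shared document):

```java
public class Conjunction {
    // True if the two sorted doc-ID arrays share at least one document,
    // i.e. some document satisfies both MUST clauses.
    public static boolean intersects(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) return true;   // document matches both clauses
            if (a[i] < b[j]) i++; else j++;  // advance whichever list is behind
        }
        return false;
    }

    public static void main(String[] args) {
        int[] containsTerm = {3, 8, 15, 42}; // docs whose "contents" has the suggestion
        int[] user7Docs = {1, 15, 99};       // docs whose "userid" is 7
        int[] user9Docs = {2, 4, 50};        // docs whose "userid" is 9
        System.out.println(intersects(containsTerm, user7Docs)); // true
        System.out.println(intersects(containsTerm, user9Docs)); // false
    }
}
```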


On Sat, Mar 16, 2013 at 8:54 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Sat, Mar 16, 2013 at 7:47 AM, Bratislav Stojanovic <bratislav1983@gmail.com> wrote:
> > Hey Mike,
> >
> > Is this what I should be looking at?
> >
> https://builds.apache.org/job/Lucene-Artifacts-trunk/javadoc/suggest/org/apache/lucene/search/suggest/analyzing/package-summary.html
> >
> > Not sure how to call build(), i.e. what to pass as a parameter... Any
> > examples?
> > Where to specify my payload (which is "id" long field from the index)?
>
> build() takes a TermFreqPayload iterator, which iterates over the
> weight/input text/payload that you provide.
>
> Have a look at AnalyzingSuggesterTest, eg testKeywordWithPayloads.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
Bratislav Stojanovic, M.Sc.
