lucene-java-user mailing list archives

From KK <dioxide.softw...@gmail.com>
Subject How to get top x [30 or 40] docs from result, still along with the support for hit highlighting?
Date Tue, 02 Jun 2009 09:53:29 GMT
Hi All,
I've been using hit highlighting for some time for non-English search.
I'm indexing the fields like this:

    Document doc = new Document();
    doc.add(new Field(contentField, pageContent, Field.Store.YES, Field.Index.TOKENIZED));
    doc.add(new Field(idField, pageId, Field.Store.YES, Field.Index.TOKENIZED));

and I use the following for searching, bundled with hit highlighting.
I'm using a phrase query, which I build like this:

        PhraseQuery phrase = new PhraseQuery();
        String[] termArray = queryTerms.split(" ");
        System.out.println("array size " + termArray.length);
        for (int i=0; i<termArray.length; i++) {
            System.out.println("adding " + termArray[i]);
            phrase.add(new Term("content", termArray[i]));
        }

Then I instantiate a searcher as follows, with a given trueIndexPath:
        String searchField = "content";
        IndexSearcher searcher = new IndexSearcher(trueIndexPath);
        QueryParser queryParser = null;
        try {
            queryParser = new QueryParser(searchField, new WhitespaceAnalyzer());
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        Hits hits = null;
        try {
            hits = searcher.search(phrase);
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        hitCount = hits.length();

And finally the following for hit highlighting. I'm putting all the field
values into a hash map called eachResult, and then adding each map to a
bigger collection, resultList:

        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");
        QueryScorer scorer = new QueryScorer(phrase);
        Highlighter highlighter = new Highlighter(formatter, scorer);

        for (int i = 0; i < hits.length(); i++) {
            Map eachResult = new HashMap();
            String content = hits.doc(i).get("content");
            TokenStream stream = new WhitespaceAnalyzer().tokenStream("content", new StringReader(content));
            String fragment = highlighter.getBestFragments(stream, content, 3, "...");
            System.out.println(fragment);
            eachResult.put("id", hits.doc(i).get("id"));
            eachResult.put("content", fragment);
            resultList.add(eachResult);
        }

My problem is that I cannot cap the number of search results. If a query
matches, say, 1000 documents, we are not going to show all of them; we want
to limit the output to a smaller number such as 30 or 50. Can someone tell me
how to limit the search results while keeping everything else intact, i.e.
the highlighting? I googled and found something called TopDocs, but could not
figure out how to plug it into the code above, so a good example would be
helpful.
As of now I think it's the highlighter that takes up most of the time spent
on a search, so we could restrict the whole thing to only the results we are
going to show on the first page. Any ideas on this are very welcome. Thank
you.
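
To make the idea concrete, here is a rough, library-free sketch of the pattern
I have in mind: cap the loop at the first N ranked results and run the
expensive highlighting step only on those. Everything in it is hypothetical
(the class TopNHighlightSketch, the helpers doc, highlight and highlightTop,
and the maxResults parameter); the highlight method is only a stand-in for
highlighter.getBestFragments. In Lucene itself the capped list would
presumably come from searcher.search(query, filter, n), which returns a
TopDocs whose scoreDocs array holds the document numbers to pass to
searcher.doc(...), but I have not wired that in here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are part of Lucene.
class TopNHighlightSketch {

    // Builds a stored "document" with the same two fields used in the post.
    static Map<String, String> doc(String id, String content) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("content", content);
        return d;
    }

    // Stand-in for the expensive highlighting call: wraps each exact
    // occurrence of the phrase in the same span markup as the post.
    static String highlight(String content, String phrase) {
        return content.replace(phrase,
                "<span class=\"highlight\">" + phrase + "</span>");
    }

    // rankedDocs is assumed to be ordered best-first, as search hits are.
    // Only the first maxResults entries are highlighted and returned.
    static List<Map<String, String>> highlightTop(
            List<Map<String, String>> rankedDocs, String phrase, int maxResults) {
        int limit = Math.max(0, Math.min(maxResults, rankedDocs.size()));
        List<Map<String, String>> resultList = new ArrayList<>(limit);
        for (int i = 0; i < limit; i++) {
            Map<String, String> hit = rankedDocs.get(i);
            Map<String, String> eachResult = new LinkedHashMap<>();
            eachResult.put("id", hit.get("id"));
            eachResult.put("content", highlight(hit.get("content"), phrase));
            resultList.add(eachResult);
        }
        return resultList;
    }

    public static void main(String[] args) {
        List<Map<String, String>> ranked = new ArrayList<>();
        ranked.add(doc("1", "open source search engine"));
        ranked.add(doc("2", "a search engine library"));
        ranked.add(doc("3", "yet another search engine"));
        // Highlight only the first two hits, even though three matched.
        for (Map<String, String> r : highlightTop(ranked, "search engine", 2)) {
            System.out.println(r.get("id") + ": " + r.get("content"));
        }
    }
}
```

The total hit count can still be reported separately; only the per-document
highlighting work is bounded by maxResults.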

--KK.
