lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to get top x[30 or 40] docs from result still alongwith the support for hit highlighting?
Date Tue, 02 Jun 2009 13:22:20 GMT
First, I'd ask how sure you are that highlighting is the problem. But
answering that should be simple: just remove the highlighting portion
and compare.
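One way to make that comparison is to time the search with and without the highlighting step. This is a minimal plain-Java sketch, assuming you can wrap each variant of your code in a `Runnable`. `PhaseTimer` and `timeMillis` are hypothetical helpers, not part of Lucene. A single run is only a rough measure, because JVM warm-up can skew the first iteration, so repeat each variant a few times.

```java
public class PhaseTimer {
    // Hypothetical helper: runs a task once and returns elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder bodies: put "search + iterate hits" in the first lambda
        // and the same code plus highlighter.getBestFragments(...) in the second.
        long searchOnly = timeMillis(() -> { /* search and iterate hits */ });
        long withHighlighting = timeMillis(() -> { /* search, iterate, highlight */ });
        System.out.println("search only: " + searchOnly + " ms, with highlighting: "
                + withHighlighting + " ms");
    }
}
```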

Why can't you just stop iterating the Hits object at your limit?
Something like:

    for (int i = 0; i < hits.length() && i < 50; i++) {
        // fetch hits.doc(i) and highlight it here
    }

That'll tell you whether you're on the right track as far as highlighting.
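The same cap generalizes to paging. This is a minimal plain-Java sketch of the loop bounds for one page of results, assuming 0-based page numbers. `ResultPage` and `pageBounds` are hypothetical names, and `totalHits` stands in for `hits.length()`. Only indexes inside the returned window would be passed to `hits.doc(i)` and the highlighter, so highlighting work grows with the page size rather than with the total number of hits.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultPage {
    // Returns the half-open range [start, end) of hit indexes for a 0-based page,
    // clamped so it never runs past the total number of hits.
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        int start = Math.min(totalHits, page * pageSize);
        int end = Math.min(totalHits, start + pageSize);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int totalHits = 1000; // e.g. hits.length()
        int pageSize = 30;
        int[] bounds = pageBounds(totalHits, 0, pageSize);
        List<Integer> shown = new ArrayList<>();
        for (int i = bounds[0]; i < bounds[1]; i++) {
            // In the original loop, only these i values would call hits.doc(i)
            // and highlighter.getBestFragments(...).
            shown.add(i);
        }
        System.out.println(shown.size()); // prints 30
    }
}
```

For a second page of a 45-hit result, `pageBounds(45, 1, 30)` gives `[30, 45)`, and a page past the end yields an empty range instead of an out-of-bounds index.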

Best
Erick

On Tue, Jun 2, 2009 at 5:53 AM, KK <dioxide.software@gmail.com> wrote:

> Hi All,
> I've been using hit highlighting for some time for non-English search.
> I'm indexing the fields like this:
>
>    Document doc = new Document();
>    doc.add(new Field(contentField, pageContent, Field.Store.YES,
>            Field.Index.TOKENIZED));
>    doc.add(new Field(idField, pageId, Field.Store.YES,
>            Field.Index.TOKENIZED));
>
> and used the following for searching, combined with hit highlighting.
> I'm forming the query as a phrase query, like this:
>
>        PhraseQuery phrase = new PhraseQuery();
>        String[] termArray = queryTerms.split(" ");
>        System.out.println("array size " + termArray.length);
>        for (int i=0; i<termArray.length; i++) {
>            System.out.println("adding " + termArray[i]);
>            phrase.add(new Term("content", termArray[i]));
>        }
>
> Then I instantiate a searcher as follows, given trueIndexPath:
>        String searchField = "content";
>        IndexSearcher searcher = new IndexSearcher(trueIndexPath);
>        QueryParser queryParser = null;
>        try {
>            queryParser = new QueryParser(searchField,
>                    new WhitespaceAnalyzer());
>        } catch (Exception ex) {
>            ex.printStackTrace();
>        }
>
>        Hits hits = null;
>        try {
>            hits = searcher.search(phrase);
>        } catch (Exception ex) {
>            ex.printStackTrace();
>        }
>
>        hitCount = hits.length();
>
> And finally the following for hit highlighting. I'm putting all the field
> values in a HashMap called eachResult and then adding each one to a bigger
> collection, resultList:
>
>        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(
>                "<span class=\"highlight\">", "</span>");
>        QueryScorer scorer = new QueryScorer(phrase);
>        Highlighter highlighter = new Highlighter(formatter, scorer);
>
>        for (int i = 0; i < hits.length(); i++) {
>            Map eachResult = new HashMap();
>            String content = hits.doc(i).get("content");
>            TokenStream stream = new WhitespaceAnalyzer().tokenStream("content",
>                    new StringReader(content));
>            String fragment = highlighter.getBestFragments(stream, content,
>                    3, "...");
>            System.out.println(fragment);
>            eachResult.put("id", hits.doc(i).get("id"));
>            eachResult.put("content", fragment);
>            resultList.add(eachResult);
>        }
>
> Now I'm not able to cap the search results at a certain limit. Say we have
> 1000 results: we're not going to show all of them, only some lower number
> like 30 or 50. Can someone let me know how to limit the search results
> while keeping everything else intact, i.e. highlighting? I googled and found
> something called TopDocs but could not figure out how to plug it into the
> above code fragment; a good example would be helpful.
> As of now I think it's the highlighter that's taking most of the time
> consumed by the search, so we could restrict the whole thing to only the
> part that we are going to show on the first page. Any ideas on this are
> very welcome. Thank you.
>
> --KK.
>
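
On the TopDocs question: in the Lucene 2.x API, `Searcher.search(Query, Filter, int n)` returns a `TopDocs` whose `scoreDocs` array holds at most n hits (check the Javadoc for your version). The usual technique behind keeping only the n best hits is a bounded min-heap. This is a plain-Java illustration of that idea, not Lucene's own collector code; `TopN` and the `Hit` class (standing in for `ScoreDoc`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Hypothetical (docId, score) pair, standing in for Lucene's ScoreDoc.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Keeps only the n highest-scoring hits; the heap's head is the weakest kept hit.
    static List<Hit> topN(float[] scores, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(new Hit(doc, scores[doc]));
            } else if (n > 0 && scores[doc] > heap.peek().score) {
                heap.poll(); // evict the current weakest hit
                heap.add(new Hit(doc, scores[doc]));
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        // Best first, as a results page would show them.
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f}; // score per doc id
        for (Hit h : topN(scores, 3)) {
            System.out.println(h.doc); // prints 1, 3, 2 on separate lines
        }
    }
}
```

The heap never grows past n entries, so only n documents ever need to be loaded and highlighted, whatever the total hit count.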
