lucene-java-user mailing list archives

From KK <dioxide.softw...@gmail.com>
Subject Re: How to get top x[30 or 40] docs from result still alongwith the support for hit highlighting?
Date Tue, 02 Jun 2009 14:03:32 GMT
Thanks for your response.
BTW, I got it working by using TopDocs in place of Hits, with this:

 String content = searcher.doc(topDocs.scoreDocs[i].doc).get("content");

instead of
  String content = hits.doc(i).get("content");
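
For anyone adapting the highlighting loop from my original post, here is a minimal sketch of the capping pattern in plain Java, with no Lucene dependency so it can be run on its own. ScoredHit, firstPage, and highlight are hypothetical stand-ins (for a scored hit, the bounded loop, and the highlighter call respectively), not Lucene APIs. The only point is that the expensive per-document step runs at most `limit` times, however many documents matched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopNHighlightSketch {

    // Hypothetical stand-in for one scored search hit (not a Lucene class).
    static final class ScoredHit {
        final String id;
        final String content;

        ScoredHit(String id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    // Hypothetical stand-in for the highlighter: wraps each literal
    // occurrence of the phrase in a span tag.
    static String highlight(String content, String phrase) {
        return content.replace(phrase,
                "<span class=\"highlight\">" + phrase + "</span>");
    }

    // Builds result maps for at most `limit` hits, so the highlighting
    // cost is bounded by the page size rather than the total hit count.
    static List<Map<String, String>> firstPage(List<ScoredHit> hits,
                                               String phrase, int limit) {
        List<Map<String, String>> results = new ArrayList<Map<String, String>>();
        int end = Math.min(hits.size(), limit);
        for (int i = 0; i < end; i++) {
            ScoredHit hit = hits.get(i);
            Map<String, String> eachResult = new HashMap<String, String>();
            eachResult.put("id", hit.id);
            eachResult.put("content", highlight(hit.content, phrase));
            results.add(eachResult);
        }
        return results;
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = new ArrayList<ScoredHit>();
        for (int i = 0; i < 1000; i++) {
            hits.add(new ScoredHit("id" + i, "lucene in action " + i));
        }
        List<Map<String, String>> page = firstPage(hits, "in action", 30);
        System.out.println("matched=" + hits.size() + " shown=" + page.size());
    }
}
```

In the real code the list of hits corresponds to topDocs.scoreDocs, and each stored field is fetched through searcher.doc(...) as in the line above.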

Thanks,
KK

On Tue, Jun 2, 2009 at 6:52 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> First, I'd ask how sure you are that highlighting is the problem. But
> answering this should be simple: just remove the
> highlighting portion.
>
> Why can't you just stop iterating the Hits object at your limit?
> Something like:
> for (int i = 0; i < hits.length() && i < 50; i++)
>
> ?
>
> That'll tell you whether you're on the right track as far as highlighting.
>
> Best
> Erick
>
> On Tue, Jun 2, 2009 at 5:53 AM, KK <dioxide.software@gmail.com> wrote:
>
> > Hi All,
> > I've been using hit highlighting for some time for non-English search.
> > I'm indexing the fields using this,
> >
> >    Document doc = new Document();
> >    doc.add(new Field(contentField, pageContent, Field.Store.YES, Field.Index.TOKENIZED));
> >    doc.add(new Field(idField, pageId, Field.Store.YES, Field.Index.TOKENIZED));
> >
> > and used the following for searching, bundled with hit highlighting.
> > # I'm using a phrase query, formed like this,
> >
> >        PhraseQuery phrase = new PhraseQuery();
> >        String[] termArray = queryTerms.split(" ");
> >        System.out.println("array size " + termArray.length);
> >        for (int i=0; i<termArray.length; i++) {
> >            System.out.println("adding " + termArray[i]);
> >            phrase.add(new Term("content", termArray[i]));
> >        }
> >
> > then instantiating a searcher as follows, with a given trueIndexPath,
> >        String searchField = "content";
> >        IndexSearcher searcher = new IndexSearcher(trueIndexPath);
> >        QueryParser queryParser = null;
> >        try {
> >            queryParser = new QueryParser(searchField, new WhitespaceAnalyzer());
> >        } catch (Exception ex) {
> >            ex.printStackTrace();
> >        }
> >
> >        Hits hits = null;
> >        try {
> >            hits = searcher.search(phrase);
> >        } catch (Exception ex) {
> >            ex.printStackTrace();
> >        }
> >
> >        hitCount = hits.length();
> >
> > and finally the following for hit highlighting. I'm putting all the field
> > values in a HashMap called eachResult and then adding that to a larger
> > collection, resultList,
> >
> >        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");
> >        QueryScorer scorer = new QueryScorer(phrase);
> >        Highlighter highlighter = new Highlighter(formatter, scorer);
> >
> >        for (int i = 0; i < hits.length(); i++) {
> >            Map eachResult = new HashMap();
> >            String content = hits.doc(i).get("content");
> >            TokenStream stream = new WhitespaceAnalyzer().tokenStream("content", new StringReader(content));
> >            String fragment = highlighter.getBestFragments(stream, content, 3, "...");
> >            System.out.println(fragment);
> >            eachResult.put("id", hits.doc(i).get("id"));
> >            eachResult.put("content", fragment);
> >            resultList.add(eachResult);
> >        }
> >
> > Now I'm not able to cap the number of search results. Say we have 1000
> > results: we're not going to show them all, so we could limit that to some
> > lower value, say 30 or 50. Can someone let me know how to limit the
> > search results while keeping everything else intact, i.e. the highlighting?
> > I googled and found something called TopDocs but could not figure out how
> > to plug it into the above code fragment; a good example would be helpful.
> > As of now I think it's the highlighter that's taking the major part of the
> > time consumed by the search, so we could restrict the whole thing to only
> > the part that we are going to show on the first page. Any ideas are
> > very welcome. Thank you.
> >
> > --KK.
> >
>
