lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alixandre Santana" <alixandresant...@gmail.com>
Subject Re: Pagination
Date Mon, 02 Jul 2007 16:55:20 GMT
Mark,

The ScoreDoc[] contains only the  IDs of each lucene document. what
would be the best way of getting the entire (lucene)document ?
Should i do a new search with the ID retrivied by hpc.getScores() -
(searcher.doc(idDoc))?

thanks.

Alixandre

On 7/2/07, mark harwood <markharw00d@yahoo.co.uk> wrote:
> The Hits class is OK but can be inefficient due to re-running the query unnecessarily.
>
> The class below illustrates how to efficiently retrieve a particular page of results
and lends itself to webapps where you don't want to retain server side state (i.e. a Hits
object) for each client.
> It would make sense to put an upper limit on the "start" parameter (as Google etc do)
to avoid consuming to much RAM per client request.
>
> Cheers,
> Mark
>
> [Begin code]
>
>
>
>
> package lucene.pagination;
>
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.HitCollector;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.util.PriorityQueue;
>
> /**
>  * A HitCollector that retrieves a specific page of results
>  * @author maharwood
>  */
> public class HitPageCollector extends HitCollector
> {
>     //Demo code showing pagination
>     public static void main(String[] args) throws Exception
>     {
>         IndexSearcher s=new IndexSearcher("/indexes/nasa");
>         HitPageCollector hpc=new HitPageCollector(1,10);
>         Query q=new TermQuery(new Term("contents","sea"));
>         s.search(q,hpc);
>         ScoreDoc[] sd = hpc.getScores();
>         System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
>         for (int i = 0; i < sd.length; i++)
>         {
>             System.out.println(sd[i].doc);
>         }
>         s.close();
>     }
>
>     int nDocs;
>     PriorityQueue hq;
>     float minScore = 0.0f;
>     int totalHits = 0;
>     int start;
>     int maxNumHits;
>     int totalInThisPage;
>
>     public HitPageCollector(int start, int maxNumHits)
>     {
>         this.nDocs = start + maxNumHits;
>         this.start = start;
>         this.maxNumHits = maxNumHits;
>         hq = new HitQueue(nDocs);
>     }
>
>     public void collect(int doc, float score)
>     {
>         totalHits++;
>         if((hq.size()<nDocs)||(score >= minScore))
>         {
>             ScoreDoc scoreDoc = new ScoreDoc(doc,score);
>             hq.insert(scoreDoc);              // update hit queue
>             minScore = ((ScoreDoc)hq.top()).score; // reset minScore
>         }
>         totalInThisPage=hq.size();
>     }
>
>
>     public ScoreDoc[] getScores()
>     {
>         //just returns the number of hits required from the required start point
>         /*
>             So, given hits:
>                 1234567890
>             and a start of 2 + maxNumHits of 3 should return:
>                 234
>             or, given hits
>                 12
>             should return
>                 2
>             and so, on.
>         */
>         if (start <= 0)
>         {
>             throw new IllegalArgumentException("Invalid start :" + start+" - start should
be >=1");
>         }
>         int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
>         if (numReturned <= 0)
>         {
>             return new ScoreDoc[0];
>         }
>         ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
>         ScoreDoc scoreDoc;
>         for (int i = hq.size() - 1; i >= 0; i--) // put docs in array, working backwards
from lowest count
>         {
>             scoreDoc = (ScoreDoc) hq.pop();
>             if (i < (start - 1))
>             {
>                 break; //off the beginning of the results array
>             }
>             if (i < (scoreDocs.length + (start - 1)))
>             {
>                 scoreDocs[i - (start - 1)] = scoreDoc; //within scope of results array
>             }
>         }
>         return scoreDocs;
>     }
>
>     public int getTotalAvailable()
>     {
>         return totalHits;
>     }
>
>     public int getStart()
>     {
>         return start;
>     }
>
>     public int getEnd()
>     {
>         return start+totalInThisPage-1;
>     }
>
>     public class HitQueue extends PriorityQueue
>     {
>           public HitQueue(int size)
>           {
>             initialize(size);
>           }
>           public final boolean lessThan(Object a, Object b)
>           {
>             ScoreDoc hitA = (ScoreDoc)a;
>             ScoreDoc hitB = (ScoreDoc)b;
>             if (hitA.score == hitB.score)
>               return hitA.doc > hitB.doc;
>             else
>               return hitA.score < hitB.score;
>           }
>     }
> }
>
>
>
> ----- Original Message ----
> From: Lee Li Bin <leelb@xedge.com.sg>
> To: java-user@lucene.apache.org
> Sent: Monday, 2 July, 2007 9:59:14 AM
> Subject: RE: Pagination
>
> Hi,
>
> I still have no idea of how to get it done. Can give me some details?
>
> The web application is in jsp btw.
>
> Thanks a lot.
>
>
> Regards,
> Lee Li Bin
> -----Original Message-----
> From: Chris Lu [mailto:chris.lu@gmail.com]
> Sent: Saturday, June 30, 2007 2:21 AM
> To: java-user@lucene.apache.org
> Subject: Re: Pagination
>
> After search, you will just get an object Hits, and go through all of the
> documents by hits.doc(i).
>
> The pagination is controlled by you. Lucene is pre-caching first 200
> documents and lazy loading the rest by batch size 200.
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m
> inutes
>
> On 6/29/07, Lee Li Bin <leelb@xedge.com.sg> wrote:
> >
> > Hi,
> >
> > does anyone knows how to do pagination on jsp page using the number of
> > hits
> > return? Or any other solutions?
> >
> >
> >
> > Do provide me with some sample coding if possible or a step by step guide.
> > Sry if I'm asking too much, I'm new to lucene.
> >
> >
> >
> > Thanks
> >
> >
> >
> >
> >
> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
>
>       ___________________________________________________________
> Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
> now.
> http://uk.answers.yahoo.com/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Alixandre Santana

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message