lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Pagination
Date Mon, 02 Jul 2007 10:17:32 GMT
The Hits class is OK but can be inefficient due to re-running the query unnecessarily.

The class below illustrates how to efficiently retrieve a particular page of results and lends
itself to webapps where you don't want to retain server side state (i.e. a Hits object) for
each client.
It would make sense to put an upper limit on the "start" parameter (as Google etc do) to avoid
consuming to much RAM per client request.

Cheers,
Mark

[Begin code]




package lucene.pagination;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.PriorityQueue;

/**
 * A HitCollector that retrieves a specific page of results 
 * @author maharwood
 */
public class HitPageCollector extends HitCollector
{
    //Demo code showing pagination
    public static void main(String[] args) throws Exception
    {
        IndexSearcher s=new IndexSearcher("/indexes/nasa");
        HitPageCollector hpc=new HitPageCollector(1,10);
        Query q=new TermQuery(new Term("contents","sea"));
        s.search(q,hpc);
        ScoreDoc[] sd = hpc.getScores();
        System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
        for (int i = 0; i < sd.length; i++)
        {
            System.out.println(sd[i].doc);
        }
        s.close();
    }

    int nDocs;
    PriorityQueue hq;
    float minScore = 0.0f;
    int totalHits = 0;
    int start;
    int maxNumHits;
    int totalInThisPage;

    public HitPageCollector(int start, int maxNumHits)
    {
        this.nDocs = start + maxNumHits;
        this.start = start;
        this.maxNumHits = maxNumHits;
        hq = new HitQueue(nDocs);
    }

    public void collect(int doc, float score)
    {
        totalHits++;
        if((hq.size()<nDocs)||(score >= minScore))
        {
            ScoreDoc scoreDoc = new ScoreDoc(doc,score);
            hq.insert(scoreDoc);              // update hit queue
            minScore = ((ScoreDoc)hq.top()).score; // reset minScore
        }
        totalInThisPage=hq.size();
    }
    

    public ScoreDoc[] getScores()
    {
        //just returns the number of hits required from the required start point
        /*
            So, given hits:
                1234567890
            and a start of 2 + maxNumHits of 3 should return:
                234
            or, given hits
                12
            should return
                2
            and so, on.
        */
        if (start <= 0)
        {
            throw new IllegalArgumentException("Invalid start :" + start+" - start should
be >=1");
        }
        int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
        if (numReturned <= 0)
        {
            return new ScoreDoc[0];
        }
        ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
        ScoreDoc scoreDoc;
        for (int i = hq.size() - 1; i >= 0; i--) // put docs in array, working backwards
from lowest count
        {
            scoreDoc = (ScoreDoc) hq.pop();
            if (i < (start - 1))
            {
                break; //off the beginning of the results array
            }
            if (i < (scoreDocs.length + (start - 1)))
            {
                scoreDocs[i - (start - 1)] = scoreDoc; //within scope of results array
            }
        }
        return scoreDocs;
    }

    public int getTotalAvailable()
    {
        return totalHits;
    }

    public int getStart()
    {
        return start;
    }
    
    public int getEnd()
    {
        return start+totalInThisPage-1;
    }
    
    public class HitQueue extends PriorityQueue 
    {
          public HitQueue(int size) 
          {
            initialize(size);
          }
          public final boolean lessThan(Object a, Object b) 
          {
            ScoreDoc hitA = (ScoreDoc)a;
            ScoreDoc hitB = (ScoreDoc)b;
            if (hitA.score == hitB.score)
              return hitA.doc > hitB.doc;
            else
              return hitA.score < hitB.score;
          }
    }
}



----- Original Message ----
From: Lee Li Bin <leelb@xedge.com.sg>
To: java-user@lucene.apache.org
Sent: Monday, 2 July, 2007 9:59:14 AM
Subject: RE: Pagination

Hi,

I still have no idea of how to get it done. Can give me some details?

The web application is in jsp btw.

Thanks a lot.

 
Regards,
Lee Li Bin
-----Original Message-----
From: Chris Lu [mailto:chris.lu@gmail.com] 
Sent: Saturday, June 30, 2007 2:21 AM
To: java-user@lucene.apache.org
Subject: Re: Pagination

After search, you will just get an object Hits, and go through all of the
documents by hits.doc(i).

The pagination is controlled by you. Lucene is pre-caching first 200
documents and lazy loading the rest by batch size 200.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m
inutes

On 6/29/07, Lee Li Bin <leelb@xedge.com.sg> wrote:
>
> Hi,
>
> does anyone knows how to do pagination on jsp page using the number of
> hits
> return? Or any other solutions?
>
>
>
> Do provide me with some sample coding if possible or a step by step guide.
> Sry if I'm asking too much, I'm new to lucene.
>
>
>
> Thanks
>
>
>
>
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org






      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/ 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message