lucene-solr-user mailing list archives

From "Binkley, Peter" <>
Subject RE: capturing page numbers
Date Mon, 14 Apr 2008 16:15:55 GMT
Tricia Williams is working on this problem in, and there is a patch you
can try (instructions at
#action_12541699). It uses Lucene payloads to carry the page
information, and requires a current version of Lucene.

The alternative is to index each page as its own Solr document, and make
your application do the work of grouping the results from each source


-----Original Message-----
Sent: Monday, April 14, 2008 9:58 AM
Subject: capturing page numbers

I have extracted text from .pdf files and I also inserted page numbers
of the .pdf file to the text. My document looks something like:

   <page no="2"> ..Some Text..</page>
   <page no="3"> ..Some Text..</page>

I indexed my data using Solr and I am running highlighted searches.
Currently I am displaying just snippets to the user, however I also want
to capture the page number of the corresponding snippet. I will give a
link to jump to that page in the original PDF file.

Is there a way that I can find out which page the snippet was extracted
from, by using the <page> tags?

Any ideas and help are appreciated.

Thank you...
