lucene-solr-user mailing list archives

From "Binkley, Peter" <Peter.Bink...@ualberta.ca>
Subject RE: capturing page numbers
Date Mon, 14 Apr 2008 16:15:55 GMT
Tricia Williams is working on this problem in
https://issues.apache.org/jira/browse/SOLR-380, and there is a patch you
can try (instructions at
https://issues.apache.org/jira/browse/SOLR-380?focusedCommentId=12541699#action_12541699).
It uses Lucene payloads to carry the page information, and requires a
current version of Lucene.
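The underlying idea, knowing which page each position in the indexed text falls on, can be illustrated without any Lucene code. This is not the SOLR-380 patch, which stores the page data in payloads. It is a minimal sketch assuming the page texts are concatenated into one field and that a snippet's character offset is available by some other means (an assumption in its own right, since the highlighting response returns snippet text rather than offsets). PageOffsets, addPage and pageAt are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

public class PageOffsets {

    // Hypothetical helper: maps the character offset where each page starts to its page number.
    private final TreeMap<Integer, Integer> pageStarts = new TreeMap<>();
    private final StringBuilder text = new StringBuilder();

    /** Appends one page's text (plus a newline separator) and records where it starts. */
    public void addPage(int pageNumber, String pageText) {
        pageStarts.put(text.length(), pageNumber);
        text.append(pageText).append('\n');
    }

    /** Returns the page containing the given offset, or -1 if the offset is out of range. */
    public int pageAt(int offset) {
        if (offset < 0 || offset >= text.length()) {
            return -1;
        }
        // floorEntry finds the last page that starts at or before the offset.
        Map.Entry<Integer, Integer> entry = pageStarts.floorEntry(offset);
        return entry == null ? -1 : entry.getValue();
    }

    /** The concatenated text, as it would be stored in a single field. */
    public String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        PageOffsets doc = new PageOffsets();
        doc.addPage(2, "Solr indexes text.");
        doc.addPage(3, "Payloads carry page data.");
        int offset = doc.text().indexOf("Payloads");
        System.out.println("snippet starts on page " + doc.pageAt(offset));
    }
}
```

Keying a TreeMap by start offset makes each lookup a single floorEntry call, logarithmic in the number of pages, so long PDFs stay cheap to query.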

The alternative is to index each page as its own Solr document, and make
your application do the work of grouping the results from each source
pdf.
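A minimal sketch of that grouping step, assuming each page is indexed as its own document with hypothetical stored fields for the source PDF and the page number, and that the hits arrive in relevance order. PageHit and groupByPdf are illustrative names, not Solr API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageHitGrouper {

    /** One hit where each Solr document is a single PDF page (hypothetical field layout). */
    public record PageHit(String pdfId, int page, String snippet) {}

    /**
     * Groups page-level hits by their source PDF. LinkedHashMap keeps the PDFs in the
     * order of their best-ranked page, assuming the input list is already in relevance order.
     */
    public static Map<String, List<PageHit>> groupByPdf(List<PageHit> rankedHits) {
        Map<String, List<PageHit>> grouped = new LinkedHashMap<>();
        for (PageHit hit : rankedHits) {
            grouped.computeIfAbsent(hit.pdfId(), id -> new ArrayList<>()).add(hit);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<PageHit> hits = List.of(
                new PageHit("manual.pdf", 3, "..first snippet.."),
                new PageHit("report.pdf", 7, "..second snippet.."),
                new PageHit("manual.pdf", 2, "..third snippet.."));
        groupByPdf(hits).forEach((pdf, pages) -> {
            System.out.println(pdf);
            for (PageHit p : pages) {
                System.out.println("  page " + p.page() + ": " + p.snippet());
            }
        });
    }
}
```

Because each hit is already a single page, the page number needed for the "jump to page" link comes straight from the stored field; the application only has to regroup.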

Peter

-----Original Message-----
From: AHMET ARSLAN [mailto:iorixxx@yahoo.com] 
Sent: Monday, April 14, 2008 9:58 AM
To: solr-user@lucene.apache.org
Subject: capturing page numbers

I have extracted text from .pdf files and I also inserted the page
numbers of each .pdf file into the text. My document looks something like:

  <content>
   <page no="2"> ..Some Text..</page>
   <page no="3"> ..Some Text..</page>
   ...
   <page no="..."> ..Some Text..</page>
  </content>
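For the one-document-per-page alternative suggested in the reply above, a layout like this can be split into pages before indexing. A minimal sketch using the JDK's DOM parser, assuming the extracted content is well-formed XML with every page wrapped in <page no="...">. The Page record is hypothetical, and the step of posting each page to Solr is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PageSplitter {

    /** Text of one <page> element plus its "no" attribute (hypothetical record). */
    public record Page(int number, String text) {}

    /** Splits a <content><page no="N">...</page>...</content> string into per-page records. */
    public static List<Page> split(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getElementsByTagName returns the <page> elements in document order.
            NodeList pageNodes = doc.getElementsByTagName("page");
            List<Page> pages = new ArrayList<>();
            for (int i = 0; i < pageNodes.getLength(); i++) {
                Element page = (Element) pageNodes.item(i);
                // A missing or non-numeric "no" attribute throws NumberFormatException.
                int number = Integer.parseInt(page.getAttribute("no"));
                pages.add(new Page(number, page.getTextContent().trim()));
            }
            return pages;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse page XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<content>"
                + "<page no=\"2\"> ..Some Text..</page>"
                + "<page no=\"3\"> ..More Text..</page>"
                + "</content>";
        for (Page p : split(xml)) {
            // Each Page would become its own Solr document, keyed by PDF name and page number.
            System.out.println(p.number() + " -> " + p.text());
        }
    }
}
```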

I indexed my data using Solr and I am making highlighted
queries (hl.fragsize=200&hl.snippets=5).
Currently I display only the snippets to the user, but I also want
to capture the page number of each snippet so that I can give a link
that jumps to that page in the original PDF file.

Is there a way to find out which page a snippet was extracted
from, by using the <page> tags?

Any ideas and help are appreciated.

Thank you...

