lucene-java-user mailing list archives

From Shyam Bhaskaran
Subject Lucene parsing for PDF
Date Thu, 29 Dec 2005 06:40:57 GMT

I am working on a search project using Lucene, and I am currently parsing
PDF documents. I have successfully implemented my parser using Lucene and
PDFBox. My question is how to exclude (or perhaps delete) pages from the
index. I am not sure how to do this, or at what point in the process it
should be done. The Lucene book describes removing documents from the index
by id or by term, but I have not been able to get this working. Can anyone
help me with this?

