lucene-java-user mailing list archives

From Alexander Aristov <>
Subject Re: Lucene: If I have picture, table, or something else in the PDF
Date Sun, 20 Feb 2011 06:35:39 GMT
Your search engine would extract the text content from a PDF file, and all
markup, pictures, etc. would be lost. So when you search, you would get
back only text, highlighted or not.
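To get from a hit back to "the PDF page containing the keywords", one common approach (an assumption here, not something stated in this thread) is to index each PDF page as its own unit and keep its page number. With PDFBox, the text of a single page can be pulled by calling `setStartPage` and `setEndPage` on a `PDFTextStripper` before `getText`. The sketch below is a toy in plain Java: literal strings stand in for the extracted page text, and a tiny in-memory index stands in for Lucene (its occurrence-count ranking is only a crude stand-in, not Lucene's scoring). `PageIndexSketch` and its methods are hypothetical names. As noted above, only text is indexed; to show pictures or tables, the application would have to open or render the original PDF at the matching page.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: each PDF page is a separate searchable unit, so every hit maps
// back to a page number. Page strings stand in for text a PDF extractor would return;
// pictures and layout are not part of that text, so they cannot be searched or shown here.
class PageIndexSketch {

    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // term -> (page number -> occurrences of the term on that page)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> pages = new ArrayList<>();

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Adds one page's extracted text; pages are numbered from 1, like a PDF viewer.
    public int addPage(String text) {
        pages.add(text);
        int pageNo = pages.size();
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(pageNo, 1, Integer::sum);
        }
        return pageNo;
    }

    // Returns pages containing any query term, most total occurrences first;
    // ties are broken by lower page number.
    public List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hits = postings.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Integer> result = new ArrayList<>(scores.keySet());
        result.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return result;
    }

    // Returns the page text with every query term wrapped in [[...]] as a plain-text highlight.
    public String highlight(int pageNo, String query) {
        Set<String> terms = new HashSet<>(tokenize(query));
        String text = pages.get(pageNo - 1);
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("[[").append(word).append("]]");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        PageIndexSketch idx = new PageIndexSketch();
        idx.addPage("Introduction to search engines.");
        idx.addPage("Lucene ranks hits; Lucene can highlight terms.");
        idx.addPage("PDFBox extracts text for Lucene.");
        for (int page : idx.search("lucene")) {
            System.out.println("page " + page + ": " + idx.highlight(page, "lucene"));
        }
    }
}
```

Running `main` lists page 2 first (two occurrences of "lucene") and then page 3, each with the matches wrapped in `[[...]]`; page 1 is not returned. In a real application the page number is what lets you open or render the original PDF page, pictures included.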

Best Regards
Alexander Aristov

On 18 February 2011 21:29, Gong Li <> wrote:

> Hi,
> I am developing a local PDF search engine using the PDFBox and Lucene
> APIs.
> I must show the user the PDF page containing the keywords (ideally
> highlighted) and sort the results by relevance (Lucene's default). How?
> Also, if there are pictures on the PDF page, how could they be displayed to
> the user after indexing and searching the extracted text?
> Thanks
