lucene-java-user mailing list archives

From Davide
Subject PDF documents with "MoreLikeThis" class
Date Thu, 20 Jul 2006 09:41:03 GMT
I'm using the MoreLikeThis class to find similar documents, but I'm not
sure whether it is correct to pass a PDF file as the argument to the
like() method.

To be clearer:

1) I add some PDF files to my Lucene index (I use PDFBox to extract the text
and add fields to the index).
2) Now I want to search for documents similar to a specific PDF file, and I
have the PDF file name (C:\\Example.pdf).

*My question is: what is the correct way to call the like() method when I
have to find similar PDF files?*

I use:
// indexReader is an open IndexReader on my index
MoreLikeThis mlt = new MoreLikeThis(indexReader);

Query query = mlt.like(new File("C:\\Example.pdf"));

I'm not sure this is the correct way, because I think that if I pass a file to
the like() method, it expects a text file and not a PDF
file, where the text is not directly readable...

Do I have to extract the text from the PDF file and then pass an InputStream
with the text inside? Or is my way OK?
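
To make the second option concrete, here is a minimal sketch of the "pass a
stream with the text inside" step only. It assumes the text has already been
extracted from the PDF into a String (for example with PDFBox, as I do when
indexing). ExtractedTextSource, asReader and asInputStream are hypothetical
names of my own, not part of Lucene or PDFBox, and the UTF-8 encoding is also
an assumption. The call to like() itself is left out, because whether it should
receive this stream is exactly what I am asking.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Lucene or PDFBox): wraps text that has
// already been extracted from a PDF so it can be handed on as a stream.
class ExtractedTextSource {

    // Character stream over the extracted text.
    static Reader asReader(String extractedText) {
        return new StringReader(extractedText);
    }

    // Byte stream over the extracted text, encoded as UTF-8 (an assumption;
    // whoever reads the stream must decode it with the same charset).
    static InputStream asInputStream(String extractedText) {
        return new ByteArrayInputStream(extractedText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for the text extracted from the PDF file.
        String extractedText = "placeholder text extracted from the PDF";

        // Read the text back through the Reader to show it is plain text.
        try (BufferedReader reader = new BufferedReader(asReader(extractedText))) {
            System.out.println(reader.readLine());
        }
    }
}
```

With this approach the stream contains only the plain text, not the raw
bytes of the PDF file.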

Thanks for any suggestion,
