lucene-java-user mailing list archives

From Davide <davidi...@libero.it>
Subject PDF documents with "MoreLikeThis" class
Date Thu, 20 Jul 2006 09:41:03 GMT
Hi,
I'm using the MoreLikeThis class to find similar documents, but I'm not
sure whether it is correct to pass a PDF file as the argument to the
*MoreLikeThis.like()* method.

To put it more clearly:

1) I add some PDF files to my Lucene index (I use PDFBox to extract the
text and add it as fields to the index).
2) Now I want to find documents similar to a specific PDF file, and I
have that PDF's file name (C:\\Example.pdf).


*My question is: what is the correct way to call the like() method when I
need to find similar PDF files?*

I use:
-------------------------------------------------------
// reader is an already-open IndexReader on my index
MoreLikeThis mlt = new MoreLikeThis(reader);

Query query = mlt.like(new File("C:\\Example.pdf"));
-------------------------------------------------------

I'm not sure this is correct, because I think that when you pass a File
to the like() method it expects a plain-text file, not a PDF file whose
text is not directly readable...

Do I have to extract the text from the PDF file first and then pass an
InputStream containing that text? Or is my approach OK?
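One likely approach, assuming like(File) treats the file as plain text, is to extract the text with PDFBox and hand MoreLikeThis a Reader over it, something like mlt.like(new StringReader(extractedText)), where extractedText is a hypothetical String holding the PDFBox output. Since the PDF is already indexed, calling like(int docNum) with its document number may also work, provided the relevant fields have term vectors or are stored. To show roughly what MoreLikeThis then does with that text, here is a simplified, stdlib-only sketch of the idea (not the Lucene implementation): score each term by term frequency times inverse document frequency and keep the top few as query terms. The class name, the document-frequency map, and the sample data are hypothetical; the idf formula follows Lucene's DefaultSimilarity; the real class also applies thresholds such as minimum term and document frequencies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified, stdlib-only illustration of MoreLikeThis-style term selection.
// This is NOT Lucene code; class and method names are hypothetical.
class InterestingTerms {
    // Count how often each lowercase word occurs in the (already extracted) text.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // idf in the style of Lucene's DefaultSimilarity: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Up to maxTerms terms ranked by tf * idf, highest first (ties broken alphabetically).
    // Terms absent from the index (docFreq 0) are skipped: they cannot match any document.
    static List<String> topTerms(String text, Map<String, Integer> docFreqs,
                                 int numDocs, int maxTerms) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs(text).entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (df > 0) {
                score.put(e.getKey(), e.getValue() * idf(df, numDocs));
            }
        }
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a)); // higher score first
            return c != 0 ? c : a.compareTo(b);
        });
        return terms.subList(0, Math.min(maxTerms, terms.size()));
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the text extracted from C:\Example.pdf
        // and for the document frequencies an IndexReader would report.
        String extracted = "Lucene indexes text. Lucene searches text quickly.";
        Map<String, Integer> docFreqs = Map.of(
            "lucene", 3, "text", 40, "indexes", 10, "searches", 12, "quickly", 25);
        System.out.println(topTerms(extracted, docFreqs, 100, 3)); // [lucene, text, indexes]
    }
}
```

The point of the sketch is that the query is built from the words of the extracted text, which is why the text has to be readable before it reaches like().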

Thanks for any suggestions,
Davide.


