lucene-general mailing list archives

From Pierluca Sangiorgi <pierluca.sangio...@gmail.com>
Subject is solr the right choice for my pdf indexing purpose?
Date Mon, 11 Jun 2012 18:42:10 GMT
Hi, I'm new to Solr and I want to know whether it's the right choice
for my problem.
I need to index PDF documents stored on the filesystem and run queries over them.
So I used Solr with SolrJ and the ExtractingRequestHandler, and it all works,
but I'm not interested in indexing the PDF metadata, only the text
content of the document.
I saw that the content is indexed entirely in a single field
("attr_content" in my case), but what I want is to index the individual
pieces of information inside that content as separate fields.

For example: I have a PDF document that contains an invoice. I need to
extract and index the information about the recipient, price, sold
items, item descriptions, and so on.
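
To make the goal concrete, here is a minimal sketch of the kind of pre-processing step in question. It assumes the plain text of the invoice has already been extracted from the PDF, and it assumes a made-up invoice layout with "Recipient:" and "Total:" labels and one sold item per line. The field names (recipient, total_price, item_name, item_description) and the function invoice_to_fields are hypothetical and are not part of Solr or SolrJ.

```python
import re

# Hypothetical invoice layout: the "Recipient:" and "Total:" labels are
# assumptions for illustration, not something Solr provides.
FIELD_PATTERNS = {
    "recipient": re.compile(r"^Recipient:\s*(.+)$", re.MULTILINE),
    "total_price": re.compile(r"^Total:\s*([0-9]+(?:\.[0-9]{2})?)", re.MULTILINE),
}

# Assumed format of one sold item per line: "<qty> x <name> - <description>"
ITEM_PATTERN = re.compile(r"^(\d+)\s*x\s*(.+?)\s+-\s+(.+)$", re.MULTILINE)


def invoice_to_fields(text):
    """Turn the plain text of one invoice into a dict of named fields."""
    doc = {}
    # Single-valued fields: keep the first match of each labelled line.
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            doc[name] = match.group(1).strip()
    # Multi-valued fields: one entry per sold item line.
    items = ITEM_PATTERN.findall(text)
    doc["item_name"] = [name.strip() for _qty, name, _desc in items]
    doc["item_description"] = [desc.strip() for _qty, _name, desc in items]
    return doc


sample = """Recipient: Mario Rossi
2 x Widget - blue widget, large
1 x Gadget - small gadget
Total: 42.50
"""
fields = invoice_to_fields(sample)
```

The resulting dict could then be sent to Solr as an ordinary document with one field per key, instead of a single "attr_content" field. Real invoices rarely follow one fixed layout, so the patterns above would need to be adapted to each invoice format.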

Is Solr the right choice for this purpose, or do I need to use another
framework in addition, before posting the documents to Solr?

Thanks in advance.
