lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Indexing PDF-Files using Solr Cell
Date Mon, 17 Sep 2012 14:22:40 GMT
Add &fmap.content=your-stored-field to the URL.

Or if your schema doesn't already have a "content" field, add one that is 
"stored" and it will automatically be used.

-- Jack Krupansky
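
As a rough sketch of the first suggestion, the request could be assembled
as shown below. The stored field name "content", the document id, and the
file name are assumptions taken from the examples in this thread;
substitute whatever stored field your schema actually defines. The second
URL shows how that stored field might then be requested back through the
"fl" parameter.

```shell
#!/bin/sh
# Sketch only: the values below are assumptions based on the thread's examples.
SOLR_URL="http://localhost:8983/solr"  # base URL used in the thread
DOC_ID="doc11"                         # unique id for the document
STORED_FIELD="content"                 # hypothetical stored field in your schema
PDF_FILE="markus.pdf"                  # file to index

# Map the extracted body text ("content") onto the stored field.
EXTRACT_URL="${SOLR_URL}/update/extract?literal.id=${DOC_ID}&fmap.content=${STORED_FIELD}&commit=true"

# Ask for the stored field back via the fl (field list) parameter.
QUERY_URL="${SOLR_URL}/select?q=id:${DOC_ID}&fl=id,${STORED_FIELD}"

echo "$EXTRACT_URL"
echo "$QUERY_URL"

# Only talk to Solr when the PDF is actually present.
if [ -f "$PDF_FILE" ]; then
  curl "$EXTRACT_URL" -F "myfile=@${PDF_FILE}"
  curl "$QUERY_URL"
fi
```

If the field still comes back empty, it is worth checking that it is
declared as "stored" in the schema, as described above.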

-----Original Message----- 
From: Alexander Troost
Sent: Monday, September 17, 2012 1:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell

Thank you for your response.

I'm writing my bachelor's thesis about Solr and my company doesn't want me to
use a beta version.

I don't want to be annoying, but how do I direct the content to a stored
field? In the URL I use for the HTTP POST? In a config file?





2012/9/17 Jack Krupansky <jack@basetechnology.com>

> Be sure to direct the "content" to a "stored" field (such as "content")
> which you can add to your "fl" field list to return. Then use a copyField
> to copy that stored field to the "text" field for searching.
>
> Again, this is all simplified in Solr 4.0-BETA.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Alexander Troost
> Sent: Sunday, September 16, 2012 11:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing PDF-Files using Solr Cell
>
>
> Hi, first of all: Thank you for that quick response!
>
> But I am not sure if I am doing this right.
>
> From my point of view, the command now has to look like:
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc11&literal.filename=markus&fmap.content=text&commit=true" \
>   -F "myfile=@markus.pdf"
>
> When I now search for text in the PDF, I get this result:
>
> <result name="response" numFound="1" start="0">
> <doc>
> <str name="author">A28240</str>
> <arr name="content_type"><str>application/pdf</str></arr>
> <str name="id">doc11</str>
> <date name="last_modified">2012-09-17T03:49:39Z</date>
> </doc>
> </result>
>
> Sorry for being such a newbie and sorry for my bad English. It's 6 AM here
> and I spent the whole night at the computer :-)
>
> Greetz
>
> A
>
>
> 2012/9/17 Jack Krupansky <jack@basetechnology.com>
>
>> The content will be sent to the "content" field, which you can redirect
>> using the &fmap.content=some-field request parameter. You need to
>> explicitly set the file name field yourself, using the
>> &literal.your-file-name-field=file-name request parameter.
>>
>>
>> Also, if using Solr 4.0-BETA, you can simply use the SimplePostTool
>> (post.jar) to send documents to SolrCell, which will automatically take
>> care of these extra steps.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Alexander Troost
>> Sent: Sunday, September 16, 2012 10:16 PM
>> To: solr-user@lucene.apache.org
>> Subject: Indexing PDF-Files using Solr Cell
>>
>>
>> Hello *,
>>
>> I've got a problem indexing and searching PDF-Files.
>>
>> It seems like Solr doesn't index the name of the file.
>>
>> In the response I only get
>> <result name="response" numFound="1" start="0">
>> <doc>
>> <str name="author">A28240</str>
>> <arr name="content_type"><str>application/pdf</str></arr>
>> <str name="id">doc5</str>
>> <date name="last_modified">2012-09-17T01:45:39Z</date>
>> </doc>
>> </result>
>>
>>
>> It finds the right document, but no content or title is displayed in the
>> XML response. Where do I configure that?
>>
>> I index my documents (right now) via curl
>>
>> e.g.:
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc7&commit=true" \
>>   -F "myfile=@xyz.pdf"
>>
>>
>> Where is my mistake?
>>
>> Greeting
>>
>> Alex
>>
>>
> 

