lucene-solr-user mailing list archives

From "Ma, Xiaohui (NIH/NLM/LHC) [C]" <xiao...@mail.nlm.nih.gov>
Subject RE: PDF file
Date Wed, 11 Aug 2010 14:35:47 GMT
Thanks so much for your help! I got a "Remote Streaming is disabled" error. Could you please
tell me if I am missing something?

Thanks, 

-----Original Message-----
From: Jayendra Patil [mailto:jayendra.patil.001@gmail.com] 
Sent: Tuesday, August 10, 2010 8:51 PM
To: solr-user@lucene.apache.org
Subject: Re: PDF file

Try ...

curl "http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?stream.file=<Full_Path_of_File>/pub2009001.pdf&literal.id=777045&commit=true"

stream.file - specify the full path of the file as seen by the Solr server
literal.<field name> - specify a literal value for any extra field if needed (e.g. literal.id)
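
The same request can be repeated for every PDF in a directory, which was part of the original question. The following is only a minimal sketch under stated assumptions: SOLR_URL, build_extract_url and index_dir are hypothetical names, the document id is simply taken from the file name, and paths are assumed to contain no characters that need URL encoding. Note that stream.file is only accepted when remote streaming is enabled in solrconfig.xml (the enableRemoteStreaming attribute of the requestParsers element).

```shell
# Assumed endpoint; adjust host, port and core path to your installation.
SOLR_URL="http://localhost:8983/solr/update/extract"

# Hypothetical helper: print the extract URL for one file.
# $1 = full path of the file on the Solr server, $2 = value for literal.id
build_extract_url() {
  printf '%s?stream.file=%s&literal.id=%s&commit=true' "$SOLR_URL" "$1" "$2"
}

# Hypothetical helper: send one request per *.pdf found in directory $1.
# The id is derived from the file name (an assumption, not a Solr rule).
index_dir() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue          # skip when the glob matches nothing
    id=$(basename "$f" .pdf)
    curl "$(build_extract_url "$f" "$id")"
  done
}
```

Usage would look like `index_dir /full/path/to/pdfs`. Committing on every request is simple but slow for many files; dropping commit=true and committing once at the end may be preferable.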

Regards,
Jayendra

On Tue, Aug 10, 2010 at 4:49 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <xiaohui@mail.nlm.nih.gov> wrote:

> Thanks so much for your help! I tried to index a pdf file and got the
> following. The command I used is
>
> curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F "file=@pub2009001.pdf"
>
> Did I do something wrong? Do I need to modify anything in schema.xml or
> another configuration file?
>
> ********************************************
> [xiaohui@lhcinternal lhc]$ curl 'http://lhcinternal.nlm.nih.gov:8989/solr/lhc/update/extract?map.content=text&map.stream_name=id&commit=true' -F "file=@pub2009001.pdf"
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 404 </title>
> </head>
> <body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
> <p>RequestURI=/solr/lhc/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> </body>
> </html>
> *******************************************
>
> -----Original Message-----
> From: Sharp, Jonathan [mailto:JSharp@coh.org]
> Sent: Tuesday, August 10, 2010 4:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: PDF file
>
> Xiaohui,
>
> You need to add the following jars to the lib subdirectory of the solr
> config directory on your server.
>
> (path inside the solr 1.4.1 download)
>
> /dist/apache-solr-cell-1.4.1.jar
> plus all the jars in
> /contrib/extraction/lib
>
> HTH
>
> -Jon
> ________________________________________
> From: Ma, Xiaohui (NIH/NLM/LHC) [C] [xiaohui@mail.nlm.nih.gov]
> Sent: Tuesday, August 10, 2010 11:57 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: PDF file
>
> Does anyone have any experience with PDF files? I would really appreciate
> your help!
> Thanks so much in advance.
>
> -----Original Message-----
> From: Ma, Xiaohui (NIH/NLM/LHC) [C]
> Sent: Tuesday, August 10, 2010 10:37 AM
> To: 'solr-user@lucene.apache.org'
> Subject: PDF file
>
> I have a lot of pdf files. I am trying to import pdf files to solr and
> index them. I added ExtractingRequestHandler to solrconfig.xml.
>
> Please tell me if I need to download any jar files.
>
> The Solr 1.4 Enterprise Search Server book uses the following command to
> import a file named mccm.pdf.
>
> curl 'http://localhost:8983/solr/solr-home/update/extract?map.content=text&map.stream_name=id&commit=true' -F "file=@mccm.pdf"
>
> Please tell me if there is a way to import pdf files from a directory.
>
> Thanks so much for your help!
