lucene-solr-user mailing list archives

From Adam Estrada <estrada.a...@gmail.com>
Subject Re: Indexing pdf files - question.
Date Mon, 13 Dec 2010 16:52:02 GMT
Hi,

I use the following commands to post Word and PDF files.

$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\document.docx&stream.contentType=application/msword&literal.id=esc.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\features.pdf&stream.contentType=application/pdf&literal.id=esc2.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\Memo_ocrd.pdf&stream.contentType=application/pdf&literal.id=Memo_ocrd.pdf&defaultField=text&commit=true"

The PDFs have to be OCR'd.
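
For the Linux case in the question, where the file name should become the unique id, something like the sketch below might work. It is only a sketch under assumptions: the base URL http://localhost:8983/solr is taken from the commands above and may differ on a LucidWorks install, the helper names extract_url and post_file and the form field name myfile are made up for illustration, and the file name is assumed to need no URL-encoding. Uploading with curl's -F option sends the file itself as the content stream, while literal.id supplies the id field.

```shell
#!/bin/sh
# Assumption: Solr answers at this base URL; override SOLR_URL if yours differs.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: print the extract URL for a file, using the file's
# base name as literal.id. Assumes the name contains no spaces or other
# characters that would need URL-encoding.
extract_url() {
    printf '%s/update/extract?literal.id=%s&commit=true\n' \
        "$SOLR_URL" "$(basename "$1")"
}

# Hypothetical helper: upload the file as a multipart content stream.
# "myfile" is an arbitrary form field name chosen for this example.
post_file() {
    curl "$(extract_url "$1")" -F "myfile=@$1"
}

# Usage (not run automatically):
#   post_file docs/sample.pdf
```

Unlike stream.file, which relies on remote streaming being enabled in solrconfig.xml, the -F upload sends the file contents in the request body.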

Adam

On Mon, Dec 13, 2010 at 11:01 AM, Siebor, Wlodek [USA] <siebor_wlodek@bah.com> wrote:

> Hi,
> Can somebody please send me a command for indexing a sample PDF file
> available in the /docs directory with ExtractingRequestHandler? I have
> LucidWorks Solr installed on Linux, with the standard schema.xml and
> solrconfig.xml files (unchanged). I want to pass the name of the file
> as the unique id.
> I’m trying various curl commands and so far I get either “… missing
> required field: id” or “.. missing content stream” errors.
> Thanks for your help,
> Wlodek
>
