lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (SOLR-284) Parsing Rich Document Types
Date Mon, 24 Nov 2008 22:11:44 GMT


Hoss Man commented on SOLR-284:

bq. If Tika returns a metadata field and you haven't made an explicit mapping from the Tika
fieldname to your Solr fieldname, then Solr will throw an exception and your document add
will fail. This doesn't sound very robust for a production environment, unless Tika will
only ever use a finite list of metadata field names.

I'm not familiar with the state of the patch, but I'm assuming that (by default) all of the
metadata fields produced by Tika share a common naming convention, either a common prefix
or a common suffix. In that case, people can always add a dynamicField declaration
that ignores all metadata fields not already explicitly declared.
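
As a rough sketch of what that could look like in schema.xml: this assumes, purely for
illustration, that the Tika metadata fields all share a hypothetical "tika_" prefix (the real
convention depends on the patch) and that the stock "string" field type from the example
schema is available.

```xml
<!-- Fields of this type are neither indexed nor stored, so any value
     sent to them is dropped. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" />

<!-- An explicitly declared metadata field that should be kept.
     "tika_title" is a hypothetical name used only for this example. -->
<field name="tika_title" type="string" indexed="true" stored="true" />

<!-- Catch-all for every other field carrying the assumed "tika_" prefix. -->
<dynamicField name="tika_*" type="ignored" multiValued="true" />
```

Explicit field declarations take precedence over dynamicField patterns, so the declared
fields would still be indexed, while unexpected metadata fields would be discarded instead
of causing the document add to fail.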

> Parsing Rich Document Types
> ---------------------------
>                 Key: SOLR-284
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>         Attachments:, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch,
rich.patch, rich.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf,,,,, un-hardcode-id.diff
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports
streaming a PDF, Word, PowerPoint, or Excel document into Solr.
> There is a wiki page with information here:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
