lucene-solr-dev mailing list archives

From "Chris Harris (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-284) Parsing Rich Document Types
Date Mon, 29 Jun 2009 16:36:48 GMT

    [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725237#action_12725237 ]

Chris Harris commented on SOLR-284:
-----------------------------------

bq. Apologies for not reviewing this sooner after it was committed - but this is the last/best
chance to improve the interface before 1.4 is released (and this is very important new functionality).

My only request is that, if you're changing how field mapping works and maybe removing ext.ignore.und.fl,
you make sure it stays easy to say, "Tika, I don't care about any of your parsed metadata.
Please leave it out of my Solr index." In my current use case I already know all the metadata
I want, and including the Tika-parsed fields would result in index bloat. (My temptation would
be to make excluding Tika-parsed fields the default, though it sounds like other people have
the opposite inclination.)


> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch,
rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch,
SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip,
test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports
streaming a PDF, Word, PowerPoint, or Excel document into Solr.
> There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

