lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-284) Parsing Rich Document Types
Date Wed, 01 Jul 2009 15:49:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726116#action_12726116
] 

Yonik Seeley commented on SOLR-284:
-----------------------------------

bq. It is burdensome to have to add the ignores for all the metadata.

It would be easy to change the default from index to ignore:
uprefix=ignored_    // ignored_ will be defined in the schema as indexed=false, stored=false
uprefix=attr_

Actually, that brings up another random question... when we get the metadata back from Tika,
is it typed (can we tell that number of pages is an integer?)


> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch,
rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch,
SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip,
test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports
streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr.
> There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message