lucene-dev mailing list archives

From Jan Høydahl (JIRA) <>
Subject [jira] [Commented] (SOLR-1605) ExtractingRequestHandler does not embed original document
Date Sat, 23 Mar 2013 00:45:16 GMT


Jan Høydahl commented on SOLR-1605:

I'm not sure this is a great idea. You could add an option to store the source in a BinaryField
or something similar, but what good does it do to have a 500 MB media file in your index? Or do
you want to store the parsed and structured XHTML output from Tika in a stored field? I'm afraid
that output is not meant for pretty display.
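
To make the size concern concrete, here is a minimal sketch of a client-side approach, assuming the original file is sent as a Base64-encoded string (the form in which Solr binary fields accept values). The field name `source_bin` and both helper functions are hypothetical, not part of Solr or of the ExtractingRequestHandler, and the schema is assumed to declare a stored, non-indexed binary field with that name.

```python
import base64
import json


def build_doc_with_source(doc_id: str, raw: bytes, field: str = "source_bin") -> dict:
    """Return a document dict carrying the original bytes as Base64 text."""
    return {
        "id": doc_id,
        # Hypothetical field name; assumes a stored, non-indexed binary field.
        field: base64.b64encode(raw).decode("ascii"),
    }


def base64_size(n_bytes: int) -> int:
    """Number of characters in the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    raw = b"<html><body>Hello</body></html>"
    doc = build_doc_with_source("doc-1", raw)
    # JSON payload in the shape a client might post to an update endpoint.
    print(json.dumps([doc]))
    # Encoded size of a 500 MB file: roughly a third larger than the original.
    print(base64_size(500 * 1024 * 1024))
```

Base64 adds about a third to the payload before any index overhead, so a large media file costs more in transit and in the update request than its raw size suggests.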
> ExtractingRequestHandler does not embed original document
> ---------------------------------------------------------
>                 Key: SOLR-1605
>                 URL:
>             Project: Solr
>          Issue Type: Wish
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4
>            Reporter: Lance Norskog
> The ExtractingRequestHandler does not have the option to embed the original document
> file as a saved field.
> This would be generally useful for content management system purposes, since the search
> index can also directly serve the content, making for a much simpler system architecture.
> My use case is to highlight indexed HTML. Since the raw HTML text is not indexed, it
> is not possible to request it highlighted.
