jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-2219) Improved background text extraction
Date Wed, 05 Aug 2009 11:30:15 GMT

    [ https://issues.apache.org/jira/browse/JCR-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739435#action_12739435 ]

Marcel Reutegger commented on JCR-2219:

Fixed occasional test failures in revision: 801135

> Improved background text extraction
> -----------------------------------
>                 Key: JCR-2219
>                 URL: https://issues.apache.org/jira/browse/JCR-2219
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing, jackrabbit-core
>            Reporter: Jukka Zitting
>            Priority: Minor
>             Fix For: 2.0.0
>         Attachments: JCR-2219.patch
> As recently discussed on the mailing list (see http://markmail.org/message/syt7lc2guzapt7la),
> the current approach to text extraction in background threads doesn't work that well, especially
> with the Tika-based extractors that support streamed parsing of many document types.
> Also, we currently buffer *all* of the extracted text streams into Strings before passing
> them to the Lucene index. It would be good if we could somehow get back to passing just
> Readers to Lucene.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
