Mailing-List: contact jackrabbit-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jackrabbit-dev@incubator.apache.org
Message-ID: <1435861619.1133342071457.JavaMail.jira@ajax.apache.org>
Date: Wed, 30 Nov 2005 10:14:31 +0100 (CET)
From: "Martin Perez (JIRA)" <jira@apache.org>
To: jackrabbit-dev@incubator.apache.org
Subject: [jira] Commented: (JCR-281) textfilters module patch: Support for
 text extraction for HTML,XML and RTF files
In-Reply-To: <1759551929.1133264670309.JavaMail.jira@ajax.apache.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/JCR-281?page=comments#action_12358896 ] 

Martin Perez commented on JCR-281:
----------------------------------

No problem, Roy. I understand your point.

I'll try to replace htmlparser with another solution. Personally, I prefer to use a third-party library because it will be faster and more effective. Give me some time to evaluate Neko and other alternatives.


> textfilters module patch: Support for text extraction for HTML,XML and RTF files
> --------------------------------------------------------------------------------
>
>          Key: JCR-281
>          URL: http://issues.apache.org/jira/browse/JCR-281
>      Project: Jackrabbit
>         Type: Improvement
>   Components: query
>     Reporter: Martin Perez
>  Attachments: patch.diff
>
> This patch adds text extraction support for XML, RTF and HTML files.
> The only dependency is the htmlparser library, which handles HTML text extraction.
