From: "Martin Perez (JIRA)"
To: jackrabbit-dev@incubator.apache.org
Date: Wed, 30 Nov 2005 10:14:31 +0100 (CET)
Subject: [jira] Commented: (JCR-281) textfilters module patch: Support for text extraction for HTML,XML and RTF files
Message-ID: <1435861619.1133342071457.JavaMail.jira@ajax.apache.org>
In-Reply-To: <1759551929.1133264670309.JavaMail.jira@ajax.apache.org>

    [ http://issues.apache.org/jira/browse/JCR-281?page=comments#action_12358896 ]

Martin Perez commented on JCR-281:
----------------------------------

No problem, I understand your point, Roy. I'll try to replace htmlparser with another solution. Personally, I prefer to use a third-party library because it will be faster and more effective.
Give me some time to evaluate Neko and other alternatives.

> textfilters module patch: Support for text extraction for HTML,XML and RTF files
> --------------------------------------------------------------------------------
>
>          Key: JCR-281
>          URL: http://issues.apache.org/jira/browse/JCR-281
>      Project: Jackrabbit
>         Type: Improvement
>   Components: query
>     Reporter: Martin Perez
>  Attachments: patch.diff
>
> This patch adds text extraction support for XML, RTF and HTML files.
> The only dependency is the htmlparser library, which handles HTML text extraction.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira