Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@jackrabbit.apache.org
Message-ID: <1546745487.1256721962059.JavaMail.jira@brutus>
Date: Wed, 28 Oct 2009 09:26:02 +0000 (UTC)
From: "Marcel Reutegger (JIRA)" <jira@apache.org>
To: dev@jackrabbit.apache.org
Subject: [jira] Resolved: (JCR-2365) HTML Text Extractor does not extract or
 index numerics
In-Reply-To: <1547162742.1256647619709.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/JCR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-2365.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.6.1

This issue does not occur in trunk because we are not using the text-extractors module anymore. Text extraction is now handled by Apache Tika.

Fixed in the 1.6 branch in revision 830478.

> HTML Text Extractor does not extract or index numerics
> ------------------------------------------------------
>
>                 Key: JCR-2365
>                 URL: https://issues.apache.org/jira/browse/JCR-2365
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: indexing, jackrabbit-text-extractors
>    Affects Versions: 1.6.0
>         Environment: Win XP-Pro; Win 2003 Enterprise 32bit
>            Reporter: Jeremy Anderson
>             Fix For: 1.6.1, 2.0.0
>
>
> Numerics such as addresses, dates, and financial figures are not extracted or indexed by the current HTML Extractor. These values are handled properly and are searchable when extracted via the PlainTextExtractor.
