jackrabbit-users mailing list archives

From Cheng Zhang <zhangyongji...@yahoo.com>
Subject Re: search results
Date Sun, 04 Jan 2009 12:42:12 GMT
Thanks, Jukka. Please find attached my version of HTMLParser.java.


----- Original Message ----
From: Jukka Zitting <jukka.zitting@gmail.com>
To: users@jackrabbit.apache.org
Sent: Sunday, January 4, 2009 1:48:10 AM
Subject: Re: search results

Hi,

On Sun, Jan 4, 2009 at 3:08 AM, Cheng Zhang <zhangyongjiang@yahoo.com> wrote:
> It turns out that org.apache.jackrabbit.extractor.HTMLParser drops all digits:
> in the filterAndJoin method, all non-letter characters are removed.
> Does anybody have any idea why we do this? IMO, indexing "hf100" makes more
> sense than indexing "hf".

I don't recall any specific reason why digits should be dropped. I'd
be happy to apply the fix if you've already fixed this and would like
to attach the patch to Jira.
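
For readers following the thread, here is a minimal sketch of the kind of change being discussed, assuming the filter walks the text character by character. It is not the Jackrabbit source: the class name TokenFilterSketch and this filterAndJoin body are hypothetical stand-ins. The point is only that testing Character.isLetterOrDigit instead of Character.isLetter keeps a token such as "hf100" intact.

```java
public class TokenFilterSketch {

    // Hypothetical stand-in for HTMLParser.filterAndJoin (assumption, not the real code).
    // Keeps letters and digits; every run of other characters becomes a single space.
    static String filterAndJoin(String text) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Emit one separator before the next token, but never a leading one.
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                out.append(c);
                pendingSpace = false;
            } else {
                pendingSpace = true;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "model hf100 rev 2"
        System.out.println(filterAndJoin("model hf100, rev-2"));
    }
}
```

With a letters-only test the same input would lose the digits, so a search for "hf100" would not match the indexed text.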

> Or is there any way I can configure Jackrabbit to use my HTMLParser instead of the default?

Look at the textFilterClasses parameter in the <SearchIndex/>
configuration of your repository.xml and workspace.xml files.
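
As a rough illustration of that setting, the fragment below shows how a <SearchIndex/> element might list a custom extractor. This is a sketch under assumptions: com.example.MyHtmlTextExtractor is a hypothetical class (it would need to be a text extractor that uses the modified parser and be on the repository's classpath), and the other values should be taken from your existing configuration rather than from here.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated list of extractor classes.
       com.example.MyHtmlTextExtractor is hypothetical (assumption). -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PlainTextExtractor,com.example.MyHtmlTextExtractor"/>
</SearchIndex>
```

Existing content is likely not re-extracted automatically after such a change, so the affected nodes may need to be re-indexed before the new behaviour shows up in search results.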

BR,

Jukka Zitting
