It turns out that org.apache.jackrabbit.extractor.HTMLParser eats all digits: in the method filterAndJoin, all non-letter characters are removed. Does anybody have any idea why we do this? IMO, indexing "hf100" makes more sense than indexing "hf". Alternatively, is there any way I can configure the repository to use my own HTMLParser instead of the default one?

Best,
Kevin

----- Original Message ----
From: Cheng Zhang
To: users@jackrabbit.apache.org
Sent: Saturday, January 3, 2009 3:02:51 PM
Subject: search results

Hi,

I have an HTML file, shown below, stored in the repository.

Manufacture: CANON
Model: HF100
Title: Canon VIXIA hf100 Flash Memory High Definition Camcorder with 12x Optical Image Stabilized Zoom

However, if I search for 'hf100', it returns nothing. Any suggestions?

Thanks a lot,
Kevin
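To illustrate why the search for 'hf100' comes back empty, here is a minimal, hypothetical sketch (not Jackrabbit's actual filterAndJoin source, which is not shown here). It assumes the filter tests each character with a letter-only check, so "hf100" is indexed as "hf" and "12x" as "x". The class and method names (TokenFilterDemo, filterWords) are made up for illustration. Switching from Character.isLetter to Character.isLetterOrDigit is one possible change that keeps model numbers intact; plugging a patched parser into the repository is a separate configuration question that this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only: shows how a letters-only token filter
 * drops the digits from "hf100", compared with a letters-and-digits filter.
 */
public final class TokenFilterDemo {

    /** Keeps only letter characters, mimicking the behaviour reported for filterAndJoin. */
    static String lettersOnly(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Keeps letters and digits, so model numbers such as "hf100" survive. */
    static String lettersAndDigits(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Filters each whitespace-separated word, drops words that become empty,
     * and joins the rest with single spaces.
     */
    static String filterWords(String text, boolean keepDigits) {
        List<String> kept = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String filtered = keepDigits ? lettersAndDigits(word) : lettersOnly(word);
            if (!filtered.isEmpty()) {
                kept.add(filtered);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String title = "Canon VIXIA hf100 Flash Memory";
        // Letters only: the "100" is lost, so a query for "hf100" cannot match.
        System.out.println(filterWords(title, false)); // Canon VIXIA hf Flash Memory
        // Letters and digits: "hf100" is kept as a single term.
        System.out.println(filterWords(title, true));  // Canon VIXIA hf100 Flash Memory
    }
}
```

Applied to the "Model: HF100" line from the message, the letters-only variant yields "Model HF", while the letters-and-digits variant yields "Model HF100".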