lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-1692) Contrib analyzers need tests
Date Thu, 18 Jun 2009 20:26:07 GMT


Robert Muir commented on LUCENE-1692:

Michael, I think it would be nice to fix the Thai offset bug so the highlighter will work. This
is a safe one-line fix, and it's an obvious error.

The SmartChineseAnalyzer empty token bug is pretty serious. I think indexing empty tokens
for every piece of punctuation could really hurt similarity computation (am I wrong, never

The Thai .type() bug is something that could be fixed later. I don't think the token type
being ALPHANUM versus NUM is really hurting anyone.

As for the issue where DutchAnalyzer doesn't do what it claims, I think that's not really hurting
anyone, and users can switch to the snowball version if they want accurate snowball behavior.
I do think the huge unused files in DutchAnalyzer can be removed if you want
to save 1MB, but I'm not sure how important that is.

Let me know your thoughts. 

> Contrib analyzers need tests
> ----------------------------
>                 Key: LUCENE-1692
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 2.9
>         Attachments: LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt, LUCENE-1692.txt
> The analyzers in contrib need tests, preferably ones that test the behavior of all the Token 'attributes' involved (offsets, type, etc.) and not just what they do with token text.
> This way, they can be converted to the new api without breakage.
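
As a rough illustration of the kind of test the issue description asks for, here is a minimal sketch in plain Java. It does not use Lucene's API: SketchToken, TokenAttributeCheck, tokenize, typeOf and verify are all hypothetical names made up for this example, and the whitespace tokenizer is only a toy stand-in for a real analyzer. The idea is simply to check offsets, type, and empty tokens alongside the token text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one token of analyzer output (NOT Lucene's Token class).
final class SketchToken {
    final String text;
    final int start;   // inclusive offset into the original input
    final int end;     // exclusive offset into the original input
    final String type; // coarse type label, e.g. "ALPHANUM" or "NUM"

    SketchToken(String text, int start, int end, String type) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

final class TokenAttributeCheck {
    // Toy whitespace tokenizer that records offsets and a coarse type per token.
    static List<SketchToken> tokenize(String input) {
        List<SketchToken> out = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(input.charAt(i))) i++;
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < n && !Character.isWhitespace(input.charAt(i))) i++;
            if (i > start) {
                String text = input.substring(start, i);
                out.add(new SketchToken(text, start, i, typeOf(text)));
            }
        }
        return out;
    }

    // "NUM" if every character is a digit, otherwise "ALPHANUM".
    static String typeOf(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isDigit(text.charAt(i))) return "ALPHANUM";
        }
        return "NUM";
    }

    // Returns a list of problems found; an empty list means all checks passed.
    static List<String> verify(String input, List<SketchToken> tokens) {
        List<String> problems = new ArrayList<>();
        for (SketchToken t : tokens) {
            if (t.text.isEmpty()) {
                problems.add("empty token at offset " + t.start);
            }
            if (t.start < 0 || t.end > input.length() || t.start > t.end) {
                problems.add("offsets out of range for '" + t.text + "'");
            } else if (!input.substring(t.start, t.end).equals(t.text)) {
                problems.add("offsets do not match text for '" + t.text + "'");
            }
        }
        return problems;
    }
}
```

A test for a real analyzer would presumably feed its output through checks of this kind, so that an offset that no longer maps back to the source text, or an empty token emitted for punctuation, fails the test instead of going unnoticed.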

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.