lucene-solr-dev mailing list archives

From "Koji Sekiguchi (JIRA)" <>
Subject [jira] Commented: (SOLR-415) LoggingFilter for debug
Date Mon, 10 Dec 2007 02:08:43 GMT


Koji Sekiguchi commented on SOLR-415:

This is for debugging. Here is one of my use cases as an example.

We use a morphological tokenizer to tokenize Japanese text. For the tokenizer to analyze the
text, we have to apply "character-level normalization" before tokenization.

I'll try to explain it using English words.

Suppose the text to be analyzed includes "colour" and your morphological tokenizer uses an
American English dictionary. You have to normalize "colour" to "color" so that the tokenizer
can look it up in the dictionary.

To implement this, I've developed MappingReader, which reads mapping.txt and normalizes
(Japanese) characters before they reach the tokenizer:

MappingReader -> Japanese Tokenizer -> Filters...

In this case, if MappingReader normalizes "ou" to "o", it causes trouble in the highlighter,
because the token offsets then refer to the normalized text rather than the original text.
(I used LoggingFilter to find this problem.)

To solve this problem, MappingReader has a correctPosition(int pos) method that tells the
tokenizer the original position.
(If this would be useful for European languages, for umlauts and the like, I'd be glad to open
another JIRA issue.)
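
To make the offset problem concrete, here is a minimal Java sketch. It is not the MappingReader
from the patch: the NormalizedText class, its normalize method, and the in-memory mapping table
(instead of mapping.txt) are hypothetical stand-ins. Only the name correctPosition(int pos) is
taken from the description above, and its exact behavior here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of character normalization that remembers original offsets. */
class NormalizedText {
    /** The normalized text that a tokenizer would see. */
    final String text;
    /** originalOffsets[i] is the input offset of normalized char i; the last entry is the input length. */
    private final int[] originalOffsets;

    private NormalizedText(String text, int[] originalOffsets) {
        this.text = text;
        this.originalOffsets = originalOffsets;
    }

    /** Maps an offset in the normalized text back to an offset in the original text. */
    int correctPosition(int pos) {
        return originalOffsets[pos];
    }

    /** Replaces the longest matching (non-empty) mapping key at each position. */
    static NormalizedText normalize(String input, Map<String, String> mappings) {
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String match = null;
            for (String key : mappings.keySet()) {
                boolean longer = match == null || key.length() > match.length();
                if (!key.isEmpty() && longer && input.startsWith(key, i)) {
                    match = key;
                }
            }
            if (match == null) {
                // No mapping applies: copy the character and record where it came from.
                out.append(input.charAt(i));
                offsets.add(i);
                i++;
            } else {
                // A mapping applies: every output char points back into the matched span.
                String replacement = mappings.get(match);
                for (int k = 0; k < replacement.length(); k++) {
                    out.append(replacement.charAt(k));
                    offsets.add(i + Math.min(k, match.length() - 1));
                }
                i += match.length();
            }
        }
        offsets.add(input.length()); // lets an exclusive end offset be corrected too
        int[] table = new int[offsets.size()];
        for (int k = 0; k < table.length; k++) {
            table[k] = offsets.get(k);
        }
        return new NormalizedText(out.toString(), table);
    }
}
```

With the input "a colour b" and the single mapping "ou" -> "o", the normalized text is
"a color b" and the token "color" covers offsets 2 to 7. Applying those offsets directly to the
original text selects "colou", which is the kind of trouble the highlighter runs into. Passing
both offsets through correctPosition gives 2 to 8, which selects "colour".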

Also, in SOLR-319, I used LoggingFilter to inspect the SynonymFilter output.
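
The idea of a logging stage can be sketched in plain Java without any Lucene or Solr classes.
The LoggingStage class below is hypothetical and is not the LoggingFilter from the patch. It
assumes tokens are plain strings coming from an Iterator, forwards each one unchanged, and
reports it to a caller-supplied sink.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical pass-through stage: forwards tokens unchanged and logs each one. */
class LoggingStage implements Iterator<String> {
    private final Iterator<String> input;
    private final String label;
    private final Consumer<String> sink;

    LoggingStage(Iterator<String> input, String label, Consumer<String> sink) {
        this.input = input;
        this.label = label;
        this.sink = sink;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public String next() {
        String token = input.next();
        // Report the token, then hand it on untouched so later stages see the same stream.
        sink.accept(label + ": " + token);
        return token;
    }
}
```

Because the stage does not modify the stream, it can be placed after any step in the chain
(for example, right after a synonym step) to see what that step produced.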

I'll try to incorporate your suggestion into my patch soon.

Thank you.

> LoggingFilter for debug
> -----------------------
>                 Key: SOLR-415
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Koji Sekiguchi
>            Priority: Trivial
>         Attachments: SOLR-415.patch, SOLR-415.patch, SOLR-415.patch
> logging version of analysis.jsp

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
