lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
Date Wed, 28 Nov 2012 22:31:58 GMT


Robert Muir commented on SOLR-4123:

Actually a Lucene issue (the factory is in analysis/icu), but it doesn't matter which JIRA it's on.

I can try to help with the implementation here; my current problem is how it should look to
the user.
Factories take Map<String,String>.

The best idea I have is:

So that's just one key=value; the value is a list of files and it must follow convention (
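
The example from the original comment did not survive in this archive, so what follows is only a rough sketch of how a factory might read a single key=value whose value is a list of files out of its Map<String,String>. The key name "rulefiles", the "script:file" convention, and the class name RuleFileArgs are all assumptions made up for illustration; they are not taken from the comment and are not presented as the Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RuleFileArgs {
    // Hypothetical key name, assumed for illustration only.
    public static final String KEY = "rulefiles";

    /**
     * Parses a value such as "Latn:latin.rbbi, Cyrl:cyrillic.rbbi"
     * (an assumed convention) into a map of script code -> rule file name.
     */
    public static Map<String, String> parse(Map<String, String> args) {
        Map<String, String> byScript = new LinkedHashMap<>();
        String value = args.get(KEY);
        if (value == null || value.trim().isEmpty()) {
            return byScript; // no per-script customization configured
        }
        for (String entry : value.split(",")) {
            // Split on the first ':' only, so file names may contain ':'.
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "expected script:file, got '" + entry + "'");
            }
            byScript.put(parts[0].trim(), parts[1].trim());
        }
        return byScript;
    }

    public static void main(String[] args) {
        Map<String, String> factoryArgs = new LinkedHashMap<>();
        factoryArgs.put(KEY, "Latn:latin.rbbi, Cyrl:cyrillic.rbbi");
        System.out.println(parse(factoryArgs));
    }
}
```

Keeping everything in one string value fits the Map<String,String> constraint without inventing a new key per script; the cost is that the convention has to be validated by hand, as the sketch does.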

> ICUTokenizerFactory - per-script RBBI customization
> ---------------------------------------------------
>                 Key: SOLR-4123
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Shawn Heisey
>             Fix For: 4.1, 5.0
> Initially this started out as an idea for a configuration knob on ICUTokenizer that would
> allow me to tell it not to tokenize on punctuation.  Through IRC discussion on #lucene, it
> sorta ballooned.  The committers had a long discussion about it that I don't really understand,
> so I'll be including it in the comments.
> I am a Solr user, so I would also need the ability to access the configuration from there,
> likely either in schema.xml or solrconfig.xml.
