lucene-dev mailing list archives

From "Tomoko Uchida (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8778) Define analyzer SPI names as static final fields and document the names in Javadocs
Date Sat, 25 May 2019 15:32:00 GMT


Tomoko Uchida commented on LUCENE-8778:

I updated the pull request.
 * Service lookup is performed on the case-insensitive map keys (as before). The original
names are preserved in an auxiliary Set for reference. I also added a check to make sure that
the sizes of the lookup map and the original name set are equal.
 * Restrict the characters that can be used in SPI names: only letters, digits, and
underscores are allowed. (The last one is added for possible future uses.)
 * Document the case-insensitive lookup in each Javadoc tag (I took a screenshot). It's
a bit redundant, but at least it is not likely to be overlooked.

!Screenshot from 2019-05-25 23-25-24.png!
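
For reviewers who want a quick mental model of the first two points above, here is a minimal sketch. It is not the code in the pull request: the class {{SpiNameRegistry}}, its method names, and the exact regular expression are hypothetical, and rejecting names that differ only by case is an assumption made for this illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical registry for illustration only; not the actual Lucene service loader.
final class SpiNameRegistry<T> {
  // Assumed rule from the comment above: letters, digits, and underscores only.
  private static final Pattern VALID_NAME = Pattern.compile("^[a-zA-Z0-9_]+$");

  // Lookup map keyed by the lowercased name, which makes lookups case-insensitive.
  private final Map<String, T> byLowerCaseName = new LinkedHashMap<>();
  // Auxiliary set that keeps the names exactly as they were declared.
  private final Set<String> originalNames = new LinkedHashSet<>();

  void register(String name, T service) {
    if (!VALID_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Illegal SPI name: " + name);
    }
    String key = name.toLowerCase(Locale.ROOT);
    // Assumption: names that collide case-insensitively are rejected.
    if (byLowerCaseName.containsKey(key)) {
      throw new IllegalArgumentException("Duplicate SPI name: " + name);
    }
    byLowerCaseName.put(key, service);
    originalNames.add(name);
    // Sanity check: the lookup map and the original name set must stay the same size.
    if (byLowerCaseName.size() != originalNames.size()) {
      throw new IllegalStateException("Lookup map and original name set are out of sync");
    }
  }

  T lookup(String name) {
    T service = byLowerCaseName.get(name.toLowerCase(Locale.ROOT));
    if (service == null) {
      throw new IllegalArgumentException(
          "Unknown SPI name: " + name + ", available names: " + originalNames);
    }
    return service;
  }

  Set<String> availableNames() {
    return Collections.unmodifiableSet(originalNames);
  }
}
```

With this sketch, a service registered as "lowerCase" can be looked up as "LOWERCASE", while {{availableNames()}} still reports the name as it was declared.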

I would like to delay allowing "multiple names" or "aliases", because I don't want to implement
a feature that might never be used. If the Elasticsearch team or someone else is interested in
using the analysis service loader, I think the modification is easy and we can work together.

Can you please review the latest changes in the service loader class? Here is the diff: [bf6fc2b|]

> Define analyzer SPI names as static final fields and document the names in Javadocs
> -----------------------------------------------------------------------------------
>                 Key: LUCENE-8778
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments:,, Screenshot from 2019-04-26 02-17-48.png, Screenshot from 2019-05-25 23-25-24.png,
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
> Each built-in analysis component (factory of tokenizer / char filter / token filter) has an SPI name, but this is currently not documented anywhere.
> The goals of this issue:
>  * Define SPI names as a static final field for each analysis component so that users can get the component by name (via the {{NAME}} static field). This also provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to the ant {{validate-source-patterns}} target so that we can make sure that all analysis components have correct field definitions and documentation.
>  * Look up SPI names via the new {{NAME}} fields instead of deriving them from class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
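
To illustrate the goals quoted above, the following is a rough sketch of what a {{NAME}} field and a name lookup based on it could look like. It is not taken from the patch: {{ExampleTokenizerFactory}}, {{SpiNames}}, and {{nameOf}} are hypothetical names, and reading the field via reflection is only one possible way to do this.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an analysis factory; not a real Lucene class.
final class ExampleTokenizerFactory {
  /** SPI name */
  public static final String NAME = "example";
}

// Hypothetical helper that reads the SPI name from the NAME field
// instead of deriving it from the class name.
final class SpiNames {
  private SpiNames() {}

  static String nameOf(Class<?> factoryClass) {
    try {
      Field field = factoryClass.getField("NAME");
      int mods = field.getModifiers();
      // Only accept a "public static final String NAME" definition.
      if (Modifier.isStatic(mods)
          && Modifier.isFinal(mods)
          && field.getType() == String.class) {
        return (String) field.get(null);
      }
    } catch (NoSuchFieldException | IllegalAccessException e) {
      // Fall through to the error below.
    }
    throw new IllegalStateException(
        "No valid NAME field in class: " + factoryClass.getName());
  }
}
```

Callers that reference {{ExampleTokenizerFactory.NAME}} directly get a compile-time check, while a loader can use something like {{SpiNames.nameOf(...)}} to register the same name without parsing class names.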
