lucene-dev mailing list archives

From "Tomoko Uchida (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8778) Define analyzer SPI names as static final fields and document the names in Javadocs
Date Fri, 24 May 2019 11:49:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847486#comment-16847486
] 

Tomoko Uchida commented on LUCENE-8778:
---------------------------------------

Hi [~thetaphi],

I ran a regression test and fixed the incorrect SPI names (they had been mistakenly copy-pasted
in previous commits).
 # List the SPI names and class names of all analysis components on the master branch. ([^ListAnalysisComponents.java])
 # Make sure that all components can be looked up by their (old) SPI names on my branch (pull
request). ([^TestSPINames.java])

I also modified {{AnalysisSPILoader}} to preserve the letter casing of service names. The documented
SPI names are now camel-cased, so it is better to preserve the original names as is. Instead
of lowercasing the names at registration, we can perform a case-insensitive lookup. Because
the service map is small, I expect the performance degradation not to matter much in this
case (I'm not quite sure; there might be better ways?). ([diff|https://github.com/apache/lucene-solr/pull/654/commits/fc903379b0a53b690adf1c1ca5843b92444895ec])
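
To illustrate the idea of keeping names as registered while looking them up case-insensitively, here is a minimal sketch. It is not the code in {{AnalysisSPILoader}} or in the linked diff; the class {{CaseInsensitiveRegistry}}, its methods, and the {{com.example}} class names are hypothetical, and a {{TreeMap}} ordered by {{String.CASE_INSENSITIVE_ORDER}} is just one possible approach.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical registry for illustration only; not the actual AnalysisSPILoader. */
public class CaseInsensitiveRegistry {

    // A TreeMap ordered by CASE_INSENSITIVE_ORDER stores each key exactly as it
    // was first registered, but treats "edgeNGram" and "EDGENGRAM" as the same key.
    private final Map<String, String> services =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Registers a service under its original (e.g. camel-cased) name; first one wins. */
    public void register(String name, String className) {
        services.putIfAbsent(name, className);
    }

    /** Looks up a service, ignoring the letter casing of the requested name. */
    public String lookup(String name) {
        String className = services.get(name);
        if (className == null) {
            throw new IllegalArgumentException(
                    "No service named '" + name + "'; available: " + services.keySet());
        }
        return className;
    }

    /** Returns the names with their original casing preserved. */
    public Set<String> availableNames() {
        return Collections.unmodifiableSet(services.keySet());
    }

    public static void main(String[] args) {
        CaseInsensitiveRegistry registry = new CaseInsensitiveRegistry();
        // Names and class names below are placeholders, not taken from Lucene.
        registry.register("edgeNGram", "com.example.EdgeNGramFilterFactory");
        registry.register("lowerCase", "com.example.LowerCaseFilterFactory");

        System.out.println(registry.lookup("EDGENGRAM"));
        System.out.println(registry.availableNames());
    }
}
```

With a sorted map the lookup cost is logarithmic in the number of registered names; for a map with on the order of a hundred entries, a linear scan using {{equalsIgnoreCase}} would likely be acceptable as well.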

This branch passed ant test & precommit.

> Define analyzer SPI names as static final fields and document the names in Javadocs
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8778
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8778
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments: ListAnalysisComponents.java, SPINamesGenerator.java, Screenshot
from 2019-04-26 02-17-48.png, TestSPINames.java
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Each built-in analysis component (factory of tokenizer / char filter / token filter)
has an SPI name, but currently this is not documented anywhere.
> The goals of this issue:
>  * Define SPI names as a static final field for each analysis component so that users can
get the component by name (via the {{NAME}} static field). This also provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to the ant {{validate-source-patterns}} target so that
we can make sure that all analysis components have correct field definitions and documentation,
> and
>  * Look up SPI names from the new {{NAME}} fields instead of deriving them from class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
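
As a rough illustration of the last goal above (reading the SPI name from a {{NAME}} field rather than deriving it from the class name), the following sketch uses plain Java reflection. It is not the Lucene implementation: {{SpiNameLookup}}, {{lookupSpiName}}, and the nested stand-in factory classes are hypothetical, and the value {{"whitespace"}} is only an example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical helper for illustration only; not part of Lucene. */
public class SpiNameLookup {

    /** Stand-in for an analysis factory that declares its SPI name. */
    public static class WhitespaceTokenizerFactory {
        /** SPI name (example value). */
        public static final String NAME = "whitespace";
    }

    /** Stand-in for a factory that forgot to declare NAME. */
    public static class BrokenFactory {
    }

    /** Reads the SPI name from a public static final String field called NAME. */
    public static String lookupSpiName(Class<?> service) {
        try {
            Field field = service.getField("NAME");
            int mod = field.getModifiers();
            if (field.getType() != String.class
                    || !Modifier.isStatic(mod)
                    || !Modifier.isFinal(mod)) {
                throw new IllegalStateException(
                        "NAME must be a public static final String in " + service.getName());
            }
            // Static field, so no instance is needed.
            return (String) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "No usable NAME field in " + service.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupSpiName(WhitespaceTokenizerFactory.class));
    }
}
```

Callers that reference {{WhitespaceTokenizerFactory.NAME}} directly get a compile error if the field is removed or renamed, which is the compile-time safety the issue description mentions.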



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

