lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8778) Define analyzer SPI names as static final fields and document the names in Javadocs
Date Sat, 25 May 2019 11:37:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848161#comment-16848161 ]

Uwe Schindler commented on LUCENE-8778:
---------------------------------------

Instead of slowing down the case-insensitive lookup, I'd just keep a Set with the original
names (for reference), but do the lookup on the lowercased map. You just have to be sure that
you don't generate duplicates.

I would like the documented names to preserve their original case: deduplicate those
and lowercase them for the lookup map. Also check that the size of the map and the size of
the set are identical. In addition, document that the lookup is case-insensitive (which it
always was).
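
A minimal sketch of the scheme described above, assuming a hypothetical SpiNameRegistry class
(this is not Lucene's actual loader code, and the name used in main is only illustrative).
It keeps the original-case names in a Set for documentation, does lookups on a lowercased
Map, rejects names that collide after lowercasing, and checks that both sizes agree:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical registry for illustration; not part of Lucene. */
final class SpiNameRegistry<T> {
  // Names exactly as documented, kept only for reference and listing.
  private final Set<String> originalNames = new LinkedHashSet<>();
  // Lookup table keyed by the lowercased name.
  private final Map<String, T> byLowercaseName = new LinkedHashMap<>();

  void register(String name, T component) {
    String key = name.toLowerCase(Locale.ROOT);
    if (byLowercaseName.containsKey(key)) {
      throw new IllegalArgumentException(
          "Duplicate SPI name (case-insensitive): " + name);
    }
    originalNames.add(name);
    byLowercaseName.put(key, component);
    // Sanity check: one original name per lookup key.
    if (originalNames.size() != byLowercaseName.size()) {
      throw new IllegalStateException("Name set and lookup map are out of sync");
    }
  }

  /** Case-insensitive lookup; returns null if the name is unknown. */
  T lookup(String name) {
    return byLowercaseName.get(name.toLowerCase(Locale.ROOT));
  }

  /** Names in their original case, for documentation purposes. */
  Set<String> availableNames() {
    return Collections.unmodifiableSet(originalNames);
  }

  public static void main(String[] args) {
    SpiNameRegistry<String> registry = new SpiNameRegistry<>();
    registry.register("wordDelimiterGraph", "some factory"); // illustrative name
    System.out.println(registry.lookup("WORDDELIMITERGRAPH"));
    System.out.println(registry.availableNames());
  }
}
```

The lookup cost stays a single map access; the Set is only touched at registration time and
when listing names.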

Another idea I had was to allow "multiple names" for the same component (for compatibility
with Elasticsearch). But I am not sure this is worth it. If Elasticsearch moves to "our" names,
they should keep a legacy mapping internally.
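
For illustration only, a legacy mapping of that kind could be a thin layer that translates an
old name to the canonical one before the normal lookup runs. The LegacyNameResolver class and
the names in main are assumptions made for this sketch, not existing Lucene or Elasticsearch
code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical downstream helper; not part of Lucene or Elasticsearch. */
final class LegacyNameResolver {
  // Lowercased legacy name -> canonical SPI name.
  private final Map<String, String> legacyToCanonical = new HashMap<>();

  void addLegacyName(String legacyName, String canonicalName) {
    legacyToCanonical.put(legacyName.toLowerCase(Locale.ROOT), canonicalName);
  }

  /** Returns the canonical name for a legacy name, or the input unchanged. */
  String resolve(String requestedName) {
    String key = requestedName.toLowerCase(Locale.ROOT);
    return legacyToCanonical.getOrDefault(key, requestedName);
  }

  public static void main(String[] args) {
    LegacyNameResolver resolver = new LegacyNameResolver();
    // Illustrative pair of names, chosen for this example.
    resolver.addLegacyName("word_delimiter_graph", "wordDelimiterGraph");
    System.out.println(resolver.resolve("word_delimiter_graph"));
    System.out.println(resolver.resolve("lowercase"));
  }
}
```

With this approach each component keeps exactly one SPI name, and any aliases live entirely in
the downstream project.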

> Define analyzer SPI names as static final fields and document the names in Javadocs
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8778
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8778
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>            Reporter: Tomoko Uchida
>            Priority: Minor
>         Attachments: ListAnalysisComponents.java, SPINamesGenerator.java, Screenshot from 2019-04-26 02-17-48.png, TestSPINames.java
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Each built-in analysis component (factory of tokenizer / char filter / token filter) has an SPI name, but currently this is not documented anywhere.
> The goals of this issue:
>  * Define SPI names as a static final field for each analysis component so that users can get the component by name (via the {{NAME}} static field). This also provides compile-time safety.
>  * Officially document the SPI names in Javadocs.
>  * Add proper source validation rules to the ant {{validate-source-patterns}} target so that we can make sure that all analysis components have correct field definitions and documentation, and
>  * Look up SPI names from the new {{NAME}} fields instead of deriving them from class names.
> (Just for quick reference) we now have:
>  * *19* Tokenizers ({{TokenizerFactory.availableTokenizers()}})
>  * *6* CharFilters ({{CharFilterFactory.availableCharFilters()}})
>  * *118* TokenFilters ({{TokenFilterFactory.availableTokenFilters()}})
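
As a rough sketch of the {{NAME}} field idea from the quoted issue, a loader could read the
name reflectively from each factory class rather than deriving it from the class name. The
SpiNameLookup and ExampleTokenFilterFactory classes and the "example" name below are
hypothetical stand-ins, not the code attached to this issue:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical helper for illustration; not Lucene's loader. */
final class SpiNameLookup {

  /** Stand-in for an analysis factory that declares its SPI name. */
  public static final class ExampleTokenFilterFactory {
    /** SPI name */
    public static final String NAME = "example";
  }

  /** Reads the public static final String NAME field of the given class. */
  static String lookupSpiName(Class<?> service) {
    try {
      Field field = service.getField("NAME");
      int mods = field.getModifiers();
      if (Modifier.isStatic(mods)
          && Modifier.isFinal(mods)
          && field.getType() == String.class) {
        return (String) field.get(null);
      }
    } catch (NoSuchFieldException | IllegalAccessException e) {
      throw new IllegalStateException(
          "No accessible NAME field on " + service.getName(), e);
    }
    throw new IllegalStateException(
        "NAME must be a static final String on " + service.getName());
  }

  public static void main(String[] args) {
    // Users can reference the constant directly for compile-time safety...
    System.out.println(ExampleTokenFilterFactory.NAME);
    // ...while a loader can discover the same value reflectively.
    System.out.println(lookupSpiName(ExampleTokenFilterFactory.class));
  }
}
```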



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

