lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
Date Sun, 24 Apr 2011 11:07:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024613#comment-13024613 ]

Uwe Schindler commented on SOLR-2400:
-------------------------------------

Hi Stefan,

You seem to be working on this admin interface again. What do you think of my last proposal:
adding an internal TokenFilter to the FieldAnalysisRequestHandler that is inserted directly
after the Tokenizer, before the first TokenFilter? It could simply count the tokens emitted by
the Tokenizer and add the count as a special attribute. This way, every Token emitted by the
Tokenizer would get a unique ID (an integer). If some TokenFilter later splits a token, both
parts would get the same ID. Please note: the IDs only relate tokens back to the Tokenizer's
output, across the whole TokenFilter chain. If another TokenFilter later splits Tokens that
were produced by an earlier TokenFilter, all those parts would still carry the original ID
assigned after the Tokenizer.
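As a rough illustration, the plain-Java model below simulates the proposed behavior without any Lucene dependency. The `Token` record, the whitespace "tokenizer", and the stop-word and hyphen-splitting "filters" are hypothetical stand-ins for a real Tokenizer/TokenFilter chain, not Lucene API; in an actual implementation the ID would presumably be stored in a custom attribute set by the internal TokenFilter. It shows IDs assigned once right after tokenization, dropped tokens simply disappearing, and split parts inheriting the ID of their source token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TokenRelationSketch {
    // Hypothetical stand-in for a token plus the proposed "ID" attribute.
    record Token(String term, int id) {}

    // Simplified whitespace tokenizer (a model, not Lucene's Tokenizer API).
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // The proposed internal filter, placed directly after the tokenizer:
    // it counts the tokens and stamps each one with its count as the ID.
    static List<Token> assignIds(List<String> terms) {
        List<Token> out = new ArrayList<>();
        int counter = 0;
        for (String t : terms) out.add(new Token(t, ++counter));
        return out;
    }

    // Stop-word-style filter: drops tokens; the survivors keep their IDs.
    static List<Token> stopFilter(List<Token> in, Set<String> stopWords) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (!stopWords.contains(t.term().toLowerCase(Locale.ROOT))) out.add(t);
        }
        return out;
    }

    // Splitting filter (loosely like word-delimiter splitting on '-'):
    // every part inherits the ID of the token it was split from.
    static List<Token> splitOnHyphen(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            for (String part : t.term().split("-")) {
                if (!part.isEmpty()) out.add(new Token(part, t.id()));
            }
        }
        return out;
    }

    // Hypothetical analysis chain: tokenizer -> ID filter -> stop filter -> splitter.
    static List<Token> runPipeline(String text) {
        List<Token> tokens = assignIds(tokenize(text));
        tokens = stopFilter(tokens, Set.of("the"));
        return splitOnHyphen(tokens);
    }

    public static void main(String[] args) {
        for (Token t : runPipeline("the Wi-Fi router")) {
            System.out.println(t.id() + " " + t.term());
        }
    }
}
```

Running `main` prints `2 Wi`, `2 Fi` and `3 router`: ID 1 ("the") vanished in the stop filter, and both halves of "Wi-Fi" point back to Tokenizer token 2. If a further splitting stage were appended, its parts would still carry these Tokenizer-level IDs, which is exactly the limitation noted above.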

Any comments? This should be quite simple to implement.

> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
>                 Key: SOLR-2400
>                 URL: https://issues.apache.org/jira/browse/SOLR-2400
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Stefan Matheis (steffkes)
>            Priority: Minor
>         Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png
>
>
> The XML output (simplified example attached) is missing one small piece of information
> that could be very useful for building a nice analysis output: the "Token-Relation" (if
> there is a special/correct word for this, please correct me).
> Meaning: it is currently not possible to "follow" the analysis process completely when
> the Tokenizers/Filters drop Tokens (e.g. StopWord) or split them into multiple Tokens
> (e.g. WordDelimiter).
> Would it be possible to include this information? If so, it would be possible to create
> an improved Analysis page for the new Solr Admin (SOLR-2399); a short scribble is attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


