lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
Date Sun, 24 Apr 2011 11:23:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024615#comment-13024615
] 

Uwe Schindler commented on SOLR-2400:
-------------------------------------

After thinking about this a little more, I think it would even be possible to add this Filter
after *each* step to track tokens. The resulting Attribute would then contain the whole
tracking history of positions:
- After the Tokenizer, this attribute would contain "0", "1", "2", ...
- After the first TokenFilter: "0.0", "1.1", "1.2", "1.3", "2.2" (where the second token (1)
emitted by the Tokenizer was split into 3 tokens).

I think this would help? Additionally, the Filter could use the PositionIncrement to track
tokens at the same position, or this could be left to the consumer (so if 1.2 and 1.3 have
posIncr 0, the consumer knows that they are all at the same position). If the TokenFilter
used the PosIncr to increment the unique IDs, this would be solved (so 1.x tokens would
always get "1.1" as their ID if at the same position).
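
The numbering scheme described above can be sketched in a few lines of plain Python. This is only a model of the idea, not Lucene or Solr code: the names `Token`, `tokenize`, `apply_stage`, `split_on_hyphen` and the `by_position` flag are all hypothetical, and a "stage" is simplified to a function that maps one input token text to the (text, posIncr) pairs it emits. With `by_position=False` every emitted token gets a running index as its new suffix (so the token after the split is numbered 2.4 here, not 2.2); with `by_position=True` the suffix is the token position derived from posIncr, so tokens at the same position share one ID.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Token:
    text: str
    pos_incr: int  # position increment relative to the previous token
    track_id: str  # dotted tracking path, e.g. "1.2"


# Hypothetical stage type: one input text -> emitted (text, pos_incr) pairs.
Stage = Callable[[str], List[Tuple[str, int]]]


def tokenize(text: str) -> List[Token]:
    """Whitespace 'tokenizer': tracking IDs are simply "0", "1", "2", ..."""
    return [Token(word, 1, str(i)) for i, word in enumerate(text.split())]


def apply_stage(tokens: List[Token], stage: Stage,
                by_position: bool = False) -> List[Token]:
    """Run one stage and extend every tracking ID by one component.

    by_position=False: the suffix is the running index of the emitted token.
    by_position=True:  the suffix is the position computed from pos_incr,
                       so tokens stacked at one position share an ID.
    """
    out: List[Token] = []
    counter = -1   # running index of emitted tokens
    position = -1  # running position; the first token lands on 0
    for tok in tokens:
        for text, pos_incr in stage(tok.text):
            counter += 1
            position += pos_incr
            suffix = position if by_position else counter
            out.append(Token(text, pos_incr, f"{tok.track_id}.{suffix}"))
    return out


def split_on_hyphen(text: str) -> List[Tuple[str, int]]:
    """Toy splitting stage: parts after the first get pos_incr 0."""
    parts = text.split("-")
    return [(part, 1 if i == 0 else 0) for i, part in enumerate(parts)]


if __name__ == "__main__":
    toks = tokenize("a wi-fi-net b")
    print([t.track_id for t in toks])
    print([t.track_id for t in apply_stage(toks, split_on_hyphen)])
    print([t.track_id for t in apply_stage(toks, split_on_hyphen, by_position=True)])
```

A stage that returns an empty list models a dropped token (e.g. a stop word): its ID simply never appears in the output, which is what lets a consumer see where a token disappeared. The sketch does not model the position gap such a drop would normally leave behind.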

I will think about it and supply a patch that enriches the FieldAnalysisContentHandler with
this extra attribute.

We can then iterate. But today is the Easter holiday, so a little bit later...

> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
>                 Key: SOLR-2400
>                 URL: https://issues.apache.org/jira/browse/SOLR-2400
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Stefan Matheis (steffkes)
>            Priority: Minor
>         Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png
>
>
> The XML output (simplified example attached) is missing one small piece of information
> which could be very useful for building a nice analysis output: "token relation" (if there
> is a special/correct term for this, please correct me).
> That is, it is currently not possible to "follow" the analysis process completely when
> the Tokenizers/Filters drop tokens (e.g. StopWord) or split one into multiple tokens
> (e.g. WordDelimiter).
> Would it be possible to include this information? If so, it would be possible to create
> an improved analysis page for the new Solr Admin (SOLR-2399); a short scribble is attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


