lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
Date Sun, 24 Apr 2011 21:24:05 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-2400:
--------------------------------

    Attachment: SOLR-2400.patch

Here is a first, quick patch for TRUNK (it may not apply to 3.x).

The FieldAnalysisRequestHandler behaves as before, except that it adds an additional property
"positionHistory" to the named lists with attributes. This property contains all positions
the token had at earlier stages; the last entry repeats the actual position. "2.2.4.4" means that
the token had position 2 after the Tokenizer, still 2 after the first filter, and then changed to
4 after the second filter. The actual position after the third filter is 4.
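
To make the reading of such a value concrete, here is a minimal sketch of how a client could
interpret it. It assumes the history arrives as a dot-separated string exactly as written above;
the class and method names are hypothetical and are not part of Solr or of the attached patch.

```java
import java.util.Arrays;

// Hypothetical client-side helper; not taken from SOLR-2400.patch.
class PositionHistory {

    // Splits a dotted history such as "2.2.4.4" into one position per stage
    // (index 0 = Tokenizer, index 1 = first filter, and so on).
    static int[] parse(String history) {
        String[] parts = history.split("\\.");
        int[] positions = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            positions[i] = Integer.parseInt(parts[i]);
        }
        return positions;
    }

    // Returns the index of the first stage whose position differs from the
    // previous stage, or -1 if the position never changed.
    static int firstChangedStage(int[] positions) {
        for (int i = 1; i < positions.length; i++) {
            if (positions[i] != positions[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] positions = parse("2.2.4.4");
        System.out.println(Arrays.toString(positions));   // prints [2, 2, 4, 4]
        System.out.println(firstChangedStage(positions)); // prints 2 (the second filter)
    }
}
```

With the example from above, the first change shows up at index 2, i.e. after the second filter.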

By the way, this also fixes a bug in the RequestHandler: the list of tokens is sorted by position
for printout, and that sort modifies the original list. Later filters then see the tokens
in the new order, which is a bug. The new code first copies the list to an array so that the
original tokens are not touched. This bug only affects unusual TokenStreams with negative position
increments, so we can fix it together with this issue (once it is committed).
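
The copy-before-sort idea can be sketched roughly as follows. This is only an illustration of the
technique, not the code from the patch; the Tok class and the sortedCopy method are hypothetical
stand-ins for the handler's real token type and printout code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only; names are hypothetical and not taken from SOLR-2400.patch.
class SortedPrintout {

    // Minimal stand-in for a token with a term and an absolute position.
    static final class Tok {
        final String term;
        final int position;

        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Copies the list into an array and sorts only the copy, so the list
    // that later filters consume keeps its original order.
    static Tok[] sortedCopy(List<Tok> tokens) {
        Tok[] copy = tokens.toArray(new Tok[0]);
        Arrays.sort(copy, Comparator.comparingInt((Tok t) -> t.position));
        return copy;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("foo", 3)); // stream order differs from position order
        tokens.add(new Tok("bar", 1));
        tokens.add(new Tok("baz", 2));

        for (Tok t : sortedCopy(tokens)) {
            System.out.println(t.position + " " + t.term);
        }
        // The original list is untouched: its first element is still "foo".
        System.out.println(tokens.get(0).term);
    }
}
```

Sorting the list in place instead would be the buggy variant described above.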

An example output is:
[http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=moo-moo+dontstems+foo-bar+and+this+fucking+token]

(default schema, Solr trunk)

Hope that helps.

> FieldAnalysisRequestHandler; add information about token-relation
> -----------------------------------------------------------------
>
>                 Key: SOLR-2400
>                 URL: https://issues.apache.org/jira/browse/SOLR-2400
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Stefan Matheis (steffkes)
>            Priority: Minor
>         Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png,
>                      SOLR-2400.patch
>
>
> The XML output (simplified example attached) is missing one small piece of information that
> could be very useful for building a nice analysis output: "token relation" (if there
> is a special/correct term for this, please correct me).
> That is, it is currently not possible to follow the analysis process completely when
> the Tokenizers/Filters drop tokens (e.g. StopWord) or split one into multiple tokens
> (e.g. WordDelimiter).
> Would it be possible to include this information? If so, it would be possible to create
> an improved analysis page for the new Solr Admin (SOLR-2399); a short sketch is attached.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

