lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2051) analysis.jsp is incorrect for protWords etc
Date Tue, 17 Aug 2010 10:28:16 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated SOLR-2051:
--------------------------------

    Attachment: dynamic-AttributeSource.patch

Here is the more dynamic AttributeSource (AS), which adds missing attribute impls on
restoreState() and copyTo(). This is just an idea; the AS test does not pass, because it
checks for the exception that was previously thrown.

I changed analysis.jsp to use this. Sorry for the formatting changes, but my editor fixed
the tabs.

I am not sure whether this is good, as it may add attributes to a TokenStream after the
ctor, which is discouraged and can lead to unexpected behaviour in the consumer, especially
if the factories do not match between source and target (in both cases, copyTo and
restoreState). Ideally, on copyTo(), the AS should check that the AttributeFactory (AF) is
identical.
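
To make the trade-off above concrete, here is a minimal, self-contained Java sketch. It is
a toy model and not Lucene's AttributeSource: the class ToyAttributeSource, its methods, the
string-valued "factory name" and the choice of IllegalArgumentException are all assumptions
made for illustration. It contrasts a strict copy, which fails when the target lacks an
attribute, with a dynamic copy, which adds missing attributes only after checking that both
sides use the same factory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for an attribute container. This is NOT Lucene's AttributeSource;
// every name here is hypothetical and exists only for illustration.
class ToyAttributeSource {
    private final Map<String, Object> attributes = new LinkedHashMap<>();
    private final String factoryName; // stands in for the attribute factory

    ToyAttributeSource(String factoryName) {
        this.factoryName = factoryName;
    }

    void addAttribute(String name, Object value) { attributes.put(name, value); }
    boolean hasAttribute(String name) { return attributes.containsKey(name); }
    Object getAttribute(String name) { return attributes.get(name); }

    // Strict copy: refuse to copy if the target lacks any source attribute.
    void copyToStrict(ToyAttributeSource target) {
        for (String name : attributes.keySet()) {
            if (!target.hasAttribute(name)) {
                throw new IllegalArgumentException("target lacks attribute: " + name);
            }
        }
        target.attributes.putAll(attributes);
    }

    // Dynamic copy: add missing attributes to the target, but only after
    // checking that both sides were built with the same factory.
    void copyToDynamic(ToyAttributeSource target) {
        if (!factoryName.equals(target.factoryName)) {
            throw new IllegalArgumentException("attribute factories differ");
        }
        target.attributes.putAll(attributes);
    }

    public static void main(String[] args) {
        ToyAttributeSource source = new ToyAttributeSource("default");
        source.addAttribute("term", "foobars");
        source.addAttribute("keyword", Boolean.TRUE);

        ToyAttributeSource target = new ToyAttributeSource("default");
        target.addAttribute("term", "");

        try {
            source.copyToStrict(target);
        } catch (IllegalArgumentException e) {
            System.out.println("strict copy failed: " + e.getMessage());
        }
        source.copyToDynamic(target);
        System.out.println("keyword after dynamic copy: " + target.getAttribute("keyword"));
    }
}
```

In this sketch the dynamic copy changes the target's attribute set after construction,
which is exactly the side effect the paragraph above warns about; the factory check only
narrows the cases in which that can happen.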

> analysis.jsp is incorrect for protWords etc
> -------------------------------------------
>
>                 Key: SOLR-2051
>                 URL: https://issues.apache.org/jira/browse/SOLR-2051
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>         Attachments: dynamic-AttributeSource.patch, SOLR-2051.patch, SOLR-2051.patch, SOLR-2051.patch
>
>
> Analysis.jsp gives incorrect results if you use "protwords.txt" or "stemdict.txt"
> or the like.
> This is because this is now implemented with KeywordAttribute (so you can easily
> override any stemmer etc).
> For example, if your schema had "foobars" in protwords.txt, analysis.jsp would show it
> being stemmed to "foobar", even though this doesn't actually happen.
> The problem is that this jsp is downconverting the entire tokenstream to Token in
> between processing, so it silently discards KeywordAttribute and you get the wrong
> result.
> Note: this issue isn't about *displaying* other attributes such as KeywordAttribute
> (which would be a new feature). It's about not throwing them away, so that the analysis
> actually represents what happens.
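
The failure mode in the quoted description can be sketched with a small, self-contained
Java model. This is not the analysis.jsp code or any Lucene API: ToyToken, DownconvertDemo,
the trailing-"s" stemmer and the lossy downconvert() step are all hypothetical stand-ins. The
sketch only shows how a conversion that keeps the term text but drops a keyword flag lets a
later stemmer change a protected word.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of the bug: a lossy conversion between analysis stages
// drops the keyword flag, so a protected word gets stemmed anyway.
class DownconvertDemo {
    // Marks the token as a keyword if its term is in the protected-word set.
    static ToyToken markProtected(ToyToken token, Set<String> protWords) {
        return new ToyToken(token.term, protWords.contains(token.term));
    }

    // Naive stemmer (assumption): strips one trailing "s" unless the token is a keyword.
    static ToyToken stem(ToyToken token) {
        if (token.keyword || !token.term.endsWith("s")) {
            return token;
        }
        return new ToyToken(token.term.substring(0, token.term.length() - 1), false);
    }

    // Lossy conversion: keeps only the term text and forgets the keyword flag.
    static ToyToken downconvert(ToyToken token) {
        return new ToyToken(token.term, false);
    }

    // Runs the two stages, optionally with the lossy conversion in between.
    static String analyze(String term, Set<String> protWords, boolean downconvertBetween) {
        ToyToken token = markProtected(new ToyToken(term, false), protWords);
        if (downconvertBetween) {
            token = downconvert(token);
        }
        return stem(token).term;
    }

    public static void main(String[] args) {
        Set<String> protWords = Collections.singleton("foobars");
        System.out.println("flag preserved: " + analyze("foobars", protWords, false));
        System.out.println("flag discarded: " + analyze("foobars", protWords, true));
    }
}

// Minimal token carrying a term and a keyword flag (stand-in for a keyword attribute).
class ToyToken {
    final String term;
    final boolean keyword;

    ToyToken(String term, boolean keyword) {
        this.term = term;
        this.keyword = keyword;
    }
}
```

With the flag preserved, the protected word passes through the toy stemmer unchanged; with
the lossy step in between, the same input is stemmed, which mirrors the mismatch between
the displayed analysis and what actually happens.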

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

