lucene-dev mailing list archives

From "Paul taylor (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1787) Standard Tokenizer doesn't recognise I.B.M as Acronym, it requires it ends with a dot i.e I.B.M.
Date Fri, 21 Aug 2009 16:48:14 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul taylor updated LUCENE-1787:
--------------------------------

    Attachment: Patch1.txt

Fix so that acronyms without a trailing dot are parsed as acronyms; amended the related acronym
test in Analyser.

(Sources were flexed and compiled using the ant build; I assume this uses the correct Java version
for flex file generation.)

> Standard Tokenizer doesn't recognise I.B.M as Acronym, it requires it ends with a dot i.e I.B.M.
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1787
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1787
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Paul taylor
>         Attachments: Patch1.txt
>
>
> Standard Tokenizer doesn't recognise I.B.M; it requires that it end with a dot, i.e. I.B.M.
> This is particularly problematic if I.B.M is added to the index: with the StandardAnalyzer
> it gets added as IBM, but a search for I.B.M will not match because I.B.M is left as
> is. I would expect a match in this scenario.
> I think it could be fixed by modifying the ACRONYM_DEP grammar rule in StandardTokenizerImpl.jflex
> so that it also supports
> {ALPHANUM} ("." {ALPHANUM})+
> i.e. a dot is only required between each character (I'm not familiar with jflex syntax).
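
For illustration only, the difference between the two acronym forms discussed above can be
sketched with plain java.util.regex rather than the jflex grammar itself. The class name
AcronymPatternSketch, the helper methods, and the ASCII-only stand-in for {ALPHANUM} are all
assumptions made for this sketch; none of it is taken from Patch1.txt or from
StandardTokenizerImpl.jflex.

```java
import java.util.regex.Pattern;

public class AcronymPatternSketch {
    // Assumption: ASCII-only stand-in for the grammar's {ALPHANUM} macro.
    static final String ALPHANUM = "[A-Za-z0-9]+";

    // Form that requires a dot after every part, e.g. "I.B.M."
    static final Pattern WITH_TRAILING_DOT =
        Pattern.compile(ALPHANUM + "\\.(" + ALPHANUM + "\\.)+");

    // Proposed form {ALPHANUM} ("." {ALPHANUM})+ : dots only between parts, e.g. "I.B.M"
    static final Pattern DOT_BETWEEN =
        Pattern.compile(ALPHANUM + "(\\." + ALPHANUM + ")+");

    // True only if the whole text matches the given pattern.
    static boolean isAcronym(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    // Mimics the normalisation described in the issue: "I.B.M." is indexed as "IBM".
    static String stripDots(String acronym) {
        return acronym.replace(".", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"I.B.M.", "I.B.M", "IBM"}) {
            System.out.println(s
                + " trailingDot=" + isAcronym(WITH_TRAILING_DOT, s)
                + " dotBetween=" + isAcronym(DOT_BETWEEN, s));
        }
    }
}
```

In this sketch the dot-between pattern also matches host-like strings such as "example.com",
so a real grammar change would presumably need to consider how the new rule interacts with the
tokenizer's other rules.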

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

