lucene-lucene-net-dev mailing list archives

From "Digy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENENET-366) Spellchecker issues
Date Sat, 15 May 2010 22:11:42 GMT

    [ https://issues.apache.org/jira/browse/LUCENENET-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867925#action_12867925 ]

Digy commented on LUCENENET-366:
--------------------------------

Hi Ben,

Do not comment out the lines that cause the tests to fail (for example, "assertLastSearcherOpen(4)"
in TestBuild).

The problem in the code is that SpellCheckerMock.createSearcher is never called.
After changing
{code}
    protected IndexSearcher createSearcher(Directory dir) 
{code}
in SpellChecker.cs to
{code}
     public virtual IndexSearcher createSearcher(Directory dir) 
{code}
and changing SpellCheckerMock to

{code}
public class SpellCheckerMock : SpellChecker.Net.Search.Spell.SpellChecker
{
    private TestSpellChecker enclosingInstance;

    // New: a field initializer, so it runs before the base constructor.
    ArrayList searchers = ArrayList.Synchronized(new ArrayList());

    public SpellCheckerMock(Directory spellIndex, TestSpellChecker inst)
        : base(spellIndex)
    {
        enclosingInstance = inst;
        // Note: this code is invoked after createSearcher.
        enclosingInstance.searchers = searchers;
    }

    public SpellCheckerMock(Directory spellIndex, StringDistance sd)
        : base(spellIndex, sd)
    {
    }

    // Records every searcher the base class creates so the tests can inspect them.
    public override IndexSearcher createSearcher(Directory dir)
    {
        IndexSearcher searcher = base.createSearcher(dir);
        searchers.Add(searcher);
        return searcher;
    }
}
{code}

all tests pass.
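
To illustrate why the virtual/override change matters, here is a minimal standalone sketch. It assumes, as the note in the constructor above suggests, that the base constructor calls createSearcher. BaseChecker and MockChecker are hypothetical stand-ins, not the Lucene.Net classes, and strings take the place of IndexSearcher.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SpellChecker (assumption: its constructor creates a searcher).
public class BaseChecker
{
    public BaseChecker()
    {
        // A virtual call made from the base constructor dispatches to the subclass override.
        CreateSearcher("spellIndex");
    }

    // Without 'virtual' here, the subclass method could not override this one,
    // and the call in the constructor would always run this base version.
    public virtual string CreateSearcher(string dir)
    {
        return "searcher(" + dir + ")";
    }
}

// Hypothetical stand-in for SpellCheckerMock.
public class MockChecker : BaseChecker
{
    // Field initializers run before the base constructor, so the list
    // already exists when the overridden CreateSearcher is invoked.
    public readonly List<string> Searchers = new List<string>();

    public MockChecker() : base()
    {
        // This body runs last, after CreateSearcher has already been called.
    }

    public override string CreateSearcher(string dir)
    {
        string searcher = base.CreateSearcher(dir);
        Searchers.Add(searcher);
        return searcher;
    }
}

public static class Program
{
    public static void Main()
    {
        MockChecker mock = new MockChecker();
        Console.WriteLine(mock.Searchers.Count); // prints 1
    }
}
```

If the list were created inside the constructor body instead of in a field initializer, it would still be null when the override runs, which is presumably why the mock initializes searchers at the field.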

PS: Are you planning to port JaroWinklerDistance and NGramDistance?

DIGY

> Spellchecker issues
> -------------------
>
>                 Key: LUCENENET-366
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-366
>             Project: Lucene.Net
>          Issue Type: Bug
>            Reporter: Ben West
>            Priority: Minor
>         Attachments: LUCENENET-366.patch, LuceneNet-SpellcheckFixes.patch, spellcheck-2.9-upgrade.patch
>
>
> There are several issues with the spellchecker:
> - It doesn't do duplicate checking across updates (so the same word is often indexed many, many times)
> - The n-gram fields are stored as well as indexed, which increases the size of the index by several orders of magnitude and provides no benefit
> - Some deprecated functions are used, which slows it down
> - Some methods aren't commented fully
> I will attach a patch that fixes these.
