lucene-dev mailing list archives

From "Daniel Naber (JIRA)" <>
Subject [jira] Created: (LUCENE-882) Spellchecker doesn't need to store ngrams
Date Thu, 17 May 2007 21:02:16 GMT
Spellchecker doesn't need to store ngrams

                 Key: LUCENE-882
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Other
    Affects Versions: 2.1
            Reporter: Daniel Naber
         Attachments: lucene-spellchecker.diff

The spellchecker in contrib stores the ngrams, although this doesn't seem to be necessary.
This patch changes that. I will commit it unless someone objects. The change improves indexing
speed and reduces index size. Some numbers from a small test I did:

Input of the original index: 2200 text files, index size 5.3 MB, indexing took 17 seconds

Spell index before patch: about 60,000 documents, index size 13 MB, indexing took 62 seconds
Spell index after patch: about 60,000 documents, index size 6.3 MB, indexing took 52 seconds
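
To put the numbers above in relative terms, here is a small sketch of the arithmetic. The class and method names are made up for illustration, and the inputs are just the figures quoted above, so the result only describes this one small test: roughly a 52% smaller spell index and about 16% less indexing time.

```java
import java.util.Locale;

public class SpellIndexSavings {
    // Hypothetical helper: relative reduction from 'before' to 'after', in percent.
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures taken from the test described above (13 MB -> 6.3 MB, 62 s -> 52 s).
        double sizeSaving = percentReduction(13.0, 6.3);
        double timeSaving = percentReduction(62.0, 52.0);
        System.out.println(String.format(Locale.ROOT,
                "index size reduced by %.1f%%, indexing time reduced by %.1f%%",
                sizeSaving, timeSaving));
    }
}
```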

BTW, the test case fails even before this patch. I'll probably open another issue about how
to fix that.
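
For readers unfamiliar with the ngrams mentioned in the description, the following is a minimal sketch of how a word can be split into character n-grams. It is not the spellchecker's actual code: the class and the `ngrams` helper are hypothetical, and the gram length of 3 is only an example. The point is that such grams only need to be searchable in the spell index; presumably the original word is the only value that has to be read back from a matching document, which is why storing the grams as well looks redundant.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: all contiguous substrings of length n, in order.
    // Returns an empty list when the word is shorter than n.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Trigrams of "lucene"
        System.out.println(ngrams("lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```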

