lucene-solr-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-572) Spell Checker as a Search Component
Date Fri, 16 May 2008 05:14:55 GMT

    [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597354#action_12597354 ]

Shalin Shekhar Mangar commented on SOLR-572:
--------------------------------------------

Otis, I agree that we should call the type "index" instead of "solr", and that "path" can be
renamed to "location". But "indexDir" refers to the target for the spell check index, whereas
"path" currently refers to the source of the dictionary, so IMHO we should keep "indexDir"
as it is (it can also be a relative path).

To support arbitrary Lucene indices, the user must specify type="index", field="fieldName",
and location="path/to/lucene/index/directory", which should be enough (TODO). In that case the
analyzer can be fixed to something (say WhitespaceAnalyzer or StandardAnalyzer).
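
As a rough illustration of the three settings above, here is a minimal Java sketch that checks
them before use. The class name SpellCheckSourceConfig and its methods are hypothetical and are
not taken from the attached patch; only the setting names "type", "field" and "location" come
from this comment.

```java
import java.util.Map;

/** Hypothetical holder for the settings discussed above; not part of the SOLR-572 patch. */
class SpellCheckSourceConfig {
    final String type;
    final String field;
    final String location;

    private SpellCheckSourceConfig(String type, String field, String location) {
        this.type = type;
        this.field = field;
        this.location = location;
    }

    /** Builds a config from raw settings, insisting on field and location for type="index". */
    static SpellCheckSourceConfig fromParams(Map<String, String> params) {
        String type = params.get("type");
        if (!"index".equals(type)) {
            // Assumption: this sketch covers only the arbitrary-Lucene-index case.
            throw new IllegalArgumentException("this sketch only handles type=\"index\", got: " + type);
        }
        return new SpellCheckSourceConfig(type, require(params, "field"), require(params, "location"));
    }

    /** Returns the value for key, or fails if it is absent or blank. */
    private static String require(Map<String, String> params, String key) {
        String value = params.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required setting: " + key);
        }
        return value;
    }
}
```

The location is kept as a plain string here, so a relative path is accepted unchanged.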

I'm not sure I understand your comment on the schema. If this is about text files, then I was
thinking of a plain text file with one word per line, with all of those words going into the
same dictionary.
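
To make the one-word-per-line idea concrete, here is a small sketch that reads such files into
a single in-memory word set. It uses only the JDK, not the Lucene spell checker API, and the
class name WordListDictionary is made up for this example. Skipping blank lines and collapsing
duplicates are my assumptions, not something decided in this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: merges one-word-per-line text files into one dictionary. */
class WordListDictionary {
    /** Reads every file (UTF-8), trims each line, skips blanks, and collapses duplicates. */
    static Set<String> load(List<Path> files) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        for (Path file : files) {
            for (String line : Files.readAllLines(file)) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```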

> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the following features:
> * Allow creating a spell index on a given field and make it possible to have multiple
> spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and process
> each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion
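
The two consistency criteria quoted above could be sketched roughly as follows. This is only
an illustration, not the logic in the attached patch: the class ConsistentSuggestion, the
dictionary set, and the per-token candidate lists are all assumed inputs, and picking the first
unused candidate is a simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the consistency criteria for a multi-word query. */
class ConsistentSuggestion {
    /**
     * Returns one suggestion for the whole query. Tokens found in the dictionary are kept
     * as they are; every other token gets its first candidate that is not already used.
     */
    static List<String> suggest(List<String> tokens,
                                Set<String> dictionary,
                                Map<String, List<String>> candidates) {
        // Reserve the correct words first so that no suggestion repeats them.
        Set<String> used = new HashSet<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                result.add(token); // preserve correct words unchanged
                continue;
            }
            String chosen = token; // fall back to the original token if nothing fits
            for (String candidate : candidates.getOrDefault(token, Collections.emptyList())) {
                if (used.add(candidate)) { // add() is true only for a word not used yet
                    chosen = candidate;
                    break;
                }
            }
            result.add(chosen);
        }
        return result;
    }
}
```

With the tokens "solr serch serach", a dictionary holding "solr", "search" and "server", and
candidates serch -> [search, solr] and serach -> [search, server], this returns
"solr search server": the correct word is kept and "search" is not suggested twice.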

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

