lucene-dev mailing list archives

From Dave Spencer <dave-lucene-...@tropo.com>
Subject Re: New "Did you mean" feature: How to approach?
Date Tue, 29 Mar 2005 16:31:20 GMT
Dave Spencer wrote:

> Otis Gospodnetic wrote:
>
>> Maybe the spellchecker at the bottom of the following URL will help:
>>
>>  http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/
>>  
>>
>
> Yeah, I did this, the "ngram based spelling corrector".
>
> You build a normal Lucene index as you always do,
> then run NGramSpeller, which analyzes your index to determine which
> ngrams are used and saves this in a separate Lucene index,
> then you call NGramSpeller.suggestUsingNGrams() if a user's query
> doesn't return many results.
>
> weblog entry here w/ more info and a test page:
>
> http://www.searchmorph.com/weblog/index.php?id=23


Oh, and if it's not obvious from the above, the code is in live use.
Searchmorph has a search engine over javadoc pages.
Here I search for "hashmep" (intending "hashmap"):

http://www.searchmorph.com/kat/search.jsp?s=hashmep

See the suggestions after the text "I cannot find hashmep anywhere. 
Instead try these variations..." and note that it read my mind :) and 
hashmap is the first suggestion.
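
For anyone who wants to see the general idea in code, here is a minimal
sketch of ngram-based suggestion. It is not the NGramSpeller source: the
class name NGramSuggester is made up for illustration, an in-memory map
stands in for the separate Lucene ngram index, and ranking candidates by
the number of shared trigrams is an assumption about one reasonable
scoring scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of ngram-based "did you mean"; not the sandbox NGramSpeller. */
final class NGramSuggester {
    private final int n;
    // ngram -> indexed words containing it (stands in for the separate ngram index)
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    NGramSuggester(int n) {
        this.n = n;
    }

    /** Splits a word into its overlapping character ngrams of length n. */
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Registers a term that really occurs in the main index. */
    void addWord(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String g : grams(w, n)) {
            gramToWords.computeIfAbsent(g, k -> new TreeSet<>()).add(w);
        }
    }

    /** Returns up to max indexed words, ordered by how many distinct ngrams they share with the query. */
    List<String> suggest(String query, int max) {
        String q = query.toLowerCase(Locale.ROOT);
        Map<String, Integer> hits = new HashMap<>();
        for (String g : new HashSet<>(grams(q, n))) {
            for (String w : gramToWords.getOrDefault(g, Collections.<String>emptySet())) {
                if (!w.equals(q)) {
                    hits.merge(w, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        // Most shared ngrams first; ties broken alphabetically for stable output.
        ranked.sort((a, b) -> {
            int c = Integer.compare(hits.get(b), hits.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(3);
        for (String w : new String[] {"hashmap", "hashtable", "hashset", "treemap"}) {
            s.addWord(w);
        }
        // Prints [hashmap, hashset, hashtable]
        System.out.println(s.suggest("hashmep", 3));
    }
}
```

In a real setup the words passed to addWord() would come from the terms
of the main index, so that only words which actually exist in the
document pool are ever suggested.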


>
> -- 
>
> Some chance you'll be interested in the "more like this" similarity 
> query generator - see the "similar" tree in the sandbox
>
> -- Dave
>
>> Otis
>>
>>
>> --- "Stefan F. Keller" <sfkeller@gmail.com> wrote:
>>  
>>
>>> We would like to add "Did you mean..." to our Lucene-based search
>>> engine www.geometa.info. Doug mentioned in his recent interview that
>>> this feature would not be too complicated to implement.
>>>
>>> First I considered integrating a spelling checker (through JADT-API)
>>> but one would rather expect "nearby" words which really exist in the
>>> document pool. Some people have mentioned this feature here (or on the
>>> java-user-list).
>>>
>>> => Is anyone aware of any real developments in this area?
>>> Ideally, one would combine the data already maintained by the
>>> IndexReader class with an existing similarity search algorithm (like
>>> trigram)...
>>>
>>> => Any ideas?
>>>
>>> Stefan
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>   
>>
>>
>>
>>  
>>
>
>
>

