lucene-java-user mailing list archives

From 王巍巍 <ww.wang...@gmail.com>
Subject Re: Suggestive Search
Date Thu, 09 Apr 2009 02:39:33 GMT
I tested the Lucene spellchecker and it doesn't support Chinese spell checking.
How can I achieve this the way Google does?

2009/4/9 Karl Wettin <karl.wettin@gmail.com>

> If you use prefix grams only then you'll get a forward-only suggestion
> scheme. I've seen several implementations that use that and it works quite
> well.
>
> harry potter: ^ha, ^har, ^harr, ^harry, ^harry p, ^harry po..
> harry houdini: ^ha, ^har, ^harr, ^harry, ^harry h, ^harry ho..
>
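
The prefix grams listed above could be generated along these lines. This is only a minimal sketch in plain Java, not Lucene code: the class name `PrefixGrams`, the `^` start marker, and the `minLength` parameter are assumptions made for illustration. Prefixes that end in whitespace are skipped so the output follows the example list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PrefixGrams {
    // Hypothetical helper: returns every prefix of the phrase, from minLength
    // characters up to the whole phrase, each marked with a leading '^'.
    static List<String> of(String phrase, int minLength) {
        String normalized = phrase.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int end = minLength; end <= normalized.length(); end++) {
            // Skip prefixes that end in whitespace, e.g. "harry ".
            if (end > 0 && Character.isWhitespace(normalized.charAt(end - 1))) {
                continue;
            }
            grams.add("^" + normalized.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(of("Harry Potter", 2));
    }
}
```

Each gram would be indexed as a single untokenized term, so whatever the user has typed so far can be looked up as one exact term.
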
> I prefer the trie pattern, though. I just remembered there is an old one in
> LUCENE-625.
>
>     karl
>
> On 8 Apr 2009, at 20:50, Matt Schraeder wrote:
>
>> Correct me if I'm wrong, but I don't think n-grams are really what I'm
>> looking for here.  I'm not looking for a spellchecker or phrase checker
>> style suggestive search, but only based on the exact phrases the user is
>> currently typing.  Since Lucene uses term-based searching, I'm not sure
>> how to have it search on portions of a full phrase.  Using a standard
>> lucene search typing in "harr" will result in searching for "harr" as a
>> term, which will not find "Harry Potter".  Using ngrams it would find
>> "Harry" as a term, but not at the beginning of an entire phrase.  This
>> would bring back "My Dog Harry" as a result, which isn't what I'm
>> looking for. I just want phrases from fields beginning with "Harr"
>> only.
>>
>> I could easily do this all with our database server by simply doing a
>> query for "where searchqueries like 'harr%'" but we're trying to limit
>> our hits to the database to keep speed up on the site.
>>
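
The behaviour asked for here (whole phrases that begin with the typed text, like `LIKE 'harr%'`, but without hitting the database) can be illustrated with a sorted in-memory map. This is a sketch under assumptions, not a Lucene API: `PhrasePrefixIndex`, `add` and `startingWith` are hypothetical names, and it presumes the phrase list fits in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PhrasePrefixIndex {
    // Lowercased phrase -> phrase as originally written, kept in sorted order.
    private final TreeMap<String, String> phrases = new TreeMap<>();

    void add(String phrase) {
        phrases.put(phrase.toLowerCase(Locale.ROOT), phrase);
    }

    // Returns up to 'limit' phrases whose beginning matches the typed text.
    List<String> startingWith(String typed, int limit) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        // Jump to the first key >= prefix, then walk until keys stop matching.
        for (Map.Entry<String, String> e : phrases.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || hits.size() >= limit) {
                break;
            }
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        PhrasePrefixIndex index = new PhrasePrefixIndex();
        index.add("Harry Potter");
        index.add("My Dog Harry");
        System.out.println(index.startingWith("Harr", 10));
    }
}
```

Because the whole phrase is the key, "My Dog Harry" is never returned for "Harr", which is the difference from term-based matching described above.
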
>> >>> karl.wettin@gmail.com 4/8/2009 12:49:45 PM >>>
>> For this you probably want to use ngrams. Whether or not this is
>> something that fits in your current index is hard to say. My guess is
>> that you want to create a new index with one document per unique
>> phrase. You might also want to try loading this index into an
>> InstantiatedIndex, which could speed things up quite a bit if the
>> corpus is not too large.
>>
>> If your suggestion text corpus is really large and you only want
>> forward-only suggestions, then you might want to consider a trie-pattern
>> solution instead. These can be rather resource efficient, even
>> when loaded into memory.
>>
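
For reference, a forward-only trie lookup might look roughly like the sketch below. It is a generic character trie in plain Java and is not the implementation from LUCENE-625; `SuggestTrie`, `add` and `suggest` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SuggestTrie {
    private static final class Node {
        // Sorted children so suggestions come back in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        String phrase; // non-null only where a complete phrase ends
    }

    private final Node root = new Node();

    void add(String phrase) {
        Node node = root;
        for (char c : phrase.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.phrase = phrase;
    }

    // Walks down the typed prefix, then collects phrases below that node.
    List<String> suggest(String typed, int limit) {
        Node node = root;
        for (char c : typed.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>();
            }
        }
        List<String> out = new ArrayList<>();
        collect(node, limit, out);
        return out;
    }

    private static void collect(Node node, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (node.phrase != null) {
            out.add(node.phrase);
        }
        for (Node child : node.children.values()) {
            collect(child, limit, out);
        }
    }

    public static void main(String[] args) {
        SuggestTrie trie = new SuggestTrie();
        trie.add("Harry Potter");
        trie.add("Harry Houdini");
        System.out.println(trie.suggest("harr", 5));
    }
}
```

Phrases that share a beginning share nodes, which is where the memory savings come from when many suggestions start the same way.
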
>> If you have a lot of user load on your search engine, then it might be
>> interesting to use old user queries as the base of your suggestions
>> and perhaps boost a bit on trends, i.e. the more people search for
>> something, the more it gets boosted in the suggestions list.
>>
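
One way to picture suggestions built from old user queries is a simple counter per query, with matches ordered by how often they were searched. The sketch below is an assumption-laden illustration in plain Java: `QueryLogSuggester`, `record` and `suggest` are hypothetical names, and it scans every logged query, so it only suits a small log.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class QueryLogSuggester {
    // Lowercased query -> number of times it has been searched.
    private final Map<String, Integer> counts = new HashMap<>();

    void record(String query) {
        counts.merge(query.toLowerCase(Locale.ROOT), 1, Integer::sum);
    }

    // Queries beginning with the typed text, most searched first;
    // ties are broken alphabetically so the order is stable.
    List<String> suggest(String typed, int limit) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        return counts.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        QueryLogSuggester suggester = new QueryLogSuggester();
        suggester.record("Harry Potter");
        suggester.record("Harry Potter");
        suggester.record("Harry Houdini");
        System.out.println(suggester.suggest("harr", 5));
    }
}
```

Boosting on trends, as opposed to all-time counts, would need some form of time decay on the counters, which this sketch leaves out.
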
>>
>>     karl
>>
>> On 8 Apr 2009, at 15:26, Matt Schraeder wrote:
>>
>>> I want to add a suggestive search similar to Google's to autocomplete
>>> search phrases as the user types.  It doesn't have to be very elaborate
>>> and for the most part will just involve searching single fields. How
>>> can I perform a search to be able to fill in autocomplete text?
>>>
>>> For instance, if I start typing "Harr" it should bring up "Harry
>>> Potter", "Harry Houdini" and "Harry S. Truman".
>>>
>>> I have tried doing search queries for "Harr*" but it's still doing
>>> term-based searching rather than searching a full field.  To make a
>>> field both searchable as the full field as well as tokenized, would I
>>> have to duplicate the field and make one a keyword field? Is there a
>>> more convenient way to do this? I have also considered making a second
>>> index for suggestive search, which would only have the fields that I
>>> want to enable suggestive search on, but this seems like it would be
>>> unnecessary duplication of data as well, though it would probably make
>>> suggestive search faster due to a smaller index.
>>>
>>> Ideally it would also be nice to be able to rank these terms based on
>>> the number of times they have been searched for so that the results are
>>> tailored more to our users rather than simply just the score that
>>> Lucene chooses.
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>


-- 
王巍巍(Weiwei Wang)
Department of Computer Science
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Mobile: 86-13913310569
MSN: ww.wang.cs@gmail.com
Homepage: http://cs.nju.edu.cn/rl/weiweiwang
