lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject RE: Can I do "Google Suggest" Like Search? - - - from - - -vikas
Date Wed, 24 May 2006 11:08:21 GMT
>>What will happen if I send PrefixQuery

A search returns a list of docs, but you want a list of
words, which is why I suggested using the IndexReader
"terms" APIs, which PrefixQuery uses internally.
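
A minimal sketch of that term-walking idea, in plain Java rather
than the Lucene API: a TreeMap stands in for the index's sorted term
dictionary (that stand-in, the class name and the method names are
all assumptions made for illustration). It seeks to the first term
at or after the prefix, reads terms in order until one no longer
starts with the prefix, and skips terms whose document frequency is
too low.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of a prefix walk over a sorted term dictionary.
// The TreeMap is an assumption standing in for the real index terms.
final class PrefixSuggestSketch {
    // term text -> number of documents containing the term
    private final SortedMap<String, Integer> termDocFreqs =
            new TreeMap<String, Integer>();

    void addTerm(String text, int docFreq) {
        termDocFreqs.put(text, docFreq);
    }

    // Returns up to maxSuggestions terms, in sorted order, that start with
    // the lowercased prefix and occur in at least minDocFreq documents.
    List<String> suggest(String incompleteWord, int minDocFreq, int maxSuggestions) {
        String prefix = incompleteWord.toLowerCase();
        List<String> suggestions = new ArrayList<String>();
        // tailMap() seeks to the first term >= prefix
        for (Map.Entry<String, Integer> e : termDocFreqs.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so the prefix range has ended
            }
            if (suggestions.size() >= maxSuggestions) {
                break;
            }
            if (e.getValue() >= minDocFreq) {
                suggestions.add(e.getKey()); // skip very rare terms
            }
        }
        return suggestions;
    }
}
```

No documents are scored or ranked here; only the term list is read,
which is the point of going below the query level.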

If you are not in a position to try the more complex
solution I outlined earlier (which bases suggestions on
multiple terms in the query string) then you *could*
use PrefixQuery as follows:

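        // Needs org.apache.lucene.index.IndexReader and Term,
        // org.apache.lucene.search.PrefixQuery, java.util.HashSet and Iterator
        IndexReader reader = IndexReader.open("/indexes/nasa");
        String incompleteWord = "Am";
        PrefixQuery pq = new PrefixQuery(
                new Term("contents", incompleteWord.toLowerCase()));
        HashSet suggestedTerms = new HashSet();
        // rewrite() expands the prefix into the matching index terms
        pq.rewrite(reader).extractTerms(suggestedTerms);
        for (Iterator iter = suggestedTerms.iterator(); iter.hasNext();)
        {
            Term term = (Term) iter.next();
            System.out.println(term.text());
        }
        reader.close();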

Using this technique on a completely free-text field
(i.e. something other than the "country" field in your
example) will probably produce poorly informed
suggestions, so I would refer you to my previous post
for a better solution.


Cheers
Mark





--- Vikas Khengare <Vikas_Khengare@symantec.com>
wrote:

> Hi Mark,
>
> You are right; I want suggestions from doc content only, not general
> words. What will happen if I send a PrefixQuery on each character the
> user types? I will then get results using AJAX [the number of hits to
> show the user is no problem]. So when the user types "a", onkeyup I
> will send a query through AJAX to the search engine with a
> PrefixQuery, and then I will get results.
>
> e.g. Field("Country","America")
>      Field("Country","Africa")
>      Field("Country","Argentina")
>
> So if I search in "Country" for "a*", it will return all values
> starting with "a", and I will get the results I want.
>
> Is this right?
>
> Or what is another way to do this?
>
> -----Original Message-----
> From: mark harwood [mailto:markharw00d@yahoo.co.uk] 
> Sent: Wednesday, May 24, 2006 3:37 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I do "Google Suggest" Like Search? - - - from - - -vikas
> 
> Tips:
>
> 1) Don't send to 3 mailing lists when 1 will do. Please continue this
> conversation on java-user only.
>
> 2) Most "suggest" tools work off an index of previous searches (not
> documents). Do you have a large set of searches? If not, making
> sensible suggestions based on document content can be much more
> compute-intensive. My assumption here is that you have to work with
> doc content.
>
> 3) You don't need to go to the expense of running a query and ranking
> and scoring documents. Look at the lower-level APIs terms() and
> termDocs() and use them to find the matching terms.
>
> 4) Word suggestions ideally shouldn't be independent of each other.
> Look at the completed words in the query string and use them to inform
> the selection of suggestions for the incomplete term being typed. The
> termDocs()/termPositions() APIs give you all the data you need to
> establish which docs/positions exist for completed terms, and these
> can be cross-referenced with the list of docs/positions for the
> "alternative" terms under consideration. A high proximity between
> completed term occurrences and a suggested term's occurrences makes a
> strong candidate. A fast way to do proximity tests might be to compare
> sorted arrays of numbers, where each number represents a term
> occurrence, using a function like:
>
>   termspaceNumber = [DocNumber * maxNumTermsPerDoc] + termPositionInDoc
>
> You could then compare long[] completedTermOccurences with
> long[] suggestedAlternativeTermOccurences, looking for matches where
> the numbers differ by 1 or 2.
>
> A faster (rougher) comparison which ignores word proximity would be
> just to compare bitsets of doc ids, looking for high levels of overlap
> (intersection/union).
>
> You can use TermEnum.docFreq() to quickly rule out very rare words
> from your calculations.
>
> Cheers,
> Mark
>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> ==========================================================================
>
> with best regards
> from .........
> vikas r. khengare
> Veritas Software India Private Ltd.
> Symantec Corporation
> Pune, India
>
>                         [ Enjoy your life today.... because
=== message truncated ===
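
The two comparisons described in point 4 of the quoted post could be
sketched roughly as below, in plain Java with no Lucene calls. The
class and method names are hypothetical, and two things are
assumptions of mine rather than part of the original suggestion: the
proximity check only counts a candidate occurrence that comes shortly
*after* a completed-term occurrence, and the overlap score is taken
as intersection size divided by union size.

```java
import java.util.BitSet;

// Hypothetical helpers for scoring a candidate suggestion against a
// term the user has already completed.
final class SuggestionScoringSketch {

    // Flattens (document, position) into one number so that occurrences
    // across the whole index can be kept in a single sorted long[].
    static long termspaceNumber(int docNumber, int termPositionInDoc,
                                long maxNumTermsPerDoc) {
        return docNumber * maxNumTermsPerDoc + termPositionInDoc;
    }

    // Counts candidate occurrences that lie 1..maxGap positions after some
    // completed-term occurrence. Both arrays must be sorted ascending.
    // Assumes maxNumTermsPerDoc leaves enough headroom that the end of one
    // document is never within maxGap of the start of the next.
    static int countNearbyOccurrences(long[] completed, long[] candidate,
                                      int maxGap) {
        int count = 0;
        int i = 0; // index into completed; only ever moves forward
        for (long c : candidate) {
            while (i < completed.length && completed[i] < c - maxGap) {
                i++;
            }
            if (i < completed.length && completed[i] < c) {
                count++;
            }
        }
        return count;
    }

    // Rougher test that ignores positions: the share of documents that
    // contain both terms among the documents that contain either.
    static double overlapRatio(BitSet completedDocs, BitSet candidateDocs) {
        BitSet intersection = (BitSet) completedDocs.clone();
        intersection.and(candidateDocs);
        BitSet union = (BitSet) completedDocs.clone();
        union.or(candidateDocs);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 0.0
                : (double) intersection.cardinality() / unionSize;
    }
}
```

Because both arrays are sorted, the proximity count is a single
merge-style pass rather than a nested loop, which is what makes the
flattened numbering worthwhile.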



		


