lucene-java-user mailing list archives

From "Vikas Khengare" <Vikas_Kheng...@symantec.com>
Subject RE: Can I do "Google Suggest" Like Search? - - - from - - -vikas
Date Wed, 24 May 2006 10:25:46 GMT
 

Hi Mark

 

      You are right; I want suggestions from the document content only,
not general words. What happens if I send a PrefixQuery for each
character the user types? I would get the results back using AJAX [the
number of hits to show the user is not a problem]. So when the user
types "a", the onkeyup handler sends a prefix query to the search engine
through AJAX, and I get the results.

e.g. Field("Country","America")
     Field("Country","Africa")
     Field("Country","Argentina")

 

So if I search in "Country" for "a*", it will return all values that
start with "a", which are the results I want.

 

Is this the right approach?

Or is there another way to do it?

 

 

 

 

-----Original Message-----
From: mark harwood [mailto:markharw00d@yahoo.co.uk] 
Sent: Wednesday, May 24, 2006 3:37 PM
To: java-user@lucene.apache.org
Subject: Re: Can I do "Google Suggest" Like Search? - - - from - - -vikas

 

Tips:

1) Don't send to 3 mail lists when 1 will do. Please
continue this conversation on java-user only.

 

2) Most "suggest" tools work off an index of previous
searches (not documents). Do you have a large set of
searches? If not, making sensible suggestions based on
document content can be much more compute-intensive.
My assumption here is that you have to work with doc
content.

 

3) You don't need to go to the expense of running a
query and ranking and scoring documents. Look at the
lower-level APIs terms() and termDocs() and use them to
find the matching terms.

 

4) Word suggestions ideally shouldn't be independent
of each other. Look at the completed words in the query
string and use them to inform the selection of
suggestions for the incomplete term being typed. The
termDocs()/termPositions() APIs give you all the data
you need to establish what docs/positions exist for
completed terms, and these can be cross-referenced with
the list of docs/positions for the "alternative" terms
under consideration. A high proximity between a
completed term's occurrences and a suggested term's
occurrences makes a strong candidate. A fast way to do
proximity tests might be to compare sorted arrays of
numbers, where each number represents a term occurrence,
using a function like:

  termspaceNumber = (DocNumber * maxNumTermsPerDoc) + termPositionInDoc

 

You could then compare long[] completedTermOccurences
with long[] suggestedAlternativeTermOccurences, looking
for matches where the numbers differ by 1 or 2.
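
The comparison described above could be sketched roughly as follows. This
is plain Java with no Lucene calls; the class name, the method names, and
the MAX_TERMS_PER_DOC value are illustrative assumptions, and the two
arrays are assumed to have been filled from termPositions() beforehand and
sorted ascending.

```java
class ProximitySketch {
    // Hypothetical bound on positions per document. It should exceed the
    // longest document by more than the allowed gap, so that the end of one
    // document never looks "close" to the start of the next.
    static final long MAX_TERMS_PER_DOC = 100_000L;

    // Maps a (document, position) pair onto a single number line.
    static long termspaceNumber(int docNumber, int positionInDoc) {
        return docNumber * MAX_TERMS_PER_DOC + positionInDoc;
    }

    // Counts occurrences in `suggested` that lie within maxGap positions of
    // some occurrence in `completed`. Both arrays must be sorted ascending,
    // which lets a single forward-moving index do the work.
    static int countNear(long[] completed, long[] suggested, long maxGap) {
        int hits = 0;
        int i = 0;
        for (long s : suggested) {
            // Skip completed occurrences that are too far behind s.
            while (i < completed.length && completed[i] < s - maxGap) {
                i++;
            }
            // The first remaining one is the nearest candidate at or after
            // s - maxGap; it matches if it is not too far ahead of s.
            if (i < completed.length && completed[i] <= s + maxGap) {
                hits++;
            }
        }
        return hits;
    }
}
```

A higher count for one candidate term than for another would suggest it
appears near the completed word more often, which is one possible way to
rank the suggestions.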

 

A faster (rougher) comparison that ignores word
proximity would be just to compare bitsets of doc
ids, looking for high levels of overlap
(intersection/union).
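
As a minimal sketch of that rougher test, assuming each term's matching doc
ids have already been collected into a java.util.BitSet (for example by
walking termDocs()), the overlap ratio could be computed like this. The
class and method names are made up for illustration.

```java
import java.util.BitSet;

class DocOverlapSketch {
    // Returns (size of intersection) / (size of union) for two doc-id sets,
    // or 0.0 when both sets are empty. The inputs are not modified.
    static double overlap(BitSet a, BitSet b) {
        BitSet intersection = (BitSet) a.clone();
        intersection.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionSize = union.cardinality();
        return unionSize == 0 ? 0.0 : (double) intersection.cardinality() / unionSize;
    }
}
```

A value near 1.0 means the two terms occur in almost the same documents; a
value near 0.0 means they rarely share a document.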

 

You can use TermEnum.docFreq() to quickly rule out
very rare words from your calculations.
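
To illustrate the term-level lookup from tip 3 together with the docFreq
cut-off, here is a rough sketch in which a sorted TreeMap of term to
document frequency stands in for the index's term dictionary. In Lucene
itself the same walk would presumably start from IndexReader.terms(Term)
and read TermEnum.docFreq(); the class name, method name, and parameters
below are hypothetical, and the terms are assumed to be lowercased at index
time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixSuggestSketch {
    // Walks the sorted dictionary from the first term >= prefix and stops at
    // the first term that no longer starts with the prefix, so only the
    // matching slice of the dictionary is visited. Terms whose document
    // frequency is below minDocFreq are skipped.
    static List<String> suggest(TreeMap<String, Integer> docFreqByTerm,
                                String prefix, int minDocFreq, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqByTerm.tailMap(prefix, true).entrySet()) {
            if (out.size() >= limit || !e.getKey().startsWith(prefix)) {
                break;
            }
            if (e.getValue() >= minDocFreq) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

No query is parsed and no documents are scored here, which is the point of
working at the term level for each keystroke.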

 

Cheers,

Mark

 


 

========================================================================

 

with best regards

from .........

vikas r. khengare

Veritas Software India Private Ltd. 

Symantec Corporation

Pune, India

 

                        [ Enjoy your life today.... because yesterday
has gone.... and tomorrow may never come. ]

 

