lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhenjian YU" <zhenj...@gmail.com>
Subject Re: Can I do "Google Suggest" Like Search? - - - from - - -vikas
Date Thu, 25 May 2006 03:03:42 GMT
yes, PrefixQuery will help.

On 5/24/06, mark harwood <markharw00d@yahoo.co.uk> wrote:
>
> >>What will happen if I send PrefixQuery
>
> A search returns a list of docs - you want a list of
> words which is why I suggested using the IndexReader
> "terms" APIs which PrefixQuery uses internally.
>
> If you are not in a position to try the more complex
> solution I outlined earlier (this bases suggestions on
> multiple terms in the query string) then you *could*
> use PrefixQuery as follows:
>
>         IndexReader
> reader=IndexReader.open("/indexes/nasa");
>         String incompleteWord="Am";
>         PrefixQuery pq=new PrefixQuery(new
> Term("contents",incompleteWord.toLowerCase()));
>         HashSet suggestedTerms=new HashSet();
>
> pq.rewrite(reader).extractTerms(suggestedTerms);
>         for (Iterator iter =
> suggestedTerms.iterator(); iter.hasNext();)
>         {
>             Term term = (Term) iter.next();
>             System.out.println(term.text());
>
>         }
>
> Using this technique on a completely free-text field
> (i.e. something other than the "country" field in your
> example) will probably make poorly informed
> suggestions and I would refer to my previous post for
> a better solution.
>
>
> Cheers
> Mark
>
>
>
>
>
> --- Vikas Khengare <Vikas_Khengare@symantec.com>
> wrote:
>
> >
> >
> > Hi Mark
> >
> >
> >
> >       You are right; I want suggestions from doc
> > content only not
> > general words. What will happen if I send
> > PrefixQuery in each char input
> > from user then I will get results [No problem about
> > number of hits to
> > show user] using AJAX. So when user type "a" Onkeyup
> > I will send query
> > through AJAX to search engine with prefixquery then
> > I will get results.
> >
> > e.g. Field("Country","America")
> >
> >       Field("Country","Africa")
> >
> >       Field("Country","Aegentina")
> >
> >
> >
> > So If search in "Country" for "a*" it will return me
> > all values which
> > are starting from "a" So I will get results as I
> > want.
> >
> >
> >
> > Is this one right?
> >
> >
> >
> > Or What is other way to do so?
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: mark harwood [mailto:markharw00d@yahoo.co.uk]
> > Sent: Wednesday, May 24, 2006 3:37 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Can I do "Google Suggest" Like Search?
> > - - - from - -
> > -vikas
> >
> >
> >
> > Tips:
> >
> >
> >
> > 1) Don't send to 3 mail lists when 1 will do please
> >
> > continue this conversation on java-user only.
> >
> >
> >
> > 2) Most "suggest" tools work off an index of
> > previous
> >
> > searches (not documents). Do you have a large set of
> >
> > searches? If not, making sensible suggestions based
> > on
> >
> > document content can be much more compute intensive.
> >
> > My assumption here is you are having to work with
> > doc
> >
> > content.
> >
> >
> >
> > 3) You don't need to go to the expense of running a
> >
> > query and ranking and scoring documents - look at
> > the
> >
> > lower level APIs terms() and termDocs() - use them
> > to
> >
> > find the matching terms
> >
> >
> >
> > 4) word suggestions ideally shouldn't be independent
> >
> > of each other - look at completed words in the query
> >
> > string and use them to inform the selection of
> >
> > suggestions for the incomplete term being typed. The
> >
> > termDocs()/termPositions() apis give you all the
> > data
> >
> > you need to establish what docs/positions exist for
> >
> > completed terms and these can be cross-referenced
> > with
> >
> > the list of docs/positions for the "alternative"
> > terms
> >
> > under consideration. A high proximity between
> >
> > completed term occurences and a suggested term's
> >
> > occurences makes a strong candidate. A fast way to
> > do
> >
> > proximity tests might be to compared sorted arrays
> > of
> >
> > numbers where each number represents a term using a
> >
> > function like:
> >
> >   termspaceNumber=[DocNumber * maxNumTermsPerDoc]+
> >
> > termPositionInDoc
> >
> >
> >
> > You could then compare long[]completedTermOccurences
> >
> > with long[]suggestedAlternativeTermOccurences
> > looking
> >
> > for matches where numbers differ by 1 or 2.
> >
> >
> >
> > A faster (rougher) comparison solution which ignored
> >
> > word proximity would be just to compare bitsets of
> > doc
> >
> > ids looking for high levels of
> >
> > overlap(intersection/union).
> >
> >
> >
> > You can use TermEnum.docFreq() to quickly rule out
> >
> > very rare words from your calculations.
> >
> >
> >
> > Cheers,
> >
> > Mark
> >
> >
> >
> > Send instant messages to your online friends
> > http://uk.messenger.yahoo.com
> >
> >
> >
> >
> ---------------------------------------------------------------------
> >
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> >
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> >
> >
> >
> >
> ========================================================================
> > ==========================
> >
> >
> >
> > with best regards
> >
> > from .........
> >
> > vikas r. khengare
> >
> > Veritas Software India Private Ltd.
> >
> > Symantec Corporation
> >
> > Pune, India
> >
> >
> >
> >                         [ Enjoy your life today....
> > because
> === message truncated ===
>
>
>
>
> ___________________________________________________________
> All New Yahoo! Mail – Tired of Vi@gr@! come-ons? Let our SpamGuard protect
> you. http://uk.docs.yahoo.com/nowyoucan.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message