lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Weiwei Wang <ww.wang...@gmail.com>
Subject Re: Need help regarding implementation of autosuggest using jquery
Date Tue, 01 Dec 2009 22:23:33 GMT
Hi, dudes,

I finished an search suggestion module a few months ago.

The framework is as below:
1. Log all your search keywords or retrieve all your segmented terms which
can be searched
2. Index all the keywords and or terms with N-Gram tech
3. Search this index with a same analyzer based on user input

You can find a example in lucene contribute materials: SpellCheck or you can
find the source code of our project here:
http://code.google.com/p/askrosa/source/browse/trunk/RosaCrawler/src/autocomplete/AutoCompleter.java
This code is not really up to date, the parameters in the EdgeNGramTokenFilter
should be 2 and 5 or some values not too small and not too large. I
recommend you read some thing about N-Gram(don't worry, it's very easy to
understand)

You can do any modifying to the code and redeploy.

Hope you will enjoy the search tools you are building.

On Wed, Dec 2, 2009 at 6:09 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com
> wrote:

> Hi,
>
> Have a look at http://www.sematext.com/products/autocomplete/index.html
>
> It handles Chinese and large volumes of data.
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: fulin tang <tangfulin@gmail.com>
> > To: java-user@lucene.apache.org
> > Sent: Thu, November 26, 2009 9:10:41 PM
> > Subject: Re: Need help regarding implementation of autosuggest using
> jquery
> >
> > By the way , we search Chinese words, so Trie tree looks not perfect
> > for us either
> >
> >
> > 2009/11/27 fulin tang :
> > > We have the same needs in our music search, and we found this is not a
> > > good approach for performance reason .
> > >
> > > Did any one have experience of implement the autosuggestion in a heavy
> > > product environment ?
> > > Any suggestions ?
> > >
> > >
> > > 2009/11/26 Anshum :
> > >> Try this,
> > >> Change the code as required:
> > >> ---------
> > >>
> > >>
> > >> import java.io.IOException;
> > >>
> > >> import org.apache.lucene.index.CorruptIndexException;
> > >> import org.apache.lucene.index.IndexReader;
> > >> import org.apache.lucene.index.Term;
> > >> import org.apache.lucene.index.TermEnum;
> > >>
> > >> /**
> > >>  * @author anshum
> > >>  *
> > >>  */
> > >> public class GetTermsToSuggest {
> > >>
> > >> private static void getTerms(String inputText) {
> > >> IndexReader reader = null;
> > >>  try {
> > >> reader = IndexReader.open("/home/anshum/index/testindex");
> > >>  String field = "fieldname";
> > >> field = field.intern();
> > >> TermEnum tenum = reader.terms(new Term("fieldname", ""));
> > >>  Boolean hasRun = false;
> > >> try {
> > >> do {
> > >>  final Term term = tenum.term();
> > >> if (term == null || term.field() != field)
> > >>  break;
> > >> final String termText = term.text();
> > >> if (termText.startsWith(inputText)) {
> > >>  System.out.println(termText);
> > >> hasRun = true;
> > >> } else if (hasRun == true)
> > >>  break;
> > >> } while (tenum.next());
> > >> tenum.close();
> > >>  } catch (IOException e) {
> > >> e.printStackTrace();
> > >> }
> > >>  } catch (CorruptIndexException e2) {
> > >> e2.printStackTrace();
> > >> } catch (IOException e2) {
> > >>  e2.printStackTrace();
> > >> }
> > >>
> > >> }
> > >>
> > >> /**
> > >>  * @param args
> > >>  */
> > >>  public static void main(String[] args) {
> > >> GetTermsToSuggest.getTerms(args[0]);
> > >>  }
> > >> }
> > >>
> > >>
> > >> --
> > >> Anshum Gupta
> > >> Naukri Labs!
> > >> http://ai-cafe.blogspot.com
> > >>
> > >> The facts expressed here belong to everybody, the opinions to me. The
> > >> distinction is yours to draw............
> > >>
> > >>
> > >> On Thu, Nov 26, 2009 at 3:19 PM, Uwe Schindler wrote:
> > >>
> > >>> You can fix this if you just create the initial term not with "",
> instead
> > >>> with your prefix:
> > >>> TermEnum tenum = reader.terms(new Term(field,prefix));
> > >>>
> > >>> And inside the while loop just break out,
> > >>>
> > >>> if (!termText.startsWith(prefix)) break;
> > >>>
> > >>> -----
> > >>> Uwe Schindler
> > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >>> http://www.thetaphi.de
> > >>> eMail: uwe@thetaphi.de
> > >>>
> > >>>
> > >>> > -----Original Message-----
> > >>> > From: DHIVYA M [mailto:dhivyakrishnan87@yahoo.com]
> > >>> > Sent: Thursday, November 26, 2009 10:39 AM
> > >>> > To: java-user@lucene.apache.org
> > >>> > Subject: RE: Need help regarding implementation of autosuggest
> using
> > >>> > jquery
> > >>> >
> > >>> > Sir,
> > >>> >
> > >>> > Your suggestion was fantastic.
> > >>> >
> > >>> > I tried the below mentioned code but it is showing me the entire
> result
> > >>> of
> > >>> > indexed words starting from the letter that i give as input.
> > >>> > Ex:
> > >>> > if i give "fo"
> > >>> > am getting all the indexes from the word starting with fo upto
> words
> > >>> > starting with z.
> > >>> > i.e. it starts displaying from the word matching the search word
> and ends
> > >>> > up with the last word available in the index file.
> > >>> >
> > >>> > Kindly suggest me a solution for this problem
> > >>> >
> > >>> > Thanks in advance,
> > >>> > Dhivya
> > >>> >
> > >>> > --- On Wed, 25/11/09, Uwe Schindler wrote:
> > >>> >
> > >>> >
> > >>> > From: Uwe Schindler
> > >>> > Subject: RE: Need help regarding implementation of autosuggest
> using
> > >>> > jquery
> > >>> > To: java-user@lucene.apache.org
> > >>> > Date: Wednesday, 25 November, 2009, 9:54 AM
> > >>> >
> > >>> >
> > >>> > Hi Dhivya,
> > >>> >
> > >>> > you can iterate all terms in the index using a TermEnum, that
can
> be
> > >>> > retrieved using IndexReader.terms(Term startTerm).
> > >>> >
> > >>> > If you are interested in all terms from a specific field, position
> the
> > >>> > TermEnum on the first possible term in this field ("") and iterate
> until
> > >>> > the
> > >>> > field name changes. As terms in the TermEnum are first ordered
by
> field
> > >>> > name
> > >>> > then by term text (in UTF-16 order), the loop would look like
this:
> > >>> >
> > >>> > IndexReader reader = ...
> > >>> > String field = ....
> > >>> > Field = field.intern(); // important for the while loop
> > >>> > TermEnum tenum = reader.terms(new Term(field,""));
> > >>> > try {
> > >>> >     do {
> > >>> >         final Term term = tenum.term();
> > >>> >         if (term==null || term.field()!=field) break;
> > >>> >         final String termText = term.text();
> > >>> >         // do something with the termText
> > >>> >     } while (tenum.next());
> > >>> > } finally {
> > >>> >     tenum.close();
> > >>> > }
> > >>> >
> > >>> >
> > >>> > -----
> > >>> > Uwe Schindler
> > >>> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > >>> > http://www.thetaphi.de
> > >>> > eMail: uwe@thetaphi.de
> > >>> >
> > >>> >
> > >>> > > -----Original Message-----
> > >>> > > From: DHIVYA M [mailto:dhivyakrishnan87@yahoo.com]
> > >>> > > Sent: Wednesday, November 25, 2009 8:06 AM
> > >>> > > To: java user
> > >>> > > Subject: Need help regarding implementation of autosuggest
using
> jquery
> > >>> > >
> > >>> > > Hi all,
> > >>> > >
> > >>> > > Am using lucene 2.3.2 as a search engine in my e-paper site.
So
> that i
> > >>> > > want the user to search the news. I achieved that objective
but
> now am
> > >>> > > trying to implement autosuggest so that user can pick a choice
> from the
> > >>> > > drop down and no need of typing in the entire sentence or
so.
> > >>> > >
> > >>> > > I have download Jquery for this purpose and am trying to
> implement it.
> > >>> > > The collections of data to refer for the suggestion is given
in
> an
> > >>> > > arraylist or jus with in a string.
> > >>> > >
> > >>> > > But for my application, i need to populate the suggestions
with
> the
> > >>> > > indexed words available in the index file created during
indexing
> > >>> > > operation.
> > >>> > >
> > >>> > > Can anyone give an idea to read the contents from the index
file
> and
> > >>> > make
> > >>> > > it available as suggestions? or anyother idea to achieve
this
> > >>> objective?
> > >>> > >
> > >>> > > Thanks in advance,
> > >>> > > Dhivya
> > >>> > >
> > >>> > >
> > >>> > >       The INTERNET now has a personality. YOURS! See your
Yahoo!
> > >>> > Homepage.
> > >>> > > http://in.yahoo.com/
> > >>> >
> > >>> >
> > >>> >
> ---------------------------------------------------------------------
> > >>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >       The INTERNET now has a personality. YOURS! See your Yahoo!
> > >>> Homepage.
> > >>> > http://in.yahoo.com/
> > >>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > 梦的开始挣扎于城市的边缘
> > > 心的远方执着在脚步的瞬间
> > > 我的宿命埋藏了寂寞的永远
> > >
> >
> >
> >
> > --
> > 梦的开始挣扎于城市的边缘
> > 心的远方执着在脚步的瞬间
> > 我的宿命埋藏了寂寞的永远
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message