From: Albert Juhe
To: java-user@lucene.apache.org
Date: Fri, 31 Oct 2008 06:07:52 -0700 (PDT)
Subject: Re: wizard for search in Lucene

Hi,

This is my first version. It isn't fast, because I want to get this information without modifying the index. Now I'm working on improving it (including FreeLing).

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositions;

public String docsTerme(IndexReader reader, String terme) {
    StringBuilder resultat = new StringBuilder();
    List<infoTerme> alDocs = new ArrayList<infoTerme>();
    long start = System.currentTimeMillis();
    int veinsTrobats = 0; // neighbours found

    // Where is the term?
    try {
        // Documents where the term is found
        TermPositions tP = reader.termPositions(new Term("contingut", terme));
        try {
            while (tP.next()) {
                infoTerme it = new infoTerme(terme, tP.doc(), tP.freq());
                resultat.append(it.toString());
                for (int i = 0; i < it.getFrequencia(); i++) {
                    it.add(tP.nextPosition());
                }
                alDocs.add(it); // we store: term, document id, positions
                resultat.append("(").append(it.toStringPosicions()).append(")\n");
            }
        } finally {
            tP.close();
        }
    } catch (IOException e) {
        System.out.println("Error finding the documents for the term: " + e);
        return null;
    }

    // Terms in each document: we need the term, the document id and the positions
    for (infoTerme iT : alDocs) {
        resultat.append("\n").append(iT.getId_document()).append(":\n"); // document id
        try {
            // All the terms of the "contingut" field in this document;
            // null if that field was not indexed with term vectors
            TermFreqVector tfv = reader.getTermFreqVector(iT.getId_document(), "contingut");
            if (tfv == null) {
                continue;
            }
            String[] llistatTermes = tfv.getTerms();
            int paraulesAnalitzades = 0;
            veinsTrobats = 0;
            while (veinsTrobats < iT.getFrequencia()
                    && paraulesAnalitzades < llistatTermes.length) {
                String candidat = llistatTermes[paraulesAnalitzades];
                resultat.append(",").append(candidat);
                // Documents where the candidate term appears
                TermPositions termP = reader.termPositions(new Term("contingut", candidat));
                try {
                    while (termP.next()) {
                        if (termP.doc() == iT.getId_document()) {
                            // The word is in the same document: maybe they are neighbours
                            boolean veins = false;
                            int ind = 0;
                            while (!veins && ind < termP.freq()) {
                                int posicio = termP.nextPosition();
                                if (iT.sonVeins(posicio)) {
                                    veins = true;
                                    resultat.append("\n").append(veinsTrobats).append("/")
                                            .append(iT.getFrequencia())
                                            .append(" They are neighbours (proximity 1): ")
                                            .append(iT.getTerme()).append(" and ").append(candidat)
                                            .append("(").append(posicio).append(")\n");
                                    veinsTrobats++;
                                } else {
                                    ind++;
                                }
                            }
                        }
                    }
                } finally {
                    termP.close();
                }
                paraulesAnalitzades++;
            }
        } catch (IOException e) {
            System.out.println("Error: I can't find the terms: " + e);
            return null;
        }
    }
    long end = System.currentTimeMillis();
    resultat.append("\nTime elapsed: ").append(end - start).append("ms");
    return resultat.toString();
}

infoTerme.java: http://www.nabble.com/file/p20265608/infoTerme.java

thank you,
Albert

Aleksander M. Stensby wrote:
>
> From what I can understand, you want to insert the word "history" and
> then get proposed "related" terms in combination with your input query.
> In essence this would be to do a "look-up" on top-terms in the subset of
> documents matching the initial query "history". Exactly how you could do
> this is a bit uncertain from my knowledge, but I suggest you read up on
> term-frequency and the tf-idf scheme.
>
> Also: take a look at the org.apache.lucene.search.similar package:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/package-summary.html
> and read the motivation email listed in the first segment of
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/MoreLikeThis.html
>
> I couldn't really see how you would autocomplete after the word history
> without listing a bunch of uninteresting terms as suggestions... But I
> might be wrong... Of course, if it was autocompletion you were looking
> for, Asbjørn answered that one just fine :)
>
> Best regards,
> Aleksander M. Stensby
>
>
> On Thu, 09 Oct 2008 18:49:26 +0200, Asbjørn A. Fellinghaug wrote:
>
>> Albert Juhe:
>>>
>>> Hi,
>>>
>>> I want to make a wizard that can help to find n-gram terms.
>>> For example:
>>> If I want to search for "history", after I type it the system proposes
>>> the following searches:
>>> history europe
>>> history spain
>>> history .....
>>> based on the indexed terms.
>>>
>>> Does this exist in Lucene?
>>
>> Hi.
>>
>> I interpret your question in such a way that you want autocompletion in
>> your search system? In that case, I believe there are some Analyzers
>> which do this in the 'contrib' package.
>> Also, as part of my master's thesis I've created an Analyzer which
>> creates "bigrams" (n-grams of size 2).
>> Feel free to download it from this page:
>> http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
>>
>> Also, have a look at the package org.apache.lucene.analysis.ngram:
>> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html
>>
>
> --
> Aleksander M. Stensby
> Senior Software Developer
> Integrasco A/S
> +47 41 22 82 72
> aleksander.stensby@integrasco.no
>

--
View this message in context: http://www.nabble.com/wizard-for-search-in-Lucene-tp19900220p20265608.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
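The neighbour search in docsTerme() walks every term of each matching document and, for each of those terms, scans its whole posting list, which is likely where most of the time goes. A minimal plain-Java sketch of the same proximity-1 idea, assuming each document is already tokenized into an array whose index plays the role of a Lucene term position. NeighbourSuggester and suggest() are hypothetical names, not Lucene API. It counts which token directly follows the query term and ranks those followers, producing suggestions in the style of "history europe".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): suggests "term next" bigrams by
// counting which token sits at position p + 1 after each occurrence of the
// query term at position p (proximity 1, as in docsTerme()).
class NeighbourSuggester {

    static List<String> suggest(List<String[]> docs, String terme, int max) {
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String[] tokens : docs) {
            // The array index stands in for the Lucene term position
            for (int pos = 0; pos + 1 < tokens.length; pos++) {
                if (tokens[pos].equals(terme)) {
                    String next = tokens[pos + 1];
                    Integer c = counts.get(next);
                    counts.put(next, c == null ? 1 : c + 1);
                }
            }
        }
        List<String> followers = new ArrayList<String>(counts.keySet());
        // Most frequent follower first; ties broken alphabetically
        Collections.sort(followers, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = counts.get(b) - counts.get(a);
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < followers.size() && i < max; i++) {
            result.add(terme + " " + followers.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<String[]>();
        docs.add("history europe middle ages".split(" "));
        docs.add("the history spain and history europe".split(" "));
        docs.add("modern history spain".split(" "));
        // prints [history europe, history spain]
        System.out.println(suggest(docs, "history", 5));
    }
}
```

In a real index the per-document token sequence would have to be rebuilt from something like term vectors stored with positions; that step is an assumption and is left out here.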
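Aleksander's suggestion (look up the top terms in the documents that match "history" and weight them with tf-idf) can be sketched the same way. The weighting below is a plain textbook tf-idf variant, raw count times ln(N/df), assumed here for illustration; it is not Lucene's own scoring formula, and CooccurrenceRanker and topTerms() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a Lucene class): ranks the terms that co-occur
// with a query term, scoring each one as tf * idf where
//   tf  = occurrences in the documents that contain the query term
//   idf = ln(N / df), N = number of documents, df = documents containing the term
class CooccurrenceRanker {

    static List<String> topTerms(List<String[]> docs, String query, int max) {
        int n = docs.size();
        Map<String, Integer> df = new HashMap<String, Integer>();
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String[] tokens : docs) {
            Set<String> distinct = new HashSet<String>();
            Collections.addAll(distinct, tokens);
            for (String t : distinct) {
                Integer c = df.get(t);
                df.put(t, c == null ? 1 : c + 1);
            }
            if (!distinct.contains(query)) {
                continue; // only documents matching the query contribute tf
            }
            for (String t : tokens) {
                if (!t.equals(query)) {
                    Integer c = tf.get(t);
                    tf.put(t, c == null ? 1 : c + 1);
                }
            }
        }
        final Map<String, Double> score = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) n / df.get(e.getKey()));
            score.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<String>(score.keySet());
        // Highest score first; ties broken alphabetically
        Collections.sort(terms, new Comparator<String>() {
            public int compare(String a, String b) {
                int byScore = Double.compare(score.get(b), score.get(a));
                return byScore != 0 ? byScore : a.compareTo(b);
            }
        });
        return terms.subList(0, Math.min(max, terms.size()));
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<String[]>();
        docs.add("history of europe".split(" "));
        docs.add("history of spain".split(" "));
        docs.add("travel of spain".split(" "));
        docs.add("recipes of italy".split(" "));
        // prints [europe, spain, of]
        System.out.println(topTerms(docs, "history", 3));
    }
}
```

Because a term that occurs in every document gets idf = ln(1) = 0, function words such as "of" sink to the bottom of the list, which is one way to keep uninteresting terms out of the suggestions.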