lucene-java-user mailing list archives

From Albert Juhe <albertj...@gmail.com>
Subject Re: wizard for search in Lucene
Date Fri, 31 Oct 2008 13:07:52 GMT

Hi,

This is my first version. It isn't fast, because I want to get this
information without modifying the index.
I'm now working on improving it (including FreeLing).

public String docsTerme(IndexReader reader, String terme) {
        String resultat = "";
        TermPositions tP;
        ArrayList alDocs = new ArrayList();
        long start = new Date().getTime();
        int veinsTrobats = 0; // number of neighbours found so far

        // Find where the term occurs
        try {
            tP = reader.termPositions(new Term("contingut", terme));
            // Documents where the term is found
            while (tP.next()) {
                infoTerme it = new infoTerme(terme, tP.doc(), tP.freq());
                resultat += it.toString();
                for (int i = 0; i < it.getFrequencia(); i++) {
                    it.add(tP.nextPosition());
                }
                alDocs.add(it); //we store: term, document id, positions
                resultat += "(" + it.toStringPosicions() + ")<br/>";
            }

        } catch (IOException e) {
            System.out.println("Error finding documents for the term: " + e);
            return null;
        }

        // Terms in each document
        for (int i = 0; i < alDocs.size(); i++) {
            // We need the term, the document id and the positions
            infoTerme iT = (infoTerme) alDocs.get(i);
            // Document id
            resultat += "<br/>" + iT.getId_document() + ":<br/>";
            try {
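                // All the terms found in the document
                TermFreqVector[] tfv = reader.getTermFreqVectors(iT.getId_document());
                if (tfv == null || tfv.length == 0) {
                    continue; // no term vectors were stored for this document
                }
                int j = 0; // only the first field's term vector is inspected
                String[] llistatTermes = tfv[j].getTerms();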
                int paraulesAnalitzades = 0;
                veinsTrobats = 0;
                while (veinsTrobats < iT.getFrequencia()
                        && paraulesAnalitzades < llistatTermes.length) {
                    resultat += "," + llistatTermes[paraulesAnalitzades];
                    // Documents where this term appears
                    TermPositions termP = reader.termPositions(
                            new Term("contingut", llistatTermes[paraulesAnalitzades]));
                    while (termP.next()) {
                        // The word is found in the same document, so it may be a neighbour
                        if (termP.doc() == iT.getId_document()) {
                            boolean veins = false;
                            int ind = 0;
                            while (!veins && ind < termP.freq()) {
                                int posicio = termP.nextPosition();
                                if (iT.sonVeins(posicio)) {
                                    veins = true;
                                    resultat += "<br/>" + veinsTrobats + "/"
                                            + iT.getFrequencia()
                                            + " They are neighbours (proximity 1):"
                                            + iT.getTerme() + " and "
                                            + llistatTermes[paraulesAnalitzades]
                                            + "(" + posicio + ")<br/>";
                                    veinsTrobats++;
                                } else {
                                    ind++;
                                }
                            }
                        }
                    }
                    paraulesAnalitzades++;
                }

            } catch (IOException e) {
                System.out.println("Error: can't find terms: " + e);
                return null;
            }
        }
        long end = new Date().getTime();
        resultat += "<br/>Time elapsed: " + (end - start) + "ms";
        return resultat;
    }
infoTerme.java: http://www.nabble.com/file/p20265608/infoTerme.java
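
The attached infoTerme.java is not reproduced in this message, so here is a minimal sketch of what such a helper could look like, inferred only from the calls that docsTerme makes. The class name InfoTermeSketch, the field layout, and the meaning of sonVeins (true when a position is exactly one step away from one of the stored positions, matching the "proximity 1" message) are assumptions, not the attached code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the attached infoTerme.java (not the original).
// Method names mirror the calls made in docsTerme.
class InfoTermeSketch {
    private final String terme;
    private final int idDocument;
    private final int frequencia;
    private final List<Integer> posicions = new ArrayList<Integer>();

    InfoTermeSketch(String terme, int idDocument, int frequencia) {
        this.terme = terme;
        this.idDocument = idDocument;
        this.frequencia = frequencia;
    }

    // Store one position of the term inside the document.
    void add(int posicio) {
        posicions.add(posicio);
    }

    String getTerme() { return terme; }
    int getId_document() { return idDocument; }
    int getFrequencia() { return frequencia; }

    // Assumption: two terms are neighbours when their positions differ by
    // exactly 1. A distance of 0 is excluded, so the term never matches itself.
    boolean sonVeins(int posicio) {
        for (int p : posicions) {
            if (Math.abs(p - posicio) == 1) {
                return true;
            }
        }
        return false;
    }

    // Comma-separated list of the stored positions.
    String toStringPosicions() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < posicions.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(posicions.get(i));
        }
        return sb.toString();
    }

    public String toString() {
        return terme + " doc=" + idDocument + " freq=" + frequencia;
    }

    public static void main(String[] args) {
        InfoTermeSketch it = new InfoTermeSketch("history", 7, 2);
        it.add(3);
        it.add(10);
        System.out.println(it.sonVeins(4));        // position 4 is next to 3
        System.out.println(it.sonVeins(3));        // same position, not a neighbour
        System.out.println(it.toStringPosicions());
    }
}
```

Excluding distance 0 matters here because docsTerme walks over every term of the document, including the search term itself.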

Thank you,
Albert


Aleksander M. Stensby wrote:
> 
> From what I can understand, you want to enter the word "history" and
> then get proposed "related" terms in combination with your input query.
> In essence, this would be a look-up of the top terms in the subset of
> documents matching the initial query "history". Exactly how you could do
> this, I am not sure, but I suggest you read up on
> term frequency and the tf-idf scheme.
> 
> Also: take a look at the org.apache.lucene.search.similar package:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/package-summary.html
> and read the motivation email listed in the first segment of
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/MoreLikeThis.html
> 
> I couldn't really see how you would autocomplete after the word "history"
> without listing a bunch of uninteresting terms as suggestions... But I
> might be wrong... Of course, if it was autocompletion you were looking
> for, Asbjørn answered that one just fine :)
> 
> Best regards,
>   Aleksander M. Stensby
> 
> 
> On Thu, 09 Oct 2008 18:49:26 +0200, Asbjørn A. Fellinghaug  
> <asbjorn@fellinghaug.com> wrote:
> 
>> Albert Juhe:
>>>
>>> Hi,
>>>
>>> I want to make a wizard that helps to find n-gram terms.
>>> For example:
>>> If I search for "history", after I type it the system proposes the
>>> following searches:
>>> history europe
>>> history spain
>>> history .....
>>> by consulting the indexed terms.
>>>
>>> Does this exist in Lucene?
>>
>> Hi.
>>
>> I take your question to mean that you want autocompletion in
>> your search system? In that case, I believe there are some Analyzers
>> that do this in the 'contrib' package. Also, I created an Analyzer
>> that produces "bigrams" (n-grams of size 2) for my master's thesis.
>> Feel free to download it from this page:
>> http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
>>
>> Also, have a look at the package org.apache.lucene.analysis.ngram:
>> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html
>>
> 
> 
> 
> -- 
> Aleksander M. Stensby
> Senior Software Developer
> Integrasco A/S
> +47 41 22 82 72
> aleksander.stensby@integrasco.no
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

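The bigram idea discussed in the quoted messages can be sketched without Lucene. The example below counts which words directly follow a given word in already-tokenized documents and proposes the most frequent pairs. BigramSuggester and suggest are hypothetical names made up for illustration, and a real index would supply the tokens through an analyzer, so treat this as a rough sketch under those assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks the words that directly
// follow a given word and turns them into two-word search proposals.
class BigramSuggester {
    static List<String> suggest(List<String[]> docs, String word, int max) {
        // Count every token that immediately follows 'word'.
        final Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String[] tokens : docs) {
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (tokens[i].equals(word)) {
                    String next = tokens[i + 1];
                    Integer c = counts.get(next);
                    counts.put(next, c == null ? 1 : c + 1);
                }
            }
        }
        // Sort followers by descending count, then alphabetically for ties.
        List<String> followers = new ArrayList<String>(counts.keySet());
        Collections.sort(followers, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = counts.get(b).compareTo(counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        // Build at most 'max' proposals of the form "word follower".
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < followers.size() && i < max; i++) {
            out.add(word + " " + followers.get(i));
        }
        return out;
    }
}
```

With the token arrays {"history", "europe"}, {"history", "spain", "history", "europe"} and {"art", "history"}, calling suggest(docs, "history", 5) proposes "history europe" first and "history spain" second.
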
-- 
View this message in context: http://www.nabble.com/wizard-for-search-in-Lucene-tp19900220p20265608.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



