lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: Please Help
Date Fri, 21 Jan 2011 05:46:48 GMT
Hi,
You could just try the following code to print the term freq for individual
terms.
************************
public static void printTermFreq(String indexPath) throws
CorruptIndexException, IOException{
    IndexReader ir = IndexReader.open(new NIOFSDirectory(new
File(indexPath)));
    TermEnum te = ir.terms();
    while(te.next()){
      System.out.println(te.term().text()+":"+te.docFreq());
    }
    ir.close();
  }
*************************

For phrases, you'd have to build/use a different tokenizer that tokenizes
the input text into those phrases (stored as a single term). Something of an
ngram, and then treat those phrases at terms.
Doing it at runtime would not be a feasible option.


--
Anshum Gupta
http://ai-cafe.blogspot.com


On Thu, Jan 20, 2011 at 3:30 PM, Ashish Pancholi <apancholi@chambal.com>wrote:

>
> Using Lucene_3.0.3. we would like to implement following:
> The number of occurrences of the term in the entire index.
>      For Example :
>      If we have indexed following text : amazon, amazon s3, amazon
> simpledb, amazon aws;
>      Then we are supposed to get results :
>
>            amazon    4
>            s3           1
>            simpledb   1
>            aws         1
>
> Showing phrases by combining the words, in the same sequence as they were
> in
> original text, before Indexing.
>
>      Lets suppose, we have indexed following text : amazon, amazon s3,
> amazon simpledb, amazon aws;
>      And for two word pharses we are supposed to get results
>
>                amazon amazon
>                amazon s3
>                s3 amazon
>                amazon simpledb
>                simpledb amazon
>                amazon aws
>
>      And for three word pharses we are supposed to get results :
>
>               amazon amazon s3
>               amazon s3 amazon
>               s3 amazon simpledb
>               amazon simpledb amazon
>               simpledb amazon aws
>
> Any help will be appreciated.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Please-Help-tp2293832p2293832.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message