lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: uniqueWords, and termDocs
Date Tue, 24 Jun 2008 14:42:24 GMT
I have this uncommitted class locally (forgot its origins), which you'll like:

$ svn st
?      contrib/miscellaneous/src/java/org/apache/lucene/misc/AllTerms.java


Add the package statement and the imports and you have it.  Read the terms it prints into some
data structure and pick random terms from there.

/**
 * <code>AllTerms</code> class extracts terms and their frequencies out
 * of an existing Lucene index.
 *
 * @version $Id: HighFreqTerms.java 376393 2006-02-09 19:17:14Z otis $
 */
public class AllTerms {

  public static void main(String[] args) throws Exception {
    if (args.length < 1 || args.length > 2) {
      usage();
      System.exit(1);
    }
    // First argument: index directory. Optional second argument: field name.
    IndexReader reader = IndexReader.open(args[0]);
    String field = (args.length == 2) ? args[1] : null;

    TermEnum terms = reader.terms();
    try {
      while (terms.next()) {
        // With no field given, print every term; otherwise only that field's terms.
        if (field == null || terms.term().field().equals(field)) {
          System.out.println(terms.term() + ": " + terms.docFreq());
        }
      }
    } finally {
      // Release the enumeration and the reader even if iteration fails.
      terms.close();
      reader.close();
    }
  }

  private static void usage() {
    System.out.println(
         "\n\n"
         + "java org.apache.lucene.misc.AllTerms <index dir> [field]\n\n");
  }
}
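
For the "pick random terms" part, here is a rough sketch in plain Java, with no Lucene calls. It assumes the term texts arrive as strings from some iterator. The class name RandomTermPicker and its pick() method are made up for illustration and are not part of Lucene. It uses reservoir sampling, so the full term list never has to sit in memory; loading everything into a List and indexing it with a random number would work just as well for a small index.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Lucene: picks one term uniformly at random. */
class RandomTermPicker {

  /**
   * Reservoir sampling with a reservoir of size one: the n-th term seen
   * replaces the current pick with probability 1/n, so every term ends up
   * equally likely without holding the whole term list in memory.
   * Returns null when the iterator is empty.
   */
  static String pick(Iterator<String> terms, Random rng) {
    String picked = null;
    int seen = 0;
    while (terms.hasNext()) {
      String term = terms.next();
      seen++;
      if (rng.nextInt(seen) == 0) {
        picked = term;
      }
    }
    return picked;
  }

  public static void main(String[] args) {
    // Stand-in data; in practice these would be the term texts read from the index.
    List<String> terms = Arrays.asList("apache", "index", "lucene", "term");
    System.out.println(pick(terms.iterator(), new Random()));
  }
}
```

The same count-and-replace step could presumably be applied inside the AllTerms loop directly, using the text of each term as it is enumerated instead of going through an Iterator.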
 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Erick Erickson <erickerickson@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, June 24, 2008 9:26:03 AM
> Subject: Re: uniqueWords, and termDocs
> 
> Isn't asking for unique words (actually tokens) equivalent to enumerating
> all the terms in a field?
> 
> I have no idea how to select a random word. Seems like you'd have to
> somehow use a TermEnum, but I don't think there's anything built in.
> 
> Best
> Erick
> 
> On Mon, Jun 23, 2008 at 6:03 PM, Cam Bazz wrote:
> 
> > Hello,
> >
> > I need to be able to select a random word out of all the words in my index.
> > How can I do this through termDocs()?
> >
> > Also, I need to get a list of unique words as well. Is there a way to ask
> > Lucene for this?
> >
> > Best Regards,
> > -C.B.
> >

