lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Did you mean...
Date Tue, 17 Feb 2004 15:13:42 GMT
On Feb 17, 2004, at 9:58 AM, lucene@nitwit.de wrote:
> On Tuesday 17 February 2004 15:18, Erik Hatcher wrote:
>> You would do them separately.  I'm not clear on what you are trying to
>> do.  The Analyzer does all this during indexing automatically for you,
>> but it sounds like you are just trying to emulate what an Analyzer
>> already does to extract words from text?
>
> I am still doing this:
>
> TokenStream in = analyzer.tokenStream("contents", new
> StringReader(reader.document(i).getField("contents").stringValue()));
>
> And I want to extract all words from all Fields.


The "words" (or "terms") are already in the index ready to be read very 
rapidly and accurately.  IndexReader is what you want to investigate if 
your fields are indexed.

Look into IndexReader and pull the terms directly rather than 
re-"analyzing" the text.  Provided "contents" was an indexed field, you 
could do something like this (taken from a mini-project I'm tinkering 
with right now):

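   // Needs java.io.IOException, java.util.ArrayList, java.util.Collections,
   // and org.apache.lucene.index.{IndexReader, Term, TermEnum}.
   // "reader" and "indexPath" are fields of the enclosing class.
   public String[] wordsThatStartWith(char c) throws IOException {
     String letter = String.valueOf(c).toLowerCase();

     ArrayList words = new ArrayList();
     if (reader == null) {
       reader = IndexReader.open(indexPath);
     }

     // Position the enumeration at the first term >= ("word", letter).
     TermEnum terms = reader.terms(new Term("word", letter));
     try {
       // term() returns null once the enumeration runs past the last term.
       while (terms.term() != null && "word".equals(terms.term().field())) {
         String word = terms.term().text();
         if (!word.startsWith(letter)) {
           break;  // terms are sorted, so no later term can match
         }
         words.add(word);

         if (!terms.next()) {
           break;
         }
       }
     } finally {
       terms.close();
     }

     // Terms already arrive in sorted order; the sort is only a safeguard.
     Collections.sort(words);

     return (String[]) words.toArray(new String[0]);
   }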

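You'll need to adapt this code to your environment and field(s), since 
what is here is designed to pull only the terms in the "word" field that 
start with a specified letter.  To walk every term in every field, 
IndexReader.terms() with no argument enumerates the whole index.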
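
The reason the loop can stop at the first non-matching term is that the 
index keeps its terms in sorted order.  A minimal sketch of that 
seek-then-scan idea, assuming a plain java.util.TreeSet as a stand-in 
for the term dictionary (PrefixScan and termsWithPrefix are hypothetical 
names for illustration, not Lucene API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Stand-in for the index's sorted term dictionary; not Lucene code.
class PrefixScan {

    // Returns every term that starts with the prefix, in sorted order.
    static List<String> termsWithPrefix(NavigableSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) plays the role of reader.terms(new Term(...)):
        // it starts at the first term >= prefix.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: nothing after this point can match
            }
            matches.add(term);
        }
        return matches;
    }
}
```

With the terms "apple", "apricot", "banana", "band" and "cherry" in the 
set, termsWithPrefix(terms, "ba") yields "banana" and "band" and never 
looks past "cherry".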

	Erik

