lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Re : How does Lucene handle phrases containing words that are not indexed?
Date Thu, 14 Feb 2002 17:33:25 GMT
> From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
> 
> I'd like to index documents that are described by keywords.
> One document can have zero or more keywords, and a keyword can
> be related to one or more documents. Assume two keywords:
> "human computer interaction"
> "computer science"
> 
> If I add these keywords to a document in one field and someone
> searches with the query human science, the document will be found,
> won't it? I could use, say, 16 distinct fields for at most
> 16 keywords and translate the query keyword:"human science"
> to keyword1:"human science" or keyword2:"human science" ...
> keyword16:"human science", but I would prefer not to do that.

This sounds like a good case for an untokenized field.

When you index, use something like:

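  Document doc = new Document();
  // Field.Keyword values are indexed (and stored) as one untokenized term;
  // adding the same field name twice gives a multi-valued field.
  doc.add(Field.Keyword("keyword", "computer science"));
  doc.add(Field.Keyword("keyword", "human computer interaction"));
  ...
  indexWriter.addDocument(doc);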
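To illustrate why untokenized keywords avoid the false match described in the question, here is a small Lucene-free sketch. It models one document's keyword field as a set of terms; the class and method names are hypothetical, and the whitespace/lowercase split is only a stand-in for a real analyzer. When the keywords are tokenized, "human" and "science" land in the same term set, so a query requiring both matches even though no single keyword contains both. When each keyword is one term, only an exact keyword matches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free model of one document's "keyword" field as a set of terms.
public class KeywordFieldSketch {

    // Tokenized indexing (crude stand-in for an analyzer): every keyword is
    // lowercased and split on whitespace, so all words share one term set.
    static Set<String> tokenizedTerms(List<String> keywords) {
        Set<String> terms = new HashSet<>();
        for (String k : keywords) {
            for (String word : k.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) terms.add(word);
            }
        }
        return terms;
    }

    // Untokenized indexing: each keyword becomes exactly one term.
    static Set<String> untokenizedTerms(List<String> keywords) {
        return new HashSet<>(keywords);
    }

    // True if every required term is present in the indexed term set.
    static boolean matchesAll(Set<String> indexed, String... required) {
        return indexed.containsAll(Arrays.asList(required));
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("human computer interaction", "computer science");
        // Both words occur somewhere in the keywords: true
        System.out.println(matchesAll(tokenizedTerms(keywords), "human", "science"));
        // Neither word is a whole keyword: false
        System.out.println(matchesAll(untokenizedTerms(keywords), "human", "science"));
        // An exact keyword still matches: true
        System.out.println(matchesAll(untokenizedTerms(keywords), "computer science"));
    }
}
```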

Then you can either add query keywords "manually":

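  // queryParser is a QueryParser built with a default field and analyzer;
  // the cast assumes "other terms" parses to more than one clause.
  BooleanQuery query = (BooleanQuery)queryParser.parse("other terms");
  // required = true, prohibited = false
  query.add(new TermQuery(new Term("keyword", "computer science")), true, false);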
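The two boolean arguments mark a clause as required and/or prohibited. Roughly, a document matches when every required clause matches, no prohibited clause matches, and at least one non-prohibited clause matches; optional clauses otherwise only affect scoring. The sketch below is a hypothetical, Lucene-free model of that matching rule only (no scoring, each clause reduced to one term lookup), showing that the required keyword clause filters out documents without that exact keyword.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free model of the matching rule behind
// BooleanQuery.add(query, required, prohibited). Scoring is ignored and
// each clause is reduced to a single term lookup.
public class BooleanClauseSketch {

    static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    // Matches if all required clauses hit, no prohibited clause hits,
    // and at least one non-prohibited clause hits.
    static boolean matches(Set<String> docTerms, List<Clause> clauses) {
        boolean anyHit = false;
        for (Clause c : clauses) {
            boolean hit = docTerms.contains(c.term);
            if (c.prohibited) {
                if (hit) return false;               // prohibited clause matched
            } else {
                if (c.required && !hit) return false; // required clause missing
                if (hit) anyHit = true;
            }
        }
        return anyHit;
    }

    public static void main(String[] args) {
        // "other terms" as two optional clauses plus the required keyword
        // clause; keyword terms are written as "keyword:<value>".
        List<Clause> query = List.of(
            new Clause("other", false, false),
            new Clause("terms", false, false),
            new Clause("keyword:computer science", true, false));

        Set<String> docA = Set.of("other", "keyword:computer science");
        Set<String> docB = Set.of("other", "terms", "keyword:human computer interaction");

        System.out.println(matches(docA, query)); // true
        System.out.println(matches(docB, query)); // false: required keyword missing
    }
}
```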

or you can integrate this with the query parser by making an analyzer that
constructs terms for the field named "keyword" using exactly the provided
input:

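  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.CharTokenizer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;

  public class MyAnalyzer extends Analyzer {
    private Analyzer standard = new StandardAnalyzer();
    public TokenStream tokenStream(String field, final Reader reader) {
      if ("keyword".equals(field)) {
        // Every character is a token character, so the whole input becomes
        // one term (up to CharTokenizer's maximum token length).
        return new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) { return true; }
        };
      } else {
        // All other fields get normal StandardAnalyzer tokenization.
        return standard.tokenStream(field, reader);
      }
    }
  }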

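  Analyzer analyzer = new MyAnalyzer();
  // Default field is "keyword"; the query below names its field explicitly.
  QueryParser queryParser = new QueryParser("keyword", analyzer);
  Query query = queryParser.parse("keyword:\"computer science\"");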

I haven't tested the above code, but I hope you get the idea.
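The per-field analyzer idea can be modelled without Lucene as a function from (field, text) to a list of tokens. In this hypothetical sketch the "keyword" field returns the whole input as one token, as MyAnalyzer's always-true isTokenChar does, while any other field (here a made-up "body") gets a crude lowercase/whitespace split standing in for StandardAnalyzer, which in reality also handles punctuation and stop words. Because keyword:"computer science" analyzes to a single token, the parsed query should end up as a plain term query for exactly one indexed keyword rather than a phrase over separate words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of per-field analysis.
public class PerFieldAnalysisSketch {

    static List<String> analyze(String field, String text) {
        if ("keyword".equals(field)) {
            // Whole input is one token; empty input yields no tokens.
            return text.isEmpty() ? List.of() : List.of(text);
        }
        // Stand-in for StandardAnalyzer: lowercase and split on whitespace.
        List<String> tokens = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) tokens.add(w);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("keyword", "computer science")); // [computer science]
        System.out.println(analyze("body", "Computer Science"));    // [computer, science]
    }
}
```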

Doug

