lucene-java-user mailing list archives

From Pierrick Brihaye <pierrick.brih...@free.fr>
Subject Re: token type question
Date Sat, 16 Apr 2005 06:31:32 GMT
ethandev@comcast.net wrote:

> I am working on a program to index/search chemical elements/compounds.
> Say I write an analyzer to filter out chemical terms, such as H2O. I
> noticed that I can specify a token's type. Can I construct a token as
> new Token ("H2", start, end, "chem");
>
> My question is:
> How do I search all the tokens with the "chem" type, such as H2O, O2,
> etc.? Is there any sample like this?
>
> If this approach doesn't work, what's the best approach?

You can assign a type to your tokens, and you can then filter them
according to their type during analysis, *but* the index forgets this
information since it stores *terms* (field/value pairs), not tokens.

Compare:
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html
and
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html

Notice, however, that the relative position of each term (derived from
the Token's positionIncrement, default = 1) is also stored in the index;
this is what allows proximity searches.

So... what can you do?

1) use a dedicated field "chem" where only chemical content is allowed
(filter out every token whose type is not "chem")
2) encode the type in your termText, e.g. "chem_H2", and rewrite your
queries the same way
3) work on the query rather than on the index content: filter out
whatever is not chemical
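
For options 1 and 2, here is a rough sketch of the idea in plain Java. It
deliberately does not use Lucene: Tok is a made-up stand-in for Token, and
the class, method and type names ("ChemTokenSketch", "keepChemOnly",
"word", ...) are my own assumptions for illustration. In a real analyzer
the same logic would presumably live in a TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch. Tok is a hypothetical stand-in for Lucene's Token,
// not the real class; all names here are illustrative assumptions.
class ChemTokenSketch {

    static final String CHEM_TYPE = "chem";

    static final class Tok {
        final String text;
        final int start;
        final int end;
        final String type;

        Tok(String text, int start, int end, String type) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Option 1: keep only the tokens typed "chem". The surviving tokens
    // are what you would feed into a dedicated "chem" field.
    static List<Tok> keepChemOnly(List<Tok> tokens) {
        List<Tok> kept = new ArrayList<>();
        for (Tok t : tokens) {
            if (CHEM_TYPE.equals(t.type)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Option 2: the index only keeps term text, so encode the type in
    // the text itself by prefixing chemical tokens with "chem_".
    static List<String> toTermTexts(List<Tok> tokens) {
        List<String> terms = new ArrayList<>();
        for (Tok t : tokens) {
            terms.add(CHEM_TYPE.equals(t.type) ? "chem_" + t.text : t.text);
        }
        return terms;
    }

    // Option 2, query side: apply the same prefix to a query term.
    static String toQueryTerm(String text, boolean chemical) {
        return chemical ? "chem_" + text : text;
    }

    // Tokens for the sample text "water is H2O".
    static List<Tok> sample() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("water", 0, 5, "word"));
        tokens.add(new Tok("is", 6, 8, "word"));
        tokens.add(new Tok("H2O", 9, 12, CHEM_TYPE));
        return tokens;
    }
}
```

With option 1 only H2O reaches the "chem" field, so any query on that
field is chemical by construction. With option 2 everything stays in one
field, but the query "H2O" has to become "chem_H2O" to match.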

There may be other solutions...

Cheers,

p.b.


