lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ethan...@comcast.net
Subject Re: token type question
Date Fri, 22 Apr 2005 02:49:04 GMT
Thanks Pierrick.

Are you say that I should construct Token in analyzer like
new Token ("chem_H2O", 100, 103, "chem");

note that chem_ is added prefix to H2O, and 100 to 103 is length of H2O rather than chem_H2O?

I also have some further problem and not sure if can be solved by this approch.

I want to index H2O in a compound, say H2O-CH2. say I want a query to find out H2O in a compound.
How can I do that?

Thanks,
Ethan

-------------- Original message -------------- 

> ethandev@comcast.net a écrit : 
> 
> > I am working on a program to index/search chemical element/compound. Say I 
> write an analyzer to filter out chemical terms, such as H2O. I noticed that I 
> can specify a tocken's type. Can I construct a token as 
> > new Token ("H2", start, end, "chem"); 
> > 
> > My questions is 
> > How do I search all the tokens with "chem" type token, such as H2O, O2, etc? 
> Any sample like this? 
> > 
> > If this approach doesn't work, what's the best approach? 
> 
> You may assign a type to the tokens, and then you may filter them 
> according to their type *but* the index forgets this info since it 
> stores *terms* (field/value pairs). 
> 
> Compare : 
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html 
> and 
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html 
> 
> Notice however that the terms have also their relative position (the 
> Token's positionIncrement, default = 1) stored in the index ; this 
> allows proximity searches. 
> 
> So... how to do ? 
> 
> 1) use a dedicated field "chem" where only chemical content is allowed 
> (filter out every token whose type is different from "chem") 
> 2) manipulate your termText : "chem_H2" ; the same for your queries 
> 3) play with the query rather than with the index content : filter out 
> what is not chemical 
> 
> There may be other solutions... 
> 
> Cheers, 
> 
> p.b. 
> 
> 
> 
> --------------------------------------------------------------------- 
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
> For additional commands, e-mail: java-user-help@lucene.apache.org 
> 
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message