lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pierrick Brihaye <pierrick.brih...@culture.gouv.fr>
Subject Re: token type question
Date Fri, 22 Apr 2005 07:36:32 GMT
Hi,

ethandev@comcast.net a écrit :

> Thanks Pierrick.
> 
> Are you say that I should construct Token in analyzer like
> new Token ("chem_H2O", 100, 103, "chem");
> 
> note that chem_ is added prefix to H2O, and 100 to 103 is length of H2O rather than chem_H2O?

Well... 100 to 103 are offsets provided by the reader (an are thus 
usually offsets in the source file). These offsets may help you to make 
some computations but they are lost when the token is indexed.

> I want to index H2O in a compound, say H2O-CH2. say I want a query to find out H2O in
a compound. How can I do that?

Dirty solution : use wildcard queries (chem_H20-*).

Smart solution :

consider that you have "words", i.e. chem_H2O followed by chem_CH2 
(strange enough ;-)... and make some phrase queries or 
MultiPhraseQueries. See :

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/test/org/apache/lucene/search/TestPhraseQuery.java?rev=150740&view=markup
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/test/org/apache/lucene/search/TestMultiPhraseQuery.java?rev=150733&view=markup

Setting the phrase slop may help you.

You may also want to play with the position of the tokens to 
allow/prevent hits from your PhraseQuery. Give your Tokens a relevant 
positionIncrement. See : 
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/test/org/apache/lucene/search/TestPositionIncrement.java?rev=150585&view=markup

Cheers,

-- 
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:pierrick.brihaye@culture.gouv.fr
+33 (0)2 99 29 67 78

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message