lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Armbrust, Daniel C." <>
Subject Type information on Tokens?
Date Thu, 24 Apr 2003 17:09:12 GMT
If I wanted to build an index where all of the words were tagged with part of speech information,
its seems that the type field of the Token would be the place to put this.

But, as I understand it, lucene does not keep track of the type fields that are assigned during
tokenizing, and therefore doesn't use them while searching.

How could I go about keeping track of part of speech information in my index?  

So far, I can only think of two ways to accomplish this, 1, is to build it into my tokens,
i.e. my tokens would look something like "<noun>patient".  I'm afraid there may be some
pit-falls with this approach that I haven't identified yet, however, since I haven't tried
it out.

Or, I could make lucene use the type field in its index.  But, would I be correct in assuming
this would not be a trivial change?  I have looked over the source a bit, but I don't yet
have a full grasp of how hits are found and scored.  



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message