lucene-dev mailing list archives

From "Maurits van Wijland" <m.vanwijl...@quicknet.nl>
Subject Re: Token retrieval question
Date Fri, 12 Oct 2001 07:32:26 GMT

Hi,

This is a nice discussion :)

> >
> Yes, I see that. One additional problem that I need to solve for my 
> application is that I need to map from stemmed forms of the terms to at 
> least one un-stemmed form. Ideally it would be all un-stemmed forms, but 
> I can live with the first one. I realize that Lucene does not easily 
> support this because of the separation of church and state (I mean the 
> term filtering prior to indexing and querying), but I still need this 
> functionality... So, the question is, is this going to be common enough 
> to add a concept of a TermDictionary to Lucene and provide methods to 
> access it on the IndexReader and IndexWriter? If not, I could implement 
> this externally, but then I would not be able to use the IO framework 
> and whole concept of directories. Also, since the Term numbers are going 
> to be ephemeral just like doc numbers, externally I would have to refer 
> to them by text, slowing down the translation process, etc., etc., etc..

I think that this is common enough to be added to Lucene. Having a mapping
between the stems and the unstemmed forms is very valuable. It could be used
as an alternative method for handling inflections.
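
To make the idea concrete, here is a rough sketch of such a mapping kept
outside the index. It is only an illustration: StemDictionary and its methods
are hypothetical names, not part of Lucene, and the stemmer is passed in as a
plain function, so nothing here depends on Lucene's analysis or IO classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper (not a Lucene class): remembers the un-stemmed forms seen for each stem. */
class StemDictionary {
    // Assumption: the stemmer is any String -> String function supplied by the caller.
    private final Function<String, String> stemmer;
    // LinkedHashSet keeps first-seen order, so "the first un-stemmed form" is well defined.
    private final Map<String, Set<String>> formsByStem = new LinkedHashMap<>();

    StemDictionary(Function<String, String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Records one un-stemmed token and returns the stem that would be indexed. */
    String add(String token) {
        String stem = stemmer.apply(token);
        formsByStem.computeIfAbsent(stem, k -> new LinkedHashSet<>()).add(token);
        return stem;
    }

    /** All un-stemmed forms recorded for a stem, in first-seen order (empty if unknown). */
    Set<String> forms(String stem) {
        Set<String> forms = formsByStem.get(stem);
        if (forms == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(forms);
    }

    /** The first un-stemmed form recorded for a stem, or null if the stem is unknown. */
    String firstForm(String stem) {
        Set<String> forms = formsByStem.get(stem);
        return forms == null ? null : forms.iterator().next();
    }
}
```

In use, every token would go through add() at the point where it is stemmed
for indexing, and firstForm() or forms() would translate a stem back later.
Note that this refers to terms by text, so it does not address the concern
about ephemeral term numbers raised above.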

Maurits

