lucene-dev mailing list archives

From "Friedman, Eric" <e...@ConveySoftware.com>
Subject RE: Adding a TermExpansionQuery
Date Wed, 15 May 2002 16:18:51 GMT
Hi Peter,

My preference would be for Lucene to have configuration APIs but for the
source of the configuration data to be left up to developers.
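
As a rough illustration of that split (an API in the library, with the
data source chosen by the developer), here is a minimal Java sketch. The
names ThesaurusSource and InMemoryThesaurusSource are hypothetical and are
not part of Lucene; an XML-file or RDBMS-backed implementation would sit
behind the same interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical interface (not a Lucene API): the only thing query code
// would see, regardless of where the thesaurus data is stored.
interface ThesaurusSource {
    // Returns the expansions for a term, or an empty set if none are known.
    Set<String> expansionsFor(String term);
}

// One possible backing store, assumed here for illustration: a plain
// in-memory map that the developer populates from any source they like.
final class InMemoryThesaurusSource implements ThesaurusSource {
    private final Map<String, Set<String>> entries = new HashMap<>();

    // Registers one-to-many expansions for a term.
    void add(String term, String... synonyms) {
        Set<String> set = entries.computeIfAbsent(term, k -> new LinkedHashSet<>());
        Collections.addAll(set, synonyms);
    }

    @Override
    public Set<String> expansionsFor(String term) {
        Set<String> found = entries.get(term);
        return found == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(found);
    }
}
```

With something like this, code that builds an expanded query would depend
only on ThesaurusSource, so the same thesaurus could be shared across
several indices without being copied into any of them.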

My opinion about storing non-index information in the index is that we
shouldn't do it. At various points, the Lucene docs say that one way to
accommodate certain concurrency scenarios is to build a directory off in a
corner and then swap it with the original.  It would be unfortunate if such
techniques required developers (or the SDK itself) to copy data that doesn't
change from index to index.

My 2 cents,
Eric

> -----Original Message-----
> From: Peter Carlson [mailto:carlson@bookandhammer.com]
> Sent: Wednesday, May 15, 2002 7:07 AM
> To: Lucene Developers List
> Subject: Re: Adding a TermExpansionQuery
> 
> 
> Hi Eric,
> 
> Thanks for the feedback. My intention was to abstract the source, but one of
> my questions was, does Lucene set a configuration file which will use this
> "Thesaurus" query, or will that have to be setup manually by the developer.
> 
> Currently, Lucene does not provide a configuration file.
> 
> As far as if the information is in the index directory. I was thinking this
> might be a nice place for this information to exist, then it doesn't add any
> other overhead to the system (i.e. No configuration file) and might be
> easier to support multiple sources since the index has already been
> abstracted. If you wanted to share the "Thesaurus" across many different
> indices you could "copy" or "merge" that index component into the data
> source. This could even be part of the build process for a file system.
> 
> --Peter
> 
> On 5/15/02 6:45 AM, "Eric D. Friedman" <eric@conveysoftware.com> wrote:
> 
> > Whichever storage mechanism you choose, you should be sure to abstract its
> > interface so that people can make other choices.  With that out of the way,
> > it doesn't matter too much whether you pick a properties file or an XML
> > file.
> > 
> > That said, I wouldn't expect to find this data stored in the index
> > directory, since it's not part of the index and since users may want to
> > share the data across several indices.  I would also lean toward the
> > XML file (for a file solution, that is -- an RDBMS should be supported
> > too), since that lends itself more naturally to describing one-to-many
> > relations than a properties file does.
> > 
> > Personal opinion: "Thesaurus" is a more descriptive term than
> > "TermExpansion." To me, term expansion suggests some kind of text
> > globbing, whereas a thesaurus is a reference (a "lookup table") that
> > provides *semantic* expansions of the kind you describe.  Oracle's
> > intermedia indexing engine has thesaurus features similar to what you
> > describe and calls them by that name.
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
