lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Lucene 2.9 and deprecated IR.open() methods
Date Mon, 05 Oct 2009 07:35:41 GMT
On Mon, Oct 05, 2009 at 08:27:20AM +0200, Uwe Schindler wrote:

> Pass a Properties or Map<String,?> to the ctor/open. The keys are predefined
> constants. Maybe our previous idea of an IndexConfiguration class is a
> subclass of HashMap<String,?> with all the constants and some easy-to-use
> setter methods for very often-used settings (like index dir) and some
> reasonable defaults.

Interesting.  The design we worked out for Lucy's Segment class (prototype in
KS devel branch) uses hash/array/string data to store arbitrary metadata on
behalf of segment components, written as JSON to seg_NNN/segmeta.json.  In
that case, though, each component is responsible for generating and consuming
its own data.  That's different from having the user supply data via such a
format.

I still think you're going to want an extensible builder class.

> This allows us to pass these properties to any flex indexing component
> without need to modify/extend it to support the additional properties. The
> flexible indexing component just defines its own property names (e.g. as
> URNs, URLs, using its class name as prefix,...). 

But how do you determine what the flex indexing components *are*?  In theory,
you can pass class names and sufficient arguments to build them up via your
big ball of data, but then you're essentially creating a new language, with
all the headaches that entails. 

In KS, Indexer/IndexReader configuration is divided between three classes.

  * Schema: field definitions.
  * Architecture: Settings that never change for the life of the index.
  * IndexManager: Settings that can change per index/search session.

Schema isn't worth discussing -- Lucy will have it, Lucene won't, end of
story.  Architecture and IndexManager, though, are fairly close to what's
being discussed.

Architecture is responsible for e.g. determining which pluggable components
get registered.  It's the builder class.

IndexManager is where things like merging and locking policies reside.
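
To make the split concrete, here is a rough Java sketch of the idea.  It is
not the actual KS code: the class names Architecture and IndexManager come
from the description above, but SegmentComponent, buildComponents(), the
concrete component classes, and the two IndexManager settings are all
hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of this is the real KinoSearch/Lucy API.
interface SegmentComponent {
    String name();
}

final class PostingsComponent implements SegmentComponent {
    public String name() { return "postings"; }
}

final class StoredFieldsComponent implements SegmentComponent {
    public String name() { return "stored_fields"; }
}

// Fixed for the life of the index: decides which components get registered.
class Architecture {
    // Subclasses override this to register additional pluggable components.
    List<SegmentComponent> buildComponents() {
        List<SegmentComponent> components = new ArrayList<SegmentComponent>();
        components.add(new PostingsComponent());
        components.add(new StoredFieldsComponent());
        return components;
    }
}

// Extension happens in ordinary code rather than in a bag of properties.
final class CustomArchitecture extends Architecture {
    @Override
    List<SegmentComponent> buildComponents() {
        List<SegmentComponent> components = super.buildComponents();
        components.add(new SegmentComponent() {
            public String name() { return "custom"; }
        });
        return components;
    }
}

// May differ per index/search session: merging and locking policy settings.
final class IndexManager {
    private long writeLockTimeoutMillis = 1000L; // assumed default
    private int maxSegmentsToMerge = 10;         // assumed default

    IndexManager setWriteLockTimeoutMillis(long millis) {
        this.writeLockTimeoutMillis = millis;
        return this;
    }

    IndexManager setMaxSegmentsToMerge(int max) {
        this.maxSegmentsToMerge = max;
        return this;
    }

    long getWriteLockTimeoutMillis() { return writeLockTimeoutMillis; }
    int getMaxSegmentsToMerge() { return maxSegmentsToMerge; }
}
```

The point of the sketch is only that the set of components is determined by
which Architecture subclass you hand over, while per-session knobs live on a
separate object.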

> Property names are always String, values any type (therefore Map<String,?>).
> With Java 5, integer props and so on are no "bad syntax" problem because of
> autoboxing (no need to pass new Integer() or Integer.valueOf()).

Argument validation gets to be a headache when you pass around complex data
structures.  It's doable, but messy and hard to grok.  Going through dedicated
methods is cleaner and safer.
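
A small Java sketch of the difference, under stated assumptions: the
property key "example.maxMergeDocs" and the classes MapConfigConsumer and
TypedConfig are made up for this example and are not Lucene or KS API.

```java
import java.util.Map;

// Hypothetical names throughout; not Lucene or KinoSearch API.
final class MapConfigConsumer {
    static final String MAX_MERGE_DOCS = "example.maxMergeDocs"; // assumed key

    // With Map<String,?>, every consumer has to check presence, type,
    // and range itself, and mistakes only surface at runtime.
    static int readMaxMergeDocs(Map<String, ?> props) {
        Object raw = props.get(MAX_MERGE_DOCS);
        if (raw == null) {
            throw new IllegalArgumentException("missing " + MAX_MERGE_DOCS);
        }
        if (!(raw instanceof Integer)) {
            throw new IllegalArgumentException(
                MAX_MERGE_DOCS + " must be an Integer, got "
                + raw.getClass().getName());
        }
        int value = ((Integer) raw).intValue();
        if (value < 1) {
            throw new IllegalArgumentException(MAX_MERGE_DOCS + " must be >= 1");
        }
        return value;
    }
}

final class TypedConfig {
    private int maxMergeDocs = Integer.MAX_VALUE;

    // With a dedicated method, the compiler enforces the type and only
    // the range check remains, in exactly one place.
    TypedConfig setMaxMergeDocs(int maxMergeDocs) {
        if (maxMergeDocs < 1) {
            throw new IllegalArgumentException("maxMergeDocs must be >= 1");
        }
        this.maxMergeDocs = maxMergeDocs;
        return this;
    }

    int getMaxMergeDocs() { return maxMergeDocs; }
}
```

Passing the String "1000" under the map key compiles fine and fails only
when some component finally reads it; setMaxMergeDocs("1000") never
compiles in the first place.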

> Another good thing is that implementors of e.g. XML config files like in
> Solr can simply pass all elements in config to this map.

I go back and forth on this.  At some point, the volume of data becomes
overwhelming and it becomes easier to swap in the name of a class where most
of the data can reside in nice, reliable, structured code.

Marvin Humphrey



