lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Engels" <>
Subject RE: incorrect OO in lucene source?
Date Tue, 20 Apr 2004 17:24:54 GMT
Please don't take my questions as criticism... I asked them "because" Lucene
is held up as an example of good OO design, and I was trying to reconcile my
knowledge of OO design with the Lucene developers.

It just "feels like" IndexReader and IndexWriter should be interfaces... My
Lucene IndexReader has no knowledge of directories (and doesn't need it),
yet a IDE looking at my class shows many available methods dealing with
directories, locks, etc. when none are appropriate (or useful).

It seems like IndexReader should be an interface, with AbstractIndexReader
providing some helper methods (potentially), and DirectoryIndexReader
providing the concrete path/directory related methods. Alternatively, a
AbstractIndex which implemented both, and then DirectoryIndex as the
concreate implementation - since the index reading and writing are so
coupled (in format). Then you could have CompoundDirectoryIndex, etc. ...

Even though my class is also an index writer, I could not implement it
(since I was already implementing IndexReader), so I ended up with 'outside'
methods to do the writing. I could have created a MyIndexWriter that
extended IndexWriter, but there is really no point, since none of the
methods in IndexWriter are useful.

As for the why my 'Filter' interface would make the world a better place...
there may be cases where the filter is rather sparse, but the document set
is quite large. The requirement to return a BitSet is a lot of overhead for
a large document set - when a an alternative direct filter (for example,
look up a document in the db, check some value, exclude document), would be
far more efficient. Actually if the filter domain is quite large, my
interface might be MUCH better because you would not have to read the entire
filter criteria, if only a few documents were being checked against it. The
reason this is important for us, is that we filter returned documents
through a security check, and the security check lends itself to dynamic
checking. It might be more appropriate if the interface was

boolean filter(Document document);

rather than

boolean filter(int docnum);

but this would require reading each document (stored in cache?), before
performing the filter.

Don't get me wrong, Lucene is a GREAT product !

Robert Engels

-----Original Message-----
From: Doug Cutting []
Sent: Tuesday, April 20, 2004 11:39 AM
To: Lucene Developers List
Subject: Re: incorrect OO in lucene source?

Robert Engels wrote:
> Lucene is often cited as an excellent example of OO design.

That is kind, but the primary goal of Lucene is to provide
functionality, not to use "correct" OO design.  The two are not always
in accord.

> It seems the abstract base classes were created so there was a place to
> the static 'helper' methods. Would it not be better to have separate
> interfaces and abstract base classes that the implementations could extend
> if necessary?

Functionally, the only thing an interface has that cannot be implemented
by a class is multiple inheritance.  Multiple inheritance is useful for
simple behaviour mixins, like Comparable and Cloneable, but is rarely
useful with more complex method sets.  So, in cases where every
implementation needs the same helper methods and multiple inheritance is
not needed, interfaces add nothing functionally except more code to be
maintained.  That said, an interface does have some documentation value.

> The main problem seems to be the use of IndexReader in many interfaces,
> it should be using a Searchable. Otherwise, what is the point of

Searchable was added to provide an API appropriate for remote
invocation, to support RemoteSearchable.

> The following object use chain shows the 'pointlessness' of Searchable:
> Searchable->Query->Weight->IndexReader

IndexReader is required by methods which implement search algorithms,
run in a potentially remote JVM.  Only the Query and TopDocs are
transmitted over the wire, so only these are handled by Searchable
methods.  Other, higher-bandwidth data is handled further downstream in
the call chain, and thus is processed locally on the remote machine.

> The Query object has the method:
> public Query rewrite(IndexReader reader) throws IOException...
> but the entire search package is based on 'Searchable's, should it not be
> public Query rewrite(Searchable searchable) throws IOException...
> If not, the entire idea of 'external' searchable implementations break,
> because the end up having to 'know' an IndexReader, and IndexReader is a
> baseclass, etc.

IndexReader has a much richer API than Searchable.  It is required for
rewrite, which can, e.g., iterate over all terms in the index.  Such
iteration is not appropriate over RPCs.

> The same goes for
> public Weight weight(Searcher searcher);
> should this not be a 'Searchable'?

I think this could be changed without harming anything.  Weights are
only ever constructed locally, so it wouldn't really make a difference.

> The Weight interface has the following:
> Scorer scorer(IndexReader reader) throws IOException;
> Should it not be
> Scorer scorer(Searchable searchable) throws IOException;
> The same goes for explain().

Again, IndexReader's API is required to implement these methods.
Searchable has only a very small API, that which can be efficiently
invoked remotely.

> It seems that Searchable was created because IndexReader was not an
> interface, but the methods have not been changed to use a Searchable
> of IndexReader.

No, Searchable and Searcher were created to provide the functionality of
MultiSearcher and RemoteSearchable.  There are search operations (like
HitCollector-based searching) which are defined by Searcher, not
Searchable, that may not be run remotely, but may be invoked locally
over multiple indexes.

> It seems many of the 'search' package classes use a 'Searcher', when they
> should be using a 'Searchable'. If these core classes really need the
> methods defined in 'Searcher', then these methods should be moved to
> 'Searchable'.

The Searchable interface should be restricted to methods which are
useful in implementing effective, remotely invokable search.

> Shouldn't 'Filter' just be an interface, with the method
> boolean filter(int docnum);
> as to whether or not the document should be filtered?

How would this make the world a better place?


To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message