lucene-dev mailing list archives

From Doug Cutting
Subject Re: Fieldable, AbstractField, Field
Date Wed, 19 Mar 2008 18:01:33 GMT
robert engels wrote:
> The problem with abstract classes, is that any methods you provide 
> "know" something of the implementation, unless the methods are 
> implemented solely by calling other abstract methods (which is rarely 
> the case if the abstract class contains ANY private members).

Yes, abstract classes should generally avoid private fields that don't 
have both setters and getters.

> This is possible because the interfaces were designed very well. You 
> MUST completely understand the problem domain in abstract terms in order 
> to define proper interfaces.

That works for static problem domains.  If Lucene is resolved to only 
make bugfix releases, and not to substantially evolve its feature set, 
then this might be appropriate.

> IndexReader and IndexWriter should have been interfaces. If they were, 
> lots of the code would not have been structured as it was, and many 
> problems people had in producing "other" implementations could have been 
> avoided.

The problem is not that they were not interfaces, but that they were not 
originally intended to be abstract and replaceable.  The original design 
was that indexing would be the primary implementation that Lucene 
provided, and that things around indexing would be extensible, but that 
indexing itself would not be.  Extensibility was retrofitted onto an 
existing design, and it still shows some.

If IndexReader and IndexWriter were originally written to be extensible 
it would have been foolish to implement them as interfaces given the 
amount that these have evolved.  Each release would have broken every 

> As for future expansion, it is improbable in most cases that adding new 
> abstract methods will work - if that is the case, they can easily be 
> added to a static utility class. If the API is really changing/adding, 
> it is easy to create 'interfaceV2 extends interfaceV1'. If the code 
> worked before, and you want to support backwards code compatibility 
> between versions, this is a fool proof way to accomplish it.

This is not foolproof.  Not all extension is the addition of new 
methods.  In Hadoop, for example, we wish to move from Mapper#map(key, 
value, reporter) to Mapper#map(MapContext), where MapContext has 
getKey(), getValue(), getReporter() and other methods.  If Mapper were 
an abstract class, back-compatibility would be easy, since we could 
provide a default implementation of Mapper#map(MapContext) that calls 
Mapper#map(key, value, reporter).  With interfaces things are much more 
complicated, since, for back-compatibility, we must support both 
versions of the interface for a time, dynamically determining what 
version of the interface the application has specified and calling it 
accordingly.  This is ugly code that we could have avoided if we'd stuck 
to abstract classes.  And the impact is not only where the Mapper is 
run, but also where it is specified (JobConf).  So instead of localizing 
the change to, we have to add lots of runtime support and 
public API methods in other classes.  Yuck.
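
To make the abstract-class argument concrete, here is a minimal sketch of 
the default-implementation approach described above.  It is not Hadoop's 
actual code: the String key/value types, the single-method Reporter, and 
the LegacyMapper, ContextMapper and Framework classes are all hypothetical 
stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a progress reporter; not Hadoop's real interface.
interface Reporter {
    void progress(String status);
}

// Hypothetical context object bundling what used to be separate arguments.
class MapContext {
    private final String key;
    private final String value;
    private final Reporter reporter;

    MapContext(String key, String value, Reporter reporter) {
        this.key = key;
        this.value = value;
        this.reporter = reporter;
    }

    String getKey() { return key; }
    String getValue() { return value; }
    Reporter getReporter() { return reporter; }
}

abstract class Mapper {
    // Old entry point. Subclasses written against the old API override this.
    public void map(String key, String value, Reporter reporter) {
        // Does nothing by default.
    }

    // New entry point, added later. The default body forwards to the old
    // method, so existing subclasses keep working without being changed.
    public void map(MapContext context) {
        map(context.getKey(), context.getValue(), context.getReporter());
    }
}

// Written against the old API; knows nothing about MapContext.
class LegacyMapper extends Mapper {
    final List<String> seen = new ArrayList<>();

    @Override
    public void map(String key, String value, Reporter reporter) {
        seen.add(key + "=" + value);
        reporter.progress("mapped " + key);
    }
}

// Written against the new API; overrides only the context-based method.
class ContextMapper extends Mapper {
    final List<String> seen = new ArrayList<>();

    @Override
    public void map(MapContext context) {
        seen.add(context.getKey().toUpperCase());
    }
}

// The framework only ever calls the new method. No runtime check of which
// API version the application implemented is needed.
class Framework {
    static void run(Mapper mapper, String key, String value, Reporter reporter) {
        mapper.map(new MapContext(key, value, reporter));
    }
}
```

In this sketch the change stays inside Mapper: Framework.run has a single 
call site, and both the old-style and the new-style subclass are reached 
through it.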

On the other hand, Hadoop's FileSystem is an abstract class.  It has 
evolved considerably and applications have been able to upgrade without 
pain.  Lucene's Directory has also evolved profitably without breaking 
external Directory implementations.

Interfaces look elegant, but looks can deceive.
