lucene-dev mailing list archives

From "Nadav Har'El" <>
Subject Re: [jira] Created: (LUCENE-1439) Inconsistent API
Date Tue, 11 Nov 2008 18:55:25 GMT
Hi Ivan,

On Fri, Nov 07, 2008, Ivan.S (JIRA) wrote about "[jira] Created: (LUCENE-1439) Inconsistent API":
> The API of Lucene is totally inconsistent:

I think this statement is a bit too harsh. I have experience with several
search engine APIs, and in many areas, in my opinion Lucene's is the
cleanest one. Of course, there are undoubtedly also things that should be
improved in Lucene's APIs, and these can be discussed and done.

Especially now that in Lucene 3.0 we will be able to (?) change APIs
without backward compatibility being required, these kinds of issues should
be dealt with.

> 1)
> There are a lot of containers which don't implement an interface which indicates this
> (for pre-java-1.5 Lucene it could be Collection, for post-java-1.5 Lucene it could be
> more general Iterable)
> Example:
>  IndexSearcher: "int maxDoc()" and "doc(int i)"

This is not as simple as it sounds, I think. Like you said yourself, before
Java 1.5, the "Iterable" interface did not exist. The Collection interface
(of Java 1.4) is way too broad to be used in this context, because it has
*writing* methods like add(), clear() - what are these supposed to do in the
IndexReader class, for example? And what about the contains() method?
remove()? toArray()? No, I would not like to see IndexReader (for example)
implement Collection.

That being said, I would personally like to see Lucene move to Java 1.5
as soon as possible, and a serious effort undertaken to beautify Lucene's
API and also implementation using new Java 1.5 features. You're right - one
of these would be to use Iterable more.
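
To make the Collection-versus-Iterable point concrete, here is a minimal
sketch of a read-only Iterable view over a maxDoc()/doc(int) style API. The
DocStore interface, the asIterable() helper and the sample data are
hypothetical stand-ins for illustration only, not Lucene classes. Note that
the only thing Iterable demands is iterator(), so no add(), clear() or
contains() has to be given a meaning.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these names are part of Lucene.
class IterableDocsSketch {

    /** Index-style access, mirroring the "int maxDoc()" / "doc(int i)" pair. */
    interface DocStore {
        int maxDoc();
        String doc(int i);
    }

    /** Wraps a DocStore in a read-only Iterable. */
    static Iterable<String> asIterable(final DocStore store) {
        return new Iterable<String>() {
            public Iterator<String> iterator() {
                return new Iterator<String>() {
                    private int pos = 0;

                    public boolean hasNext() {
                        return pos < store.maxDoc();
                    }

                    public String next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        return store.doc(pos++);
                    }

                    // The view is read-only, so removal is refused.
                    public void remove() {
                        throw new UnsupportedOperationException("read-only view");
                    }
                };
            }
        };
    }

    /** Small in-memory DocStore used only for the demonstration. */
    static DocStore sample() {
        final String[] docs = {"a", "b", "c"};
        return new DocStore() {
            public int maxDoc() { return docs.length; }
            public String doc(int i) { return docs[i]; }
        };
    }

    public static void main(String[] args) {
        // The for-each loop works because asIterable() returns an Iterable.
        for (String d : asIterable(sample())) {
            System.out.println(d);
        }
    }
}
```

The one writing method that survives is Iterator.remove(), and the sketch
simply refuses it.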

A wish of mine related to your Iterable wish is that Lucene stops using
"String" almost everywhere, and starts using the CharSequence interface.
This can be done even in Java 1.4, because CharSequence exists in 1.4
(but did not in 1.3). The only reason I see not to do this change is if
we determine that it hurts performance (I didn't test, but I doubt it).
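
As a rough illustration of what accepting CharSequence buys, the sketch below
declares one method that takes a CharSequence and calls it with a String, a
StringBuffer and a CharBuffer, without converting any of them to String
first. The class and the countTokens() method are hypothetical and are not
taken from Lucene; the whitespace tokenization is only a placeholder for
real analysis.

```java
import java.nio.CharBuffer;

// Hypothetical sketch; countTokens() is not a Lucene method.
class CharSequenceSketch {

    /** Counts whitespace-separated tokens in any CharSequence. */
    static int countTokens(CharSequence text) {
        int count = 0;
        boolean inToken = false;
        for (int i = 0; i < text.length(); i++) {
            boolean ws = Character.isWhitespace(text.charAt(i));
            if (!ws && !inToken) {
                count++; // a new token starts here
            }
            inToken = !ws;
        }
        return count;
    }

    public static void main(String[] args) {
        // String, StringBuffer and CharBuffer all implement CharSequence.
        System.out.println(countTokens("quick brown fox"));
        System.out.println(countTokens(new StringBuffer("quick brown")));
        System.out.println(countTokens(CharBuffer.wrap("a b c d".toCharArray())));
    }
}
```

Whether the interface calls to length() and charAt() cost anything
measurable compared to String is exactly the kind of thing that would need
to be benchmarked.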

> 2)
> There are a lot of classes having non-final public accessible fields.

Can you point us to examples? Maybe we can fix some of these even now?

> 3)
> Some methods which return values are named something() others are named getSomething()
> Best one is: Fieldable:
> without get: String stringValue(), Reader readerValue(), byte[] binaryValue(), ...
> with get: byte[] getBinaryValue(), int getBinaryLength(), ...

Maybe you have a valid point here, but you didn't give a good example.
There are both
	byte[] binaryValue()
	byte[] getBinaryValue()
and they do different things: getBinaryValue() returns the raw array stored
in the Fieldable, but not all of this array is relevant, and you need to
use getBinaryLength() and getBinaryOffset() to determine what is relevant.
All these methods are called "get*" because they only get a field which
already exists.
On the other hand, binaryValue() does something different - if I understand
correctly, it may need to do array copying to get a byte[] which
it can return.
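
The difference described above can be sketched roughly as follows. This is
a hypothetical class written only to illustrate the distinction; it reuses
the method names from the discussion but is not Lucene's implementation, and
the "copy only when needed" behaviour of binaryValue() is an assumption
based on the description above.

```java
import java.util.Arrays;

// Hypothetical sketch; not Lucene's Fieldable implementation.
class BinaryFieldSketch {
    private final byte[] buf;   // raw backing array, may be larger than the value
    private final int offset;   // where the relevant bytes start
    private final int length;   // how many bytes are relevant

    BinaryFieldSketch(byte[] buf, int offset, int length) {
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    // The get* methods only hand back fields that already exist.
    byte[] getBinaryValue() { return buf; }
    int getBinaryOffset() { return offset; }
    int getBinaryLength() { return length; }

    /** Returns exactly the relevant bytes, copying if the slice is partial. */
    byte[] binaryValue() {
        if (offset == 0 && length == buf.length) {
            return buf; // the whole array is the value, no copy needed
        }
        byte[] copy = new byte[length];
        System.arraycopy(buf, offset, copy, 0, length);
        return copy;
    }

    public static void main(String[] args) {
        byte[] raw = {9, 1, 2, 3, 9};
        BinaryFieldSketch f = new BinaryFieldSketch(raw, 1, 3);
        // Raw array plus offset/length versus a ready-to-use copy.
        System.out.println(Arrays.toString(f.getBinaryValue())
                + " offset=" + f.getBinaryOffset()
                + " length=" + f.getBinaryLength());
        System.out.println(Arrays.toString(f.binaryValue()));
    }
}
```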

So this API is not at all inconsistent - maybe it is just a bit redundant
and a bit confusing or not documented well enough (although I don't think
the latter is true).


Nadav Har'El                        |    Tuesday, Nov 11 2008, 14 Heshvan 5769
IBM Haifa Research Lab              |-----------------------------------------
                                    |Cats aren't clean, they're just covered
                                    |with cat spit.
