lucene-solr-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Time for a cleaner API?
Date Thu, 09 Aug 2007 15:51:39 GMT
On 8/9/07, Jonathan Woods <jonathan.woods@scintillance.com> wrote:
> I'm moving a content management system's Lucene library over to Solr to reap
> the benefits, but along the way I'm meeting some problems which I imagine
> affect everyone doing the same kind of thing.
>
> I realise that Solr began life as something which was primarily designed to
> be used over HTTP or with non-Java clients.

I think that's still the primary use... a search server like a
database that can be accessed from many clients in differing
languages.  I think of HTTP as the primary interface, with plugins or
embedding as a last resort (again, similar to most databases).

> That explains the use of String
> name-value maps and other String representations in the outer API.  However,
> constructs like NamedList are used right down into the code - e.g. in clever
> classes like SimpleFacets - which means that you might as well not be using
> an object oriented language at all.  Instead of being able to use an IDE to
> tell at a glance what's inside facetInfo or highlightingInfo, for example,
> you have to resort to reading the Wiki or to searching code for all
> instances of "rsp.add(...)".

Most users would need to make sense of the structure of the serialized
response (XML or JSON), and I think it's relatively self-documenting
on that point, but some clients do construct objects.

But I understand where you are coming from as an integrator/embedder
rather than a user.  Much of the internals of Solr was the fastest way
to get from point A to B, and was not meant as a Java interface.  The
end goal was to get the bits on the wire quickly.  It was also meant
to enable query plugins to add info to a response or create their own
response info w/o having to worry about the details of serialization
into XML/JSON, etc.

Say we were to come up with a FacetResult class... it would be
undesirable to have to add support in the response writers for
specific classes that will keep growing over time.  So I guess a
FacetResult class would have to tell the writers how to access that
info, or export some sort of generic interface that served the same
purpose as the NamedList does now.
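
A rough sketch of that second option, in Java.  Everything here is
hypothetical: Exportable, FacetResult and write() are made-up names for
illustration, not existing Solr classes.  The typed class keeps ordinary
getters for embedders, while the writer only ever sees a generic
name/value view and so needs no per-class support.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; none of these types exist in Solr. */
final class FacetExportSketch {

    /** Hypothetical generic view that a typed result exposes to response writers. */
    interface Exportable {
        /** Ordered name/value pairs; values may be Strings, Numbers, Maps or Exportables. */
        Map<String, Object> export();
    }

    /** Hypothetical typed facet result: term counts for a single field. */
    static final class FacetResult implements Exportable {
        private final String field;
        private final Map<String, Integer> counts = new LinkedHashMap<>();

        FacetResult(String field) {
            this.field = field;
        }

        FacetResult add(String term, int count) {
            counts.put(term, count);
            return this;
        }

        // Typed accessors, visible to an IDE and to embedding code.
        String getField() {
            return field;
        }

        int getCount(String term) {
            return counts.getOrDefault(term, 0);
        }

        // Generic view consumed by writers.
        @Override
        public Map<String, Object> export() {
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("field", field);
            out.put("counts", new LinkedHashMap<String, Object>(counts));
            return out;
        }
    }

    /**
     * Toy JSON-like writer. It knows about Exportable, Map and Number only,
     * never about FacetResult itself. String escaping is omitted for brevity.
     */
    static String write(Object value) {
        if (value instanceof Exportable) {
            return write(((Exportable) value).export());
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(write(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        FacetResult facets = new FacetResult("cat").add("book", 3).add("dvd", 1);
        System.out.println(write(facets));
    }
}
```

A new result class would then only have to implement export(); the
writers themselves would not change.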

>  Having written software like this in the past,
> in which object structures are in developers' heads rather than in the code,
> I bet it's made things more difficult along the way.
>
> I get the impression that Solr is ready for a bit of refactoring to give it
> a more Java-friendly API.  This API should be the primary means of access
> into Solr functionality;

You mean primary for embedded use or for some kind of integration, right?

> it should explicitly model searches (i.e. filters
> plus queries plus sorts plus facet and highlighting cues), search results,
> hits (SolrHit which has a SolrDocument plus scoring info, by way of analogy
> with Lucene Hit) and hit documents (i.e. SolrDocument, so that's already
> fine).  This API should be used _by_ the String-oriented request handlers,
> not the other way round; request handlers (and all uses of NamedList) should
> be reserved for implementations of that API which deal with non-Java-native
> clients.  At the moment, the non-Java use cases are calling the shots in the
> Java implementation, and that seems a pity.
>
> Some of these considerations are clearly driving the implementation of
> org.apache.solr.client.solrj, which is an important development - I bet
> that's where most people start with Solr now.

> But I think two things need
> to happen here: (i) the work here should be moved into org.apache.solr,
> because with the right API at the server end you don't _need_ any code for a
> Java client - it would just call into the API,

The Java client was meant primarily for remote querying of solr... but
it can be transparently used locally.

> and would be a 'client' only
> in the sense that any caller of a method is that method's client. And (ii)
> the API which is currently in org.apache.solr.client.solrj should be using
> the kinds of classes I listed above, with UpdateResponse etc containing
> fields and getters which model what's actually returned (and do so without
> recourse to NamedList).
>
> I realise that some of this is already happening, but I think with 1.3 still
> in its early stages now might be a good time to go the whole way.  With a
> more heavily modelled and self-documenting API in place, people would find
> it a lot easier to develop Solr integrations, and I expect it would speed up
> the process of developing new core Solr functionality.
>
> Any thoughts?

I'm certainly not against moving toward nicer internal Java APIs (but
back compatibility is an issue here), but I think the external APIs
are much more important.

-Yonik
