lucene-dev mailing list archives

From: Michael McCandless
Subject: Re: Payloads and TrieRangeQuery
Date: Fri, 12 Jun 2009 16:20:31 GMT
On Thu, Jun 11, 2009 at 4:58 PM, Yonik Seeley wrote:

> In Solr land we can quickly hack something together, spend some time
> thinking about the external HTTP interface, and immediately make it
> available to users (those using nightlies at least).  It would be a
> huge burden to say to Solr that anything of interest to the Lucene
> community should be pulled out into a module that Solr should then
> use.

Sure, new and exciting things should still stay private to Solr...

> As a separate project, Solr is (and should be) free to follow
> what's in its own best interest.

Of course!

I see your point, that moving things down into Lucene is added cost:
we have to get consensus that it's a good thing to move (but should
not be hard for many things), do all the mechanics to "transplant" the
code, take Lucene's "different" requirements into account (that the
consumability & stability of the Java API is important), etc.

But, there is a huge benefit to having it in Lucene: you get a wider
community involved to help further improve it, you make Lucene
stronger which improves its & Solr's adoption, etc.

What's good for Lucene is good for Solr.

Eg why hasn't NumberUtils been folded into Lucene, aeons ago?  I
realize it's not the perfect solution (and trie* seems to be better),
but it's certainly better than the "nothing" we've had for a long
time.

Why not the custom fragmenters (Gap, Regex) that Solr has to improve
highlighting?  EG it looks like Solr can approximately produce
sentences as fragments.  This would be a great addition to Lucene's

(NOTE: I fully realize that a large number of things do get moved from
Solr to Lucene, over time, and that's great; I'm saying we should very
much keep that up).

But of course we are as usual resource starved...

>> For example, Solr would presumably prefer that trie* remain in contrib?
> From a capabilities perspective, it doesn't matter much if it's in
> contrib or core I think.  It's a small amount of work to adapt to
> class name changes, but nothing to complain about.
> But it doesn't seem like Trie should be treated specially somehow...

Trie is *very* useful.  It plugs a serious weakness in Lucene (ootb
handling of numeric fields).  The things one must now do to have a
numeric field work "properly" are crazy.  Trie makes Lucene more
useful & consumable; it's a powerful feature.  It should be treated
specially.

But: I certainly see your point, that Solr couldn't care less about such
consumability, and leaving trie in contrib would be just fine (from
Solr's standpoint).

> seems to go down a path that makes customer provided filters
> second-class citizens.  I hate it when Java does stuff like that to me
> (their provided classes can do more than mine).

I don't really see the connection here.  If we make trie* the default
for handling of numeric fields in Lucene, how does that hurt customer
provided filters?

>> There's a single set of Solr developers, but a very wide range of
>> direct Lucene users.  I don't see how Lucene having good consumability
>> actually makes Solr's life harder.  Those raw APIs would still be
>> accessible to Solr...  simple things should be simple (direct Lucene
>> users) and complex things should be possible (Solr).
> But with changes come deprecations - forced changes when the
> deprecations are removed.  Sometimes those are easy to adapt to,
> sometimes not.  If those required changes don't actually add any
> functionality to Solr, it's a net negative if you're looking at it
> from Solr's point of view.  That doesn't mean Lucene shouldn't - and
> I've not complained to Lucene in the past because it wasn't Lucene's
> responsibility.

Right, so this is why the move of trie from contrib -> core is net
negative for Solr (things work fine now, and it only creates work for
Solr).

But, if it improves Lucene's adoption, because Lucene is more
consumable, that then becomes a positive for Solr.

And BTW you should "complain" to Lucene more if we're doing things
that are not Solr friendly.  Honestly, if anything, we don't hear
enough from you ;)

Such complaints will presumably often match this one ("Solr wants a
raw engine; Lucene wants consumability") and we'll just have to agree
to disagree.  But other times I'm sure we'd get something net/net good
out of the resulting discussion.

>>> and taking it out of Solr's release cycle and easy ability to change -
>>> if Solr needs to make a change to one of the moved classes, it's
>>> necessary to get it through the Lucene change process and then upgrade
>>> to the latest Lucene trunk - all or nothing.
>> "Getting through Lucene's change process" should be real simple for
>> you all :)
> Y'r kidding, right? ;-)
> It's sometimes hard enough to get stuff through either community, let
> alone both.

Actually I wasn't kidding!

Sure there's the "normal" open-source challenges -- getting someone's
attention, the mechanics of making a patch & iterating, sometimes
massive unrelated discussions spin off by accident, etc., but those
are, well, normal.

Seriously, if someone got the energy up to say move Solr's function
queries or highlighter improvements or neat tokenizers back into
Lucene, I think there'd be a lot of support to make that happen.

> For code that's in Solr, we only have to worry about Solr's concerns,
> not about all users of Lucene.  Big difference.

I completely agree moving something from Solr -> Lucene entails work.
There's no question.  I'm saying, for many things that are in Solr
now, the long-term benefit (to both Solr & Lucene) way offsets that
short term cost.  It's a strategic, not tactical, decision.

>> And, Solr upgrades Lucene's JAR fairly often already?
> And a lot of our users don't like it.
> It's also become much more difficult due to all the Lucene changes
> lately.  It's something we should be doing less of, not more of....
> unless we formally merge the projects or something ;-)

With the great "modularization" proposal, merging the projects is
actually not inconceivable... while it would be an enormous
undertaking I think the better communication & sharing would be a win
for both Solr & Lucene.

>> NumericField would enforce one token, during indexing.
> That only changes the point at which the user realizes "uh, this won't
> work", and it's still at a point after they've written their code.
> Checking like this doesn't even feel like it belongs at the indexing
> level.

Checking at indexing instead of searching catches any problems much
sooner?  Ie if you intend to sort on this field, it needs to have one
value, so why wait until search time to check that?
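
To make the "check at indexing time" idea concrete, here is a rough
sketch.  It does not use Lucene's API: the class SingleValuedDocument and
its methods are hypothetical names invented for illustration, and a real
NumericField could enforce the rule quite differently.  It only shows a
second value for a single-valued field being rejected when the document
is built, rather than when someone later tries to sort on the field.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Lucene's API): a document builder that rejects
 * a second value for a single-valued numeric field at add time.
 */
final class SingleValuedDocument {
    // One value per field name; a second add for the same name is an error.
    private final Map<String, Long> numericFields = new HashMap<>();

    /** Adds a numeric value, failing fast if the field already has one. */
    SingleValuedDocument addNumeric(String field, long value) {
        if (numericFields.containsKey(field)) {
            throw new IllegalArgumentException(
                "field '" + field + "' is single-valued but received a second value");
        }
        numericFields.put(field, value);
        return this;
    }

    /** Returns the stored value, or throws if the field was never added. */
    long get(String field) {
        Long value = numericFields.get(field);
        if (value == null) {
            throw new IllegalArgumentException("no value for field '" + field + "'");
        }
        return value;
    }

    public static void main(String[] args) {
        SingleValuedDocument doc = new SingleValuedDocument().addNumeric("price", 42L);
        try {
            doc.addNumeric("price", 43L); // second value for the same field
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at indexing time: " + e.getMessage());
        }
    }
}
```

The tradeoff is the one discussed above: the failure still happens after
the user has written code, but it surfaces on the first bad document
instead of at the first sort.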

