lucene-dev mailing list archives

From Shai Erera
Subject Re: Lucene's default settings & back compatibility
Date Fri, 22 May 2009 03:44:59 GMT
> Your example confused me.

You're right. I wrote it with one eye closed already. I meant to say that if
I'm a 2.4 user and something gets deprecated in trunk (afterwards), it is
carried through 2.4.X and 2.5 and then removed in 2.6. So only 1 full minor

> It's somewhat crazy, but what if we deprecate stuff and rename it?

I absolutely love that idea! But it means that:
1) We cannot support jar drop-in ability in those cases (which I'm fine with
because people can upgrade to 2.4.X to get bug fixes), not just because the
API does something different, but because user code may not compile. For
example, the changes I'm doing in 1614 would have changed the signatures of
next() and skipTo(), and so someone who wrote a DISI whose next() returns
boolean will fail to compile.
2) We give the deprecated API the mediocre names. (A funny thought: we can
give those methods/classes really stupid/nasty names, to emphasize the
beauty of the existing API, to encourage people to stick with the better API
:) ).
3) We document clearly what needs to be done in order to use the deprecated

One thing we didn't fully address here is methods added to
interfaces/abstract classes. When we add a method to an abstract class with
a default impl, that's ok. But what if we need to make it abstract (like we
had to do in 1575 for the Collector versions)?
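
A minimal sketch of the "default impl" case, to make the distinction
concrete. All names here are hypothetical and are not Lucene's actual
Collector API:

```java
// Hypothetical names for illustration only; not Lucene's actual API.
abstract class HitCollector {
    // Present since the first release, so every subclass already implements it.
    abstract void collect(int doc);

    // Added in a later release WITH a default body: subclasses written
    // against the old version keep compiling and behave as before.
    boolean supportsOutOfOrder() {
        return false;
    }
}

// Written against the old version; it knows nothing about the new method.
class CountingCollector extends HitCollector {
    int count;

    @Override
    void collect(int doc) {
        count++;
    }
}
```

If supportsOutOfOrder() were declared abstract instead, CountingCollector
would stop compiling until its author added an implementation, which is
exactly the back-compat break in question.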

I guess for interfaces we should first move all of them to abstract classes.
I like interfaces, but abstract classes give us slightly more freedom when
we face back-compat issues. Maybe to support Earwin's idea, we use the name
for a new abstract class, and give the interface a different name? That way
to upgrade people just need to change implements to extends (I hope that
won't cause any problems if their classes already extend something else).
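
Roughly, the rename could look like the sketch below. The names are made up
for illustration (they are not existing Lucene types), and it assumes the
user's class does not already extend something else:

```java
// Hypothetical names for illustration only; not Lucene's actual API.

// The old interface survives under a new, less attractive name.
@Deprecated
interface DocFilterInterface {
    boolean accept(int doc);
}

// The abstract class takes over the original name.
abstract class DocFilter {
    public abstract boolean accept(int doc);
}

// User code after upgrading: "implements DocFilter" became "extends DocFilter".
class EvenDocFilter extends DocFilter {
    @Override
    public boolean accept(int doc) {
        return doc % 2 == 0;
    }
}
```

The method body in the user's class is untouched; only the class declaration
changes.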

But if we apply this policy to interfaces, I think more users will need to
touch their code when upgrading even minor releases.

So Mike, about actsAsVersion ... I think I'm starting to get used to it. I
do relate to what Marvin writes though, about two different apps running in
the same JVM with different settings. We have such a case - two teams
develop two search solutions (for two back-ends). They live in the same JVM
but have different development plans/schedules. So it's not just a
hypothetical problem to me.

If we could have the app say something like
Version.getInstance(appId).actAsVersion(2.4), that would solve it because
each app will have its own Id, and the Version class would maintain a map
between the Id and an instance. But I have yet to work out (in my mind)
how the Lucene code will use it, since the same code runs in two apps with
different IDs, and so won't know which appId to pass.
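
To make the idea concrete, here is a rough sketch of such a registry. It is
entirely hypothetical: no such Version class exists in Lucene, and the use of
a String for the version and of the actsAs() accessor are my assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; this class and its methods do not exist in Lucene.
final class Version {
    // One instance per application Id, shared across the whole JVM.
    private static final Map<String, Version> INSTANCES = new ConcurrentHashMap<>();

    // Assumed default when an app never states a version.
    private volatile String actsAs = "current";

    private Version() {}

    // Returns the instance registered for appId, creating it on first use.
    static Version getInstance(String appId) {
        return INSTANCES.computeIfAbsent(appId, id -> new Version());
    }

    // Records which release's behavior this app wants.
    Version actAsVersion(String version) {
        this.actsAs = version;
        return this;
    }

    String actsAs() {
        return actsAs;
    }
}
```

Each app would call Version.getInstance("teamA").actAsVersion("2.4") once at
startup. The open problem is visible in the sketch: code deep inside the
library has no appId in hand, so it cannot call getInstance() itself.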

Oh well .. we're going to change the way those two teams work anyway, so for
me at least, this problem will be gone soon :)

I also agree that actsAsVersion breaks the locality principle, by which,
when you see a bug, you should be able to look in the surroundings of where
it happened, and not discover that it stems from code several files away.
But I don't like passing version information in the constructors either ...

What if we continue with Marvin's proposal of saving that information
in the index? I think, Mike, that I asked you a similar question a while
ago, about whether Lucene has the ability to store index versions. Index
versions are important and can save us some of the problems here - not just
with storing stopword lists, but also with code that manipulates the index,
or makes decisions about scoring etc.

For the two apps in the same JVM it should solve the problem since I think we
can safely assume each operates on its own index.

Arggh .. but again we face the same problem - how do we pass that
information to the different classes? How is a TokenStream expected to read
that info?

I think we may have to settle on the static Version class, even if it will
read the information from the index (by doing some Version.init(File


On Fri, May 22, 2009 at 1:53 AM, Marvin Humphrey wrote:

> On Thu, May 21, 2009 at 05:19:43PM -0400, Michael McCandless wrote:
> > Marvin, which solution would you prefer?
> Between the two, I'd prefer settings constructor arguments, though I would
> be inclined to have settings classes that are specific to individual
> classes rather than Lucene-wide.
>
> At least that scheme gets locality right.  The global actsAsVersion
> variable violates that principle and has the potential to saddle a small
> number of users who have done absolutely nothing wrong with bugs that are
> very, very hard to hunt down.  That's unfair.
>
> As far as analyzers and token streams, the theoretical answer is making
> indexes self-describing via serializable schemas, as discussed on the Lucy
> dev list, and as implemented in KinoSearch svn trunk.  With versioning
> metadata attached to the index, there is no longer any worry about
> upgrading analysis modules provided that those modules handle their own
> versioning correctly.
>
> For instance, in KS the Stopalizer always embeds the complete stoplist in
> the schema file, so even if we update the "English" stoplist, we don't get
> invalid search results for indexes which were created with the old
> stoplist.  Similarly, it may not be possible to keep around multiple
> variants of Snowball, but at least we can fail catastrophically instead of
> subtly if we detect that the Snowball version has changed.
>
> Full-on schema serialization isn't feasible for Lucene, but attaching an
> actsAsVersion variable to an index and feeding that to your analyzers would
> be a decent start.
>
> Lastly, I think a major java Lucene release is justified already.  Won't
> this discussion die down somewhat if you can get 3.0 out?  If there are
> issues that are half done, how about rolling back whatever's in the way?
>
> Marvin Humphrey
