lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Proposal about Version API "relaxation"
Date Thu, 15 Apr 2010 11:52:57 GMT
Well ... I must say that I completely disagree w/ dropping index structure
back-support. Our customers will simply not hear of reindexing 10s of TBs of
content because of version upgrades. Such a decision is key to Lucene
adoption in large-scale projects. It's not at all about whether Lucene is
a content store or not - content is stored on other systems, I agree. But
that doesn't mean reindexing it is tolerable.

Up until now, Lucene migrated my segments gradually, and before I upgraded
from X+1 to X+2 I could run optimize() to ensure my index would be readable
by X+2. I don't think I can agree to let go of that capability myself, let
alone convince all the stakeholders in my company, who adopt Lucene today in
numerous projects. We've been there before (requiring reindexing on
version upgrades) w/ some offerings, and customers simply didn't like it and
were forced to use an enterprise-class search engine which offered less (and
didn't use Lucene, up until recently!). Until we moved to Lucene ...
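
To make the upgrade path above concrete, here is a minimal sketch of it. It is a toy model, not Lucene's API: the function names are hypothetical, segments are reduced to the major version that wrote them, and the one-major-version read window is an assumption taken from the X+1 to X+2 description above.

```python
# Toy model of the back-compat window described above; not Lucene's API.
# A segment is represented only by the major version that wrote it.

def can_read(reader_major, segment_major):
    """Assumption: a release reads its own format and the previous major's."""
    return reader_major - 1 <= segment_major <= reader_major


def optimize(segments, writer_major):
    """Model optimize(): merge everything into one segment in the writer's format."""
    too_old = [s for s in segments if not can_read(writer_major, s)]
    if too_old:
        raise ValueError(f"segments too old to merge: {too_old}")
    return [writer_major]


def readable_by(reader_major, segments):
    """An index is readable only if every one of its segments is."""
    return all(can_read(reader_major, s) for s in segments)


# An index started on release 2 (X) and later appended to on release 3 (X+1).
index = [2, 2, 3]
print(readable_by(4, index))               # False: release-2 segments remain
print(readable_by(4, optimize(index, 3)))  # True: everything rewritten by release 3
```

Under these assumptions, running optimize() on X+1 before the upgrade is what closes the gap, with no reindexing from the original content.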

What's Solr's take on it?

I differentiate between structural changes and runtime changes. I, myself,
don't mind if we let go of back-compat support for runtime changes, such as
those generated by analyzers. I have a couple of reasons, the most important
being (1) these are not so frequent (though neither are index structural
changes) and (2) that's a decision I, as the application developer, make:
whether or not to use a newer version of an Analyzer. I don't mind working
hard to make a 2.x
Analyzer version work in the 3.x world, but I cannot make a 2.x index
readable by a 3.x Lucene jar, if the latter doesn't support it. That's the
key difference, in my mind, between the two. I can choose not to upgrade at
all to a newer analyzer version ... but I don't want to be forced to stay w/
older Lucene versions and features because of that ... well people might say
that it's not Lucene's problem, but I beg to differ. Lucene benefits from
wider and faster adoption and we rely on new features to be adopted quickly.
That might be jeopardized if we let go of that strong capability, IMO.

What we can do is provide an index migration tool ... but personally I don't
see what the difference is between that and gradually migrating segments as
they are merged, code-wise. I mean - it has to be the same code. The only
difference is that an index migration tool may take days to complete on a
very large index, while the ongoing migration takes ~0 time when you come to
upgrade to a newer Lucene release.
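
The "same code" point can be sketched roughly as follows. This is an illustration only: the dictionary layout, CURRENT_FORMAT, and the function names are all hypothetical and say nothing about how Lucene actually stores or merges segments.

```python
# Hypothetical sketch: both migration paths share one rewrite routine.
CURRENT_FORMAT = 3


def rewrite(segment):
    """Shared step: return a copy of the segment in the current format."""
    return {"format": CURRENT_FORMAT, "docs": list(segment["docs"])}


def merge(segments):
    """Ongoing path: a normal merge rewrites its inputs as a side effect."""
    docs = [d for s in segments for d in rewrite(s)["docs"]]
    return {"format": CURRENT_FORMAT, "docs": docs}


def migrate_all(index):
    """One-shot tool: rewrite every old segment up front."""
    return [s if s["format"] == CURRENT_FORMAT else rewrite(s) for s in index]


index = [
    {"format": 2, "docs": ["a", "b"]},
    {"format": 2, "docs": ["c"]},
    {"format": 3, "docs": ["d"]},
]

# Ongoing path: the two old segments happen to be picked for a merge.
gradual = [merge(index[:2]), index[2]]
# Tool path: everything is rewritten in one pass.
one_shot = migrate_all(index)

print({s["format"] for s in gradual})   # {3}
print({s["format"] for s in one_shot})  # {3}
```

Both paths end in the same state and call the same rewrite(); what differs is when the cost is paid: spread across ordinary merges, or all at once at upgrade time.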

And the note about Terrier requiring reindexing ... well, I can't say it's a
strength; it's a damn big weakness IMO.

About the release pace, I don't think we can suddenly release every 2 years
... makes people think the project is stuck. And some out there are not so
fond of using a 'trunk' version and releasing it w/ their products because
trunk is perceived as ongoing development (which it is) and thus less
stable, or is likely to change and most importantly harder to maintain (as
the consumer). So I still think we should release more often than not.

That's why I wanted to differentiate X and Y, but I don't mind if we release
just X ... if that's so important to people. BTW Mike, Eclipse's releases
are like Lucene, and in fact I don't know of so many projects that just
release X ... many of them seem to release X.Y.
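
For reference, the X.Y reading quoted below (a change to X is a major release, a change to Y alone is a minor one) can be sketched as a small check. The function names are hypothetical, and it assumes plain two-part "X.Y" version strings.

```python
# Hypothetical helper for the X.Y numbering discussed in this thread.

def parse(version):
    """Split an 'X.Y' string into integer (major, minor) parts."""
    major, minor = version.split(".")
    return int(major), int(minor)


def classify(old, new):
    """Classify an upgrade as 'major', 'minor', or 'same'."""
    old_major, old_minor = parse(old)
    new_major, new_minor = parse(new)
    if new_major != old_major:
        return "major"
    if new_minor != old_minor:
        return "minor"
    return "same"


print(classify("3.3", "4.0"))  # major
print(classify("3.3", "3.4"))  # minor
```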

I don't understand why we're treating this as an "all or nothing" thing. We
can let go of API back-compat, that clearly has no effect on index structure
and content. We can even let go of index runtime changes for all I care. But
I simply don't think we can let go of index structure back-support.

Shai

On Thu, Apr 15, 2010 at 1:12 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> 2010/4/15 Shai Erera <serera@gmail.com>:
>
> > One way is to define 'major' as X and minor X.Y, and another is to define
> > major as 'X.Y' and minor as 'X.Y.Z'. I prefer the latter but don't have any
> > strong feelings against the former.
>
> I prefer X.Y, ie, changes to Y only is a minor release (mostly bug
> fixes but maybe small features); changes to X is a major release.  I
> think that's more "standard", ie, people will generally grok that 3.3
> -> 4.0 is a major change but 3.3 -> 3.4 isn't.
>
> So this proposal would change how Lucene releases are numbered.  Ie,
> the next release would be 4.0.  Bug fixes / small features would then
> be 4.1.
>
> > Index back compat should be maintained between major releases, like it is
> > today, STRUCTURE-wise.
>
> No... in the proposal, you must re-index on upgrading to the next
> major release (3.x -> 4.0).
>
> I think supporting old indexes, badly (what we do today) is not a
> great solution.  EG on upgrading to 3.1 you'll immediately see a
> search perf hit since the flex emulation layer is running.  It's a
> trap.
>
> It's this freedom, I think, that'd let us drop Version entirely.  It's
> the back-compat of the index that is the major driver for having
> Version today (eg so that the analyzers can produce tokens matching
> your old index).
>
> EG Terrier seems to have the same requirement -- note the bold "All
> indexes must be rebuilt":
>
>  http://terrier.org/docs/current/whats_new.html
>
> Also, Lucene isn't a primary store (like a filesystem or a database).
> We expect that your "true" content still lives somewhere else.  So why
> do we go to such great lengths to keep the index format for so
> long...?
>
> > BTW, w/ all that - does it mean 'backwards' can be dropped, or at least
> > test-backwards activated only on a branch which we decide needs it? That'll
> > be really great.
>
> I think the stable branches (2.x, 3.x) would have backwards tests
> created the moment they are branched, to make sure as we fix bugs /
> backport minor features we don't break back compat, along that branch.
>
> I don't think we need the .Z part of a release numbering -- our
> numbers would look like most other software projects.  3.0 is a major
> release, 3.1, 3.2, 3.3 fix bugs / add minor features, etc.
>
> If flex were done in this world I would've finished it a lot faster!  A
> huge amount of time went into the cross back compat emulation layers
> (pre-flex APIs and pre-flex index).
>
> > Also, we will still need to maintain the Backwards section in CHANGES (or
> > move it to API Changes), to help people upgrade from release to release.
>
> I think we'd create a migration guide to explain how apps migrate to
> the next major release (this is what other projects do), eg like this:
>
>  http://community.jboss.org/wiki/Hibernate3MigrationGuides#A42
>
> > Unless you're telling me we'll start releasing major releases more often?
>
> I think this is mostly orthogonal?  We could still do major releases
> frequently or rarely with this model... however, it would give us more
> freedom to do major releases frequently (vs today where every major
> release sets a scary back-compat-burden stake in the ground).
>
> > I don't see why anyone would release a 3.x after 4.0 is out unless
> > someone really wants to work hard on maintaining back-compat of some
> > features
>
> I think the minor releases on the stable branch (3.1, 3.2, 3.3) would
> be mostly bug fixes, but maybe also minor features if
> contributors/developers had the itch to make them available on the
> stable (3.x) branch.  How much dev happens on the stable branch can be
> largely determined by itch...
>
> Mike
