lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Proposal about Version API "relaxation"
Date Thu, 15 Apr 2010 19:28:58 GMT
Hi Earwin,

I am strongly +1 on this. I would also act as the Release Manager for 3.1, if nobody else wants
to do this. I would like to take the preflex tag, or some revision before it (maybe without the
IndexWriterConfig, which is a really new API), as the 3.1 branch. After that, I would backport some of
my post-flex changes, like the StandardTokenizer refactoring (so we can still produce the old
analyzer without Java 1.4).

So +1 on branching pre-flex and releasing it as 3.1 soon. The Unicode improvements justify a new
release. I think s1monw also wants this.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Earwin Burrfoot [mailto:earwin@gmail.com]
> Sent: Thursday, April 15, 2010 8:15 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Proposal about Version API "relaxation"
> 
> I'd like to remind that Mike's proposal has stable branches.
> 
> We can branch off preflex trunk right now and wrap it up as 3.1.
> Current trunk is declared as future 4.0 and all backcompat cruft is
> removed from it.
> If some new features/bugfixes appear in trunk, and they don't break
> stuff - we backport them to 3.x branch, eventually releasing 3.2, 3.3,
> etc
> 
> Thus, devs are free to work without back-compat burden, bleeding edge
> users get their blood, conservative users get their stability + a
> subset of new features from stable branches.
> 
> 
> On Thu, Apr 15, 2010 at 22:02, DM Smith <dmsmith555@gmail.com> wrote:
> > On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
> >>>
> >>> First, the index format. IMHO, it is a good thing for a major release to be
> >>> able to read the prior major release's index. And the ability to convert it
> >>> to the current format via optimize is also good. Whatever is decided on this
> >>> thread should take this seriously.
> >>>
> >>
> >> Optimize is a bad way to convert to current.
> >> 1. conversion is not guaranteed, optimizing already optimized index is a noop
> >> 2. it merges all your segments. if you use BalancedSegmentMergePolicy,
> >> that destroys your segment size distribution
> >>
> >> Dedicated upgrade tool (available both from command-line and
> >> programmatically) is a good way to convert to current.
> >> 1. conversion happens exactly when you need it, conversion happens for
> >> sure, no additional checks needed
> >> 2. it should leave all your segments as is, only changing their format
> >>
> >>
> >>>
> >>> It is my observation, though possibly not correct, that core only has
> >>> rudimentary analysis capabilities, handling English very well. To handle
> >>> other languages well "contrib/analyzers" is required. Until recently it did
> >>> not get much love. There have been many bw compat breaking changes (though
> >>> w/ version one can probably get the prior behavior). IMHO, most of
> >>> contrib/analyzers should be core. My guess is that most non-trivial
> >>> applications will use contrib/analyzers.
> >>>
> >>
> >> I counter - most non-trivial applications will use their own analyzers.
> >> The more modules - the merrier. You can choose precisely what you need.
> >>
> >
> > By and large an analyzer is a simple wrapper for a tokenizer and some
> > filters. Are you suggesting that most non-trivial apps write their own
> > tokenizers and filters?
> >
> > I'd find that hard to believe. For example, I don't know enough Chinese,
> > Farsi, Arabic, Polish, ... to come up with anything better than what Lucene
> > has to tokenize, stem or filter these.
> >
> >>
> >>>
> >>> Our user base are those with ancient,
> >>> underpowered laptops in 3-rd world countries. On those machines it might
> >>> take 10 minutes to create an index and during that time the machine is
> >>> fairly unresponsive. There is no opportunity to "do it in the background."
> >>>
> >>
> >> Major Lucene releases (feature-wise, not version-wise) happen like
> >> once in a year, or year-and-a-half.
> >> Is it that hard for your users to wait ten minutes once a year?
> >>
> >
> > I said that was for one index. Multiply that times the number of books
> > available (300+) and yes, it is too much to ask. Even if a small subset is
> > indexed, say 30, that's around 5 hours of waiting.
> >
> > Under consideration is the frequency of breakage. Some are suggesting a
> > greater frequency than yearly.
> >
> > DM
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
> 
> 
> 
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
> 


