lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Why release 3.0?
Date Mon, 16 Nov 2009 20:45:13 GMT
I still recommend we add a file then, HowToRegenJflex.txt or something,
that specifically says to use 1.5 or 1.6. I don't think changing the current
notice/warning is visible enough to ensure someone doesn't break this.
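
To make the JRE dependence concrete, here is a minimal sketch of the
Character.getType() check reported further down in the quoted thread. It is
not code from Lucene or from the thread; the class name UnicodeTypeCheck and
its helper methods are made up for illustration. It prints the general
category the running JRE assigns to the soft hyphen (U+00AD) and the Arabic
end of ayah (U+06DD), the two characters Uwe tested.

```java
public class UnicodeTypeCheck {
    // Hypothetical helper: returns the general category constant that the
    // running JRE assigns to the given character.
    public static int typeOf(char c) {
        return Character.getType(c);
    }

    // Hypothetical helper: formats the code point and its category value.
    public static String describe(char c) {
        return String.format("U+%04X = %d", (int) c, typeOf(c));
    }

    public static void main(String[] args) {
        // The result depends on the Unicode version bundled with this JRE.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe((char) 0x00AD)); // soft hyphen
        System.out.println(describe((char) 0x06DD)); // Arabic end of ayah
    }
}
```

Per the numbers quoted below, Java 5 reports 16 (Character.FORMAT) for both
characters, while Java 1.4 reported 20 and 7. Running something like this
under the JRE used for jflex would show which semantics the generated
tokenizer picks up.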

Robert Muir wrote:
> no, it's still 4.0, but i hear 1.7 will be 5.1 or 5.2
>
> the only way to truly control this would be to use something like ICU
> to control the unicode version being used (it would actually be faster,
> and support a higher version).
> see http://site.icu-project.org/home/why-use-icu4j
>
> the issue is that lucene does not have 3rd party library dependencies,
> on the other hand, i think tika and/or nutch already incorporate icu
> for charset detection.
>
> i won't argue for this really, i know nobody wants it, but you can see
> how the situation of not being able to control unicode semantics is
> really difficult for a search engine.
>
> On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de> wrote:
>
>     Did 1.6 change the unicode version? Robert?
>
>     -----
>     UWE SCHINDLER
>     Webserver/Middleware Development
>     PANGAEA - Publishing Network for Geoscientific and Environmental Data
>     MARUM - University of Bremen
>     Room 2500, Leobener Str., D-28359 Bremen
>     Tel.: +49 421 218 65595
>     Fax:  +49 421 218 65505
>     http://www.pangaea.de/
>     E-mail: uschindler@pangaea.de
>
>     > -----Original Message-----
>     > From: Mark Miller [mailto:markrmiller@gmail.com
>     <mailto:markrmiller@gmail.com>]
>     > Sent: Monday, November 16, 2009 9:30 PM
>     > To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>     > Subject: Re: Why release 3.0?
>     >
>     > And what happens when someone regenerates it with 1.6 without knowing?
>     >
>     > Uwe Schindler wrote:
>     > > I check this by generating the file with 1.4 and 1.5. The 1.4 version
>     > > will not change anymore, so we just keep the java file, with no jflex
>     > > file anymore. The old one is used for Lucene until 2.9; if you use
>     > > matchVersion=LUCENE_30, the new one is used, which can also be
>     > > regenerated.
>     > >
>     > > -----
>     > > Uwe Schindler
>     > > H.-H.-Meier-Allee 63, D-28213 Bremen
>     > > http://www.thetaphi.de
>     > > eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     > >
>     > >
>     > >> -----Original Message-----
>     > >> From: Mark Miller [mailto:markrmiller@gmail.com
>     <mailto:markrmiller@gmail.com>]
>     > >> Sent: Monday, November 16, 2009 9:21 PM
>     > >> To: java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>
>     > >> Subject: Re: Why release 3.0?
>     > >>
>     > >> Good point - and that likely means the current warning is not
>     > >> working - what can we do to improve it?
>     > >>
>     > >> Perhaps a new text file called jflexregen or something, and it
>     > >> specifically says you must use java 1.5?
>     > >>
>     > >> Uwe Schindler wrote:
>     > >>
>     > >>> I think the regenerated code in Standard has not been generated
>     > >>> with 1.4 for years :-) Most developers use 1.5 or even 1.6. So it
>     > >>> has already changed incompatibly.
>     > >>>
>     > >>>
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     > >>>
>     > >>>
>     ----------------------------------------------------------------------
>     > --
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com
>     <mailto:rcmuir@gmail.com>]
>     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
>     > >>> *To:* java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> Uwe, that's probably a good solution, I think, just as long as we
>     > >>> document it somewhere.
>     > >>> I think there is some warning verbiage in StandardTokenizer already
>     > >>> about this.
>     > >>>
>     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>     > >>>       :letter:) whose meaning can vary according to the JRE used to
>     > >>>       run jflex.  See
>     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>> But it is a general warning that should be placed in the Wiki: If you
>     > >>> upgrade from Java 1.4 to Java 5, think about reindexing.
>     > >>>
>     > >>> It definitely has nothing to do with 3.0, because users could have
>     > >>> changed (and most of them have) before.
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     <mailto:uwe@thetaphi.de <mailto:uwe@thetaphi.de>>
>     > >>>
>     > >>>
>     ----------------------------------------------------------------------
>     > --
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com
>     <mailto:rcmuir@gmail.com>
>     > <mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>>]
>     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
>     > >>>
>     > >>>
>     > >>> *To:* java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>
>     <mailto:java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>>
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> right, my point is it's true it has nothing to do with Lucene at all,
>     > >>> really.
>     > >>> but the reality is we should clarify this to users, I think.
>     > >>>
>     > >>> It's especially complex in the current StandardTokenizer, which uses a
>     > >>> mix of hardcoded ranges and properties. Can you tell me if you should
>     > >>> reindex for a given language X?
>     > >>> I wouldn't want to answer that question right now.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>> We tried out: Character.getType() for these two chars:
>     > >>>
>     > >>>
>     > >>>
>     > >>> Java 5:
>     > >>> '\u00AD' = 16
>     > >>> '\u06DD' = 16
>     > >>>
>     > >>> Java 1.4:
>     > >>> '\u00AD' = 20
>     > >>> '\u06DD' = 7
>     > >>>
>     > >>>
>     > >>>
>     > >>> The first is the soft hyphen.
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     <mailto:uwe@thetaphi.de <mailto:uwe@thetaphi.de>>
>     > >>>
>     > >>>
>     ----------------------------------------------------------------------
>     > --
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com
>     <mailto:rcmuir@gmail.com>
>     > <mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>>]
>     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
>     > >>>
>     > >>>
>     > >>> *To:* java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>
>     <mailto:java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org>>
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> right, it's nothing to do with lucene; it's due to property changes,
>     > >>> etc.
>     > >>>
>     > >>> i just think we should inform users on java 1.4/2.9 that if they
>     > >>> upgrade to java 1.5/3.0, they should reindex.
>     > >>>
>     > >>> the reason i say this about properties is that some of the changes
>     > >>> will affect tokenizers. i give two examples: a hyphen that changes
>     > >>> from punctuation to format (might affect SolrWordDelimiterFilter),
>     > >>> and arabic ayah, which changes from NSM to format, which surely
>     > >>> affects ArabicLetterTokenizer.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>     > >>>
>     > >>> Hi Robert,
>     > >>>
>     > >>> I agree that the Unicode version supported by the JVM, as you say,
>     > >>> really has nothing to do with Lucene.
>     > >>>
>     > >>> The disruption here is users' upgrading from Java 1.4 to 1.5+, not
>     > >>> when they upgrade Lucene.  I'd guess with few exceptions that most
>     > >>> people have been using Lucene with 1.5+ for a couple of years now,
>     > >>> though.
>     > >>>
>     > >>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
>     > >>> on most Lucene users, assuming that most use Latin-1 exclusively;
>     > >>> although I haven't looked, I'd be surprised if Latin-1 characters
>     > >>> changed much, if at all, from Unicode 3.0 to 4.0.
>     > >>>
>     > >>> It would be useful, I think, to include (a pointer to?) a description
>     > >>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
>     > >>> release notes, since the minimum required Java version, and so also
>     > >>> the supported Unicode version, changes then.
>     > >>>
>     > >>> Steve
>     > >>>
>     > >>>
>     > >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>     > >>>
>     > >>>> the problem is that the properties have changed for various
>     > >>>> characters, and new characters were added.
>     > >>>>
>     > >>>> it really has nothing to do with lucene, but the idea you can go from
>     > >>>> jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing is not
>     > >>>> true.
>     > >>>>
>     > >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>>>       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
>     > >>>> so what is the problem? Java 5 extended Unicode support, but an index
>     > >>>> created with older versions can still be read. UTF-8 is standardized.
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       -----
>     > >>>>       Uwe Schindler
>     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>>>       http://www.thetaphi.de
>     > >>>>       eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     <mailto:uwe@thetaphi.de <mailto:uwe@thetaphi.de>>
>     > >>>>
>     > >>>>
>     > >>>> ________________________________
>     > >>>>
>     > >>>>
>     > >>>>       From: Robert Muir [mailto:rcmuir@gmail.com
>     <mailto:rcmuir@gmail.com>
>     > >>>>
>     > >>> <mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>>]
>     > >>>
>     > >>>>       Sent: Monday, November 16, 2009 8:09 PM
>     > >>>>
>     > >>>>       To: java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org> <mailto:java- <mailto:java->
>     > >>>>
>     > >> dev@lucene.apache.org <mailto:dev@lucene.apache.org>>
>     > >>
>     > >>>>       Subject: Re: Why release 3.0?
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       uwe, on topic please read my comment on LUCENE-1689: because
>     > >>>> the unicode version was bumped in jdk 1.5, i believe this index
>     > >>>> backwards compatibility is only theoretical
>     > >>>>
>     > >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>>>       2.9 does *not* have the same format as 3.0; an index created with
>     > >>>> 3.0 cannot be read with 2.9. This is because compressed field support
>     > >>>> was removed and therefore the version number of the stored fields file
>     > >>>> was upgraded. But indexes from 2.9 can be read with 3.0, and that
>     > >>>> support may get removed in 4.0. 3.0 indexes can be read until version
>     > >>>> 4.9.
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       Uwe
>     > >>>>
>     > >>>>       -----
>     > >>>>       Uwe Schindler
>     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>>>       http://www.thetaphi.de
>     > >>>>       eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
>     <mailto:uwe@thetaphi.de <mailto:uwe@thetaphi.de>>
>     > >>>>
>     > >>>>
>     > >>>> ________________________________
>     > >>>>
>     > >>>>
>     > >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com
>     <mailto:jake.mannix@gmail.com>
>     > >>>>
>     > >>> <mailto:jake.mannix@gmail.com <mailto:jake.mannix@gmail.com>>]
>     > >>>
>     > >>>>       Sent: Monday, November 16, 2009 7:15 PM
>     > >>>>
>     > >>>>
>     > >>>>       To: java-dev@lucene.apache.org
>     <mailto:java-dev@lucene.apache.org> <mailto:java- <mailto:java->
>     > >>>>
>     > >> dev@lucene.apache.org <mailto:dev@lucene.apache.org>>
>     > >>
>     > >>>>       Subject: Re: Why release 3.0?
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       Don't users need to upgrade to 3.0 because 3.1 won't
>     > >>>> necessarily be able to read your 2.4 index file formats?  I suppose
>     > >>>> if you've already upgraded to 2.9, then all is well because 2.9 is
>     > >>>> the same format as 3.0, but we can't assume all users upgraded from
>     > >>>> 2.4 to 2.9.
>     > >>>>
>     > >>>>       If you've done that already, then 3.0 might not be necessary,
>     > >>>> but if you're on 2.4 right now, you will be in for a bad surprise if
>     > >>>> you try to upgrade to 3.1.
>     > >>>>
>     > >>>>         -jake
>     > >>>>
>     > >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>     > >>>> <erickerickson@gmail.com> wrote:
>     > >>>>
>     > >>>>       One of my "specialties" is asking obvious questions just to see
>     > >>>> if everyone's assumptions are aligned. So with the discussion about
>     > >>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
>     > >>>> intended for *production*?". And if not, would we save a lot of
>     > >>>> work by just not worrying about retrofitting fixes to a 3.0 branch
>     > >>>> and carrying on with 3.1 as the first *supported* 3.x release?
>     > >>>>
>     > >>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>     > >>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>     > >>>> "beta/snapshot" release to get a head start on cleaning up my code
>     > >>>> does seem worthwhile, if I have the spare time. And having a base
>     > >>>> 3.0 version that's not changing all over the place would be useful
>     > >>>> for that.
>     > >>>>
>     > >>>>       That said, I'm also not terribly comfortable with a "release"
>     > >>>> that's out there and unsupported.
>     > >>>>
>     > >>>>       Apologies if this has already been discussed, but I don't
>     > >>>> remember it. Although my memory isn't what it used to be (but
>     > >>>> some would claim it never was<G>)...
>     > >>>>
>     > >>>>       Erick
>     > >>>>
>     > >>>
>     > >>>
>     > >>> --
>     > >>> Robert Muir
>     > >>> rcmuir@gmail.com <mailto:rcmuir@gmail.com>
>     <mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>>
>     > >>>
>     > >> --
>     > >> - Mark
>     > >>
>     > >> http://www.lucidimagination.com
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     ---------------------------------------------------------------------
>     > >> To unsubscribe, e-mail:
>     java-dev-unsubscribe@lucene.apache.org
>     <mailto:java-dev-unsubscribe@lucene.apache.org>
>     > >> For additional commands, e-mail:
>     java-dev-help@lucene.apache.org
>     <mailto:java-dev-help@lucene.apache.org>
>     > >>
>     > >
>     > >
>     > >
>     > >
>     > >
>     > >
>     >
>     >
>     > --
>     > - Mark
>     >
>     > http://www.lucidimagination.com
>     >
>     >
>     >
>     >
>     >
>
>
>
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com <mailto:rcmuir@gmail.com>


-- 
- Mark

http://www.lucidimagination.com



