lucene-dev mailing list archives

From "Uwe Schindler" <uschind...@pangaea.de>
Subject RE: Why release 3.0?
Date Mon, 16 Nov 2009 20:33:08 GMT
Did 1.6 change the Unicode version? Robert?

-----
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschindler@pangaea.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, November 16, 2009 9:30 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
> 
> And what happens when someone regenerates it with 1.6 without knowing?
> 
> Uwe Schindler wrote:
> > I check this by generating the file with both 1.4 and 1.5. The 1.4 version
> > will not change anymore, so we just keep the generated java file, without
> > the jflex source. The old one is used for Lucene up to 2.9; if you use
> > matchVersion=LUCENE_30, the new one is used, which can also be regenerated.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmiller@gmail.com]
> >> Sent: Monday, November 16, 2009 9:21 PM
> >> To: java-dev@lucene.apache.org
> >> Subject: Re: Why release 3.0?
> >>
> >> Good point - and that likely means the current warning is not working -
> >> what can we do to improve it?
> >>
> >> Perhaps a new text file called jflexregen or something, and it
> >> specifically says you must use java 1.5?
> >>
> >> Uwe Schindler wrote:
> >>
> >>> I think the regenerated code in Standard has not been generated
> >>> with 1.4 for years :-) Most developers use 1.5 or even 1.6. So it
> >>> has already changed incompatibly.
> >>>
> >>>
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: uwe@thetaphi.de
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> >>> *Sent:* Monday, November 16, 2009 8:52 PM
> >>> *To:* java-dev@lucene.apache.org
> >>> *Subject:* Re: Why release 3.0?
> >>>
> >>>
> >>>
> >>> Uwe, that's probably a good solution I think, just as long as we
> >>> document it somewhere.
> >>> I think there is some warning verbiage in StandardTokenizer already
> >>> about this.
> >>>
> >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
> >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
> >>>       :letter:) whose meaning can vary according to the JRE used to
> >>>       run jflex.  See
> >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
> >>>
> >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>>
> >>> But it is a general warning that should be placed in the Wiki: If you
> >>> upgrade from Java 1.4 to Java 5, think about reindexing.
> >>>
> >>>
> >>>
> >>> It definitely has nothing to do with 3.0, because users could have
> >>> changed (and most of them have) before.
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> >>> *Sent:* Monday, November 16, 2009 8:45 PM
> >>> *To:* java-dev@lucene.apache.org
> >>> *Subject:* Re: Why release 3.0?
> >>>
> >>>
> >>>
> >>> right, my point is it's true it's nothing to do with Lucene at all, really.
> >>> but the reality is we should clarify this to users I think.
> >>>
> >>> It's especially complex in the current StandardTokenizer, which uses a
> >>> mix of hardcoded ranges and properties, can you tell me if you should
> >>> reindex for a given language X?
> >>> I wouldn't want to answer that question right now.
> >>>
> >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>>
> >>> We tried out: Character.getType() for these two chars:
> >>>
> >>>
> >>>
> >>> Java 5:
> >>> '\u00AD' = 16
> >>> '\u06DD' = 16
> >>>
> >>> Java 1.4:
> >>> '\u00AD' = 20
> >>> '\u06DD' = 7
> >>>
> >>>
> >>>
> >>> The first is the soft hyphen.
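
For anyone who wants to repeat this check on their own JVM, here is a
minimal sketch. The class name CategoryCheck and its two helper methods are
made up for illustration; only Character.getType() and the Character.*
constants come from the JDK. Going by those constants, 16 is FORMAT, 20 is
DASH_PUNCTUATION and 7 is ENCLOSING_MARK (NON_SPACING_MARK would be 6).

```java
// Hypothetical helper, not part of Lucene: prints the general category
// that the running JRE assigns to the two characters discussed above.
class CategoryCheck {

    // Maps the int returned by Character.getType() to a readable name.
    // Only the categories mentioned in this thread are named explicitly.
    static String categoryName(int type) {
        switch (type) {
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";
            case Character.FORMAT:           return "FORMAT";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION";
            default:                         return "OTHER(" + type + ")";
        }
    }

    // Formats one character as "U+XXXX = <type> (<name>)".
    static String describe(char c) {
        int type = Character.getType(c);
        // Pad the code point to four hex digits without String.format,
        // so the class also compiles on a 1.4 JDK.
        String hex = Integer.toHexString(0x10000 | c).substring(1).toUpperCase();
        return "U+" + hex + " = " + type + " (" + categoryName(type) + ")";
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe('\u00AD')); // soft hyphen
        System.out.println(describe('\u06DD')); // Arabic end of ayah
    }
}
```

The result depends on the Unicode tables shipped with the JRE that runs the
code, so running the same class on 1.4 and on 5 or later should show the
difference listed above.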
> >>>
> >>> -----
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> >>> *Sent:* Monday, November 16, 2009 8:37 PM
> >>> *To:* java-dev@lucene.apache.org
> >>> *Subject:* Re: Why release 3.0?
> >>>
> >>>
> >>>
> >>> right, it's nothing to do with lucene, instead due to property changes,
> >>> etc.
> >>>
> >>> i just think we should inform users on java 1.4/2.9 that if they
> >>> upgrade to java 1.5/3.0, they should reindex.
> >>>
> >>> the reason i say this about properties, is there are some that change
> >>> that will affect tokenizers, i give two examples, a hyphen that
> >>> changes from punctuation to format (might affect SolrWordDelimiterFilter),
> >>> and arabic ayah which changes from NSM to format, which surely affects
> >>> ArabicLetterTokenizer.
> >>>
> >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
> >>>
> >>> Hi Robert,
> >>>
> >>> I agree that the Unicode version supported by the JVM, as you say,
> >>> really has nothing to do with Lucene.
> >>>
> >>> The disruption here is users' upgrading from Java 1.4 to 1.5+, not
> >>> when they upgrade Lucene.  I'd guess with few exceptions that most
> >>> people have been using Lucene with 1.5+ for a couple of years now, though.
> >>>
> >>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
> >>> on most Lucene users, assuming that most use Latin-1 exclusively;
> >>> although I haven't looked, I'd be surprised if Latin-1 characters
> >>> changed much, if at all, from Unicode 3.0 to 4.0.
> >>>
> >>> It would be useful, I think, to include (a pointer to?) a description
> >>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
> >>> release notes, since the minimum required Java version, and so also
> >>> the supported Unicode version, changes then.
> >>>
> >>> Steve
> >>>
> >>>
> >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> >>>
> >>>> the problem is that the properties have changed for various characters,
> >>>> and new characters were added.
> >>>>
> >>>> it really has nothing to do with lucene, but the idea you can go from
> >>>> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
> >>>>
> >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>>>
> >>>>       But an UTF-8 stream from Java 4 can still be read with Java 5,
> >>>> what is the problem? Java 5 extended Unicode support, but an index
> >>>> created with older versions can still be read. UTF-8 is standardized.
> >>>>
> >>>>
> >>>>
> >>>>       -----
> >>>>       Uwe Schindler
> >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>>       http://www.thetaphi.de
> >>>>       eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
> >>>>
> >>>>
> >>>> ________________________________
> >>>>
> >>>>
> >>>>       From: Robert Muir [mailto:rcmuir@gmail.com]
> >>>>       Sent: Monday, November 16, 2009 8:09 PM
> >>>>       To: java-dev@lucene.apache.org
> >>>>       Subject: Re: Why release 3.0?
> >>>>
> >>>>
> >>>>
> >>>>       uwe, on topic please read my comment on LUCENE-1689, because
> >>>> unicode version was bumped in jdk 1.5, i believe this index backwards
> >>>> compatibility is only theoretical
> >>>>
> >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>>>
> >>>>       2.9 does *not* have the same format as 3.0; an index created with 3.0
> >>>> cannot be read with 2.9. This is because compressed field support was
> >>>> removed and therefore the version number of the stored fields file was
> >>>> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> >>>> removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >>>>
> >>>>
> >>>>
> >>>>       Uwe
> >>>>
> >>>>       -----
> >>>>       Uwe Schindler
> >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> >>>>       http://www.thetaphi.de
> >>>>       eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>
> >>>>
> >>>>
> >>>> ________________________________
> >>>>
> >>>>
> >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
> >>>>       Sent: Monday, November 16, 2009 7:15 PM
> >>>>       To: java-dev@lucene.apache.org
> >>>>       Subject: Re: Why release 3.0?
> >>>>
> >>>>
> >>>>
> >>>>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily be
> >>>>       able to read your 2.4 index file formats?  I suppose if you've already
> >>>>       upgraded to 2.9, then all is well because 2.9 is the same format as 3.0,
> >>>>       but we can't assume all users upgraded from 2.4 to 2.9.
> >>>>
> >>>>       If you've done that already, then 3.0 might not be necessary, but if
> >>>>       you're on 2.4 right now, you will be in for a bad surprise if you try
> >>>>       to upgrade to 3.1.
> >>>>
> >>>>         -jake
> >>>>
> >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> >>>>
> >>>>       One of my "specialties" is asking obvious questions just to see
> >>>> if everyone's assumptions are aligned. So with the discussion about
> >>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
> >>>> intended for *production*?". And if not, would we save a lot of
> >>>> work by just not worrying about retrofitting fixes to a 3.0 branch
> >>>> and carrying on with 3.1 as the first *supported* 3.x release?
> >>>>
> >>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> >>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> >>>> "beta/snapshot" release to get a head start on cleaning up my code
> >>>> does seem worthwhile, if I have the spare time. And having a base
> >>>> 3.0 version that's not changing all over the place would be useful
> >>>> for that.
> >>>>
> >>>>       That said, I'm also not terribly comfortable with a "release"
> >>>> that's out there and unsupported.
> >>>>
> >>>>       Apologies if this has already been discussed, but I don't
> >>>> remember it. Although my memory isn't what it used to be (but
> >>>> some would claim it never was<G>)...
> >>>>
> >>>>       Erick
> >>>>
> >>>
> >>>
> >>> --
> >>> Robert Muir
> >>> rcmuir@gmail.com <mailto:rcmuir@gmail.com>
> >>>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>
> >
> >
> >
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 


