lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Why release 3.0?
Date Mon, 16 Nov 2009 20:29:30 GMT
And what happens when someone regenerates it with 1.6 without knowing?
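One way to stop a silent regeneration on the wrong JVM is a guard that runs before JFlex and refuses to continue on any other Java version. This is a minimal sketch that assumes a hypothetical build step. The class JFlexJvmGuard and the required version "1.5" are illustrative, the latter following the suggestion further down in this thread. Lucene's build is not claimed to contain such a check.

```java
// Hypothetical guard (not part of Lucene's build): refuses to proceed unless
// the running JVM's specification version is the one the grammar must be
// generated with.
final class JFlexJvmGuard {
    // Assumption: the grammar must be regenerated on a Java 1.5 JVM.
    static final String REQUIRED_SPEC_VERSION = "1.5";

    /** Returns true if the given specification version is the required one. */
    static boolean isAllowed(String specVersion) {
        return REQUIRED_SPEC_VERSION.equals(specVersion);
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        if (!isAllowed(running)) {
            System.err.println("Refusing to regenerate: running on Java " + running
                + ", but the grammar must be generated with Java " + REQUIRED_SPEC_VERSION
                + " because classes such as [:letter:] and [:digit:] depend on the JVM.");
            System.exit(1);
        }
        System.out.println("JVM check passed (Java " + running + ").");
    }
}
```

A build script could run this class immediately before the JFlex step and stop on a non-zero exit status.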

Uwe Schindler wrote:
> I checked this by generating the file with both 1.4 and 1.5. The 1.4 version will
> not change anymore, so we just keep the generated Java file and drop its .jflex
> source. The old one is used for compatibility with Lucene up to 2.9; if you use
> matchVersion=LUCENE_30, the new one is used, which can still be regenerated.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
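The scheme described above keeps a frozen 1.4-generated scanner for the old behaviour and selects a regenerable one when matchVersion is LUCENE_30 or later. That amounts to a simple dispatch on an explicit version constant. Below is a self-contained sketch of the pattern. MatchVersion, TokenScanner, Jdk14Scanner, Jdk15Scanner and ScannerFactory are hypothetical stand-ins, not Lucene classes.

```java
// Hypothetical stand-in for Lucene's version constants.
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    /** True if this version is the same as or newer than other. */
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

interface TokenScanner {
    String name();
}

// Frozen scanner generated once with JRE 1.4; never regenerated.
final class Jdk14Scanner implements TokenScanner {
    public String name() { return "jdk14"; }
}

// Scanner that may be regenerated (with the documented JRE).
final class Jdk15Scanner implements TokenScanner {
    public String name() { return "jdk15"; }
}

final class ScannerFactory {
    /** Picks the scanner whose character classes match the requested behaviour. */
    static TokenScanner forVersion(MatchVersion matchVersion) {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_30)
            ? new Jdk15Scanner()
            : new Jdk14Scanner();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + forVersion(v).name());
        }
    }
}
```

The choice is keyed on an explicit version constant rather than on whichever JVM is running. That keeps an existing index analyzed the same way until the application opts in to the new behaviour.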
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Monday, November 16, 2009 9:21 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Why release 3.0?
>>
>> Good point, and that likely means the current warning is not working.
>> What can we do to improve it?
>>
>> Perhaps a new text file called jflexregen or something, which
>> specifically says you must use Java 1.5?
>>
>> Uwe Schindler wrote:
>>     
>>> I think the generated StandardTokenizer code has not been generated
>>> with 1.4 for years :-) Most developers use 1.5 or even 1.6, so it has
>>> already changed incompatibly.
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>>> *Sent:* Monday, November 16, 2009 8:52 PM
>>> *To:* java-dev@lucene.apache.org
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> Uwe, that's probably a good solution, as long as we document it
>>> somewhere. I think there is already some warning verbiage about this
>>> in StandardTokenizer:
>>>
>>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>>>       :letter:) whose meaning can vary according to the JRE used to
>>>       run jflex.  See
>>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
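The classes named in that note are computed from the JVM that runs JFlex. One way to detect a silent change is therefore to fingerprint the underlying predicates over the Basic Multilingual Plane and compare the value across JVMs. This sketch assumes that JFlex's [:letter:] and [:digit:] correspond to Character.isLetter and Character.isDigit. The class name and the CRC-based fingerprint scheme are illustrative.

```java
import java.util.zip.CRC32;

// Illustrative fingerprint of the JVM's Character.isLetter / isDigit results
// over U+0000..U+FFFF. Two JVMs that classify every BMP char the same way
// for these two predicates produce the same value.
final class CharClassFingerprint {

    static long fingerprint() {
        CRC32 crc = new CRC32();
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            char c = (char) cp;
            // Pack both predicates into one byte per code point.
            int bits = (Character.isLetter(c) ? 1 : 0) | (Character.isDigit(c) ? 2 : 0);
            crc.update(bits);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.printf("java %s: letter/digit fingerprint = %08x%n",
            System.getProperty("java.specification.version"), fingerprint());
    }
}
```

Run this on the JVM that produced the checked-in scanner and on the one about to regenerate it. Differing values show that these two predicates changed. The check says nothing about any other Unicode property the grammar might depend on.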
>>>
>>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de
>>> <mailto:uwe@thetaphi.de>> wrote:
>>>
>>> But it is a general warning that should be placed in the Wiki: If you
>>> upgrade from Java 1.4 to Java 5, think about reindexing.
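To act on that advice automatically, an application could record which JVM built its index and compare that record when the index is opened. This is a minimal sketch that assumes a hypothetical sidecar properties file kept by the application next to the index. IndexJvmStamp and the file layout are illustrative. Lucene itself does not write such a file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of Lucene): stores the JVM specification
// version that built an index and reports when the index is later opened
// under a different one.
final class IndexJvmStamp {
    static final String KEY = "java.specification.version";

    /** Writes the running JVM's specification version to stampFile. */
    static void record(File stampFile) {
        Properties p = new Properties();
        p.setProperty(KEY, System.getProperty(KEY));
        try (OutputStream out = new FileOutputStream(stampFile)) {
            p.store(out, "JVM used to build this index");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the stamp is missing or was written under another JVM spec version. */
    static boolean shouldConsiderReindexing(File stampFile) {
        if (!stampFile.exists()) {
            return true; // unknown origin: err on the side of reindexing
        }
        Properties p = new Properties();
        try (InputStream in = new FileInputStream(stampFile)) {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return !System.getProperty(KEY).equals(p.getProperty(KEY));
    }
}
```

Call record after (re)building the index, and call shouldConsiderReindexing when opening it. Comparing the specification version is deliberately coarse. It flags every JVM upgrade, including ones that did not change the Unicode tables.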
>>>
>>>
>>>
>>> It definitely has nothing to do with 3.0, because users could have
>>> switched JVMs before (and most of them have).
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>]
>>> *Sent:* Monday, November 16, 2009 8:45 PM
>>>
>>>
>>> *To:* java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> Right, my point is that it's true it has nothing to do with Lucene at
>>> all, really. But in reality I think we should clarify this for users.
>>>
>>> It's especially complex in the current StandardTokenizer, which uses a
>>> mix of hard-coded ranges and properties: can you tell me whether you
>>> should reindex for a given language X?
>>> I wouldn't want to answer that question right now.
>>>
>>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de
>>> <mailto:uwe@thetaphi.de>> wrote:
>>>
>>> We tried Character.getType() on these two chars:
>>>
>>> Java 5:
>>> '\u00AD' = 16 (FORMAT)
>>> '\u06DD' = 16 (FORMAT)
>>>
>>> Java 1.4:
>>> '\u00AD' = 20 (DASH_PUNCTUATION)
>>> '\u06DD' = 7 (ENCLOSING_MARK)
>>>
>>>
>>>
>>> The first is the soft hyphen.
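To repeat this check on any JVM, a small program can print both the raw Character.getType() value and the name of the matching Character constant for the two code points. In this sketch the class name and the name table are illustrative, and only the four constants relevant to this thread are mapped.

```java
import java.util.HashMap;
import java.util.Map;

// Prints the Unicode general category the running JVM reports for the
// soft hyphen (U+00AD) and the Arabic end of ayah (U+06DD).
final class CategoryCheck {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put((int) Character.NON_SPACING_MARK, "NON_SPACING_MARK");
        NAMES.put((int) Character.ENCLOSING_MARK, "ENCLOSING_MARK");
        NAMES.put((int) Character.FORMAT, "FORMAT");
        NAMES.put((int) Character.DASH_PUNCTUATION, "DASH_PUNCTUATION");
    }

    /** Name of the category for c, or OTHER(n) for unmapped values. */
    static String categoryName(char c) {
        int type = Character.getType(c);
        return NAMES.getOrDefault(type, "OTHER(" + type + ")");
    }

    public static void main(String[] args) {
        char[] chars = {'\u00AD', '\u06DD'};
        for (char c : chars) {
            System.out.printf("U+%04X -> %d %s%n", (int) c, Character.getType(c), categoryName(c));
        }
    }
}
```

On Java 5 and later both lines should report FORMAT, matching the Java 5 figures quoted above.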
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>]
>>> *Sent:* Monday, November 16, 2009 8:37 PM
>>>
>>>
>>> *To:* java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> Right, it has nothing to do with Lucene; it is due to Unicode property
>>> changes, etc.
>>>
>>> I just think we should inform users on Java 1.4/Lucene 2.9 that if they
>>> upgrade to Java 1.5/Lucene 3.0, they should reindex.
>>>
>>> The reason I say this about properties is that some of the changes will
>>> affect tokenizers. Two examples: a hyphen that changes from punctuation
>>> to format (which might affect Solr's WordDelimiterFilter), and the Arabic
>>> ayah mark, which changes from NSM to format and surely affects
>>> ArabicLetterTokenizer.
>>>
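A small toy tokenizer shows why a category change matters to tokenization. It splits on punctuation but silently skips format characters. This is an illustrative sketch, not how ArabicLetterTokenizer or WordDelimiterFilter actually work. The category lookup is passed in as a parameter, so the soft hyphen's Java 1.4 category can be simulated on a modern JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Toy tokenizer (illustrative only): letters and digits form tokens,
// FORMAT characters are skipped as invisible, anything else ends a token.
final class ToyTokenizer {

    static List<String> tokenize(String text, IntUnaryOperator typeOf) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (typeOf.applyAsInt(c) == Character.FORMAT) {
                continue; // invisible formatting character: keep the token intact
            }
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // any other character is a boundary
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "co\u00ADoperate";
        // Categories as reported by the running JVM (FORMAT for U+00AD on Java 5+).
        IntUnaryOperator runningJvm = Character::getType;
        // Simulated Java 1.4 behaviour: U+00AD reported as DASH_PUNCTUATION.
        IntUnaryOperator java14 =
            cp -> cp == '\u00AD' ? Character.DASH_PUNCTUATION : Character.getType(cp);
        System.out.println("running JVM: " + tokenize(text, runningJvm)); // [cooperate]
        System.out.println("as on 1.4:   " + tokenize(text, java14));     // [co, operate]
    }
}
```

The same input yields different terms depending on the category table. An index built under one table and queried under the other may therefore miss matches.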
>>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu
>>> <mailto:sarowe@syr.edu>> wrote:
>>>
>>> Hi Robert,
>>>
>>> I agree that the Unicode version supported by the JVM, as you say,
>>> really has nothing to do with Lucene.
>>>
>>> The disruption here comes when users upgrade from Java 1.4 to 1.5+, not
>>> when they upgrade Lucene. I'd guess that, with few exceptions, most
>>> people have been using Lucene with 1.5+ for a couple of years now, though.
>>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
>>> on most Lucene users, assuming that most use Latin-1 exclusively;
>>> although I haven't looked, I'd be surprised if Latin-1 characters
>>> changed much, if at all, from Unicode 3.0 to 4.0.
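The Latin-1 question can be checked directly by dumping the category of every character from U+0000 to U+00FF on each JVM and diffing the output. Note that the soft hyphen discussed elsewhere in this thread, U+00AD, is itself in that range. A minimal sketch; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Lists the Character.getType() value for every Latin-1 code point so the
// output of two JVMs can be compared with a plain text diff.
final class Latin1Categories {

    static List<String> dump() {
        List<String> lines = new ArrayList<>();
        for (int cp = 0; cp <= 0xFF; cp++) {
            lines.add(String.format("U+%04X %d", cp, Character.getType(cp)));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : dump()) {
            System.out.println(line);
        }
    }
}
```

Save the output from a 1.4 JVM and from a 1.5 JVM, then run diff on the two files. The result lists exactly the Latin-1 characters whose getType() value changed.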
>>>
>>> It would be useful, I think, to include (a pointer to?) a description
>>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
>>> release notes, since the minimum required Java version, and so also
>>> the supported Unicode version, changes then.
>>>
>>> Steve
>>>
>>>
>>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>>>       
>>>> The problem is that the properties have changed for various characters,
>>>> and new characters were added.
>>>>
>>>> It really has nothing to do with Lucene, but the idea that you can go
>>>> from JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is
>>>> not true.
>>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de <mailto:uwe@thetaphi.de>> wrote:
>>>>       But a UTF-8 stream written with Java 1.4 can still be read with
>>>> Java 5, so what is the problem? Java 5 extended Unicode support, but an
>>>> index created with older versions can still be read. UTF-8 is standardized.
>>>>
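The point about UTF-8 concerns the stored bytes. The UTF-8 encoding of a character does not depend on the JVM's Unicode version, so a term written under one JVM decodes identically under another. The sketch below shows the soft hyphen encoding to the same two bytes and decoding back unchanged. What can differ between JVMs is how analysis classifies that character. The class name is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Encodes a string containing a soft hyphen as UTF-8 and decodes it again.
final class Utf8RoundTrip {

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "co\u00ADoperate";
        byte[] bytes = encode(term);
        // U+00AD is encoded as the two bytes 0xC2 0xAD (printed as -62, -83).
        System.out.println(Arrays.toString(bytes));
        System.out.println(decode(bytes).equals(term)); // true
    }
}
```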
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>       From: Robert Muir [mailto:rcmuir@gmail.com <mailto:rcmuir@gmail.com>]
>>>>       Sent: Monday, November 16, 2009 8:09 PM
>>>>       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>>>       Subject: Re: Why release 3.0?
>>>>
>>>>
>>>>
>>>>       Uwe, on that topic, please read my comment on LUCENE-1689: because
>>>> the Unicode version was bumped in JDK 1.5, I believe this index backwards
>>>> compatibility is only theoretical.
>>>>
>>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de <mailto:uwe@thetaphi.de>> wrote:
>>>>       2.9 does *not* have the same format as 3.0: an index created with
>>>> 3.0 cannot be read with 2.9. This is because compressed-field support was
>>>> removed, and therefore the version number of the stored fields file was
>>>> bumped. But indexes from 2.9 can be read with 3.0; support for them may
>>>> be removed in 4.0. 3.0 indexes can be read up to version 4.9.
>>>>
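The policy described in that message can be written down as a small predicate. A release reads indexes from its own major version and the previous one, but not an index written by a newer release. This is a hypothetical model of the rule as stated in this thread, with versions given as major/minor pairs. It is not Lucene code. As the stored-fields example shows, Lucene itself decides based on format version numbers written into the index files.

```java
// Hypothetical model of the back-compatibility rule described above:
// a release can read indexes written by its own major version or the
// previous one, and never an index written by a newer release.
final class IndexCompat {

    /** Compares two versions given as major/minor pairs. */
    static int compare(int major1, int minor1, int major2, int minor2) {
        return major1 != major2
            ? Integer.compare(major1, major2)
            : Integer.compare(minor1, minor2);
    }

    /** True if a reader at readerMajor.readerMinor can open an index written by indexMajor.indexMinor. */
    static boolean canRead(int readerMajor, int readerMinor, int indexMajor, int indexMinor) {
        boolean indexIsNewer = compare(indexMajor, indexMinor, readerMajor, readerMinor) > 0;
        return !indexIsNewer && readerMajor - indexMajor <= 1;
    }

    public static void main(String[] args) {
        System.out.println("3.0 reads 2.9: " + canRead(3, 0, 2, 9)); // true
        System.out.println("2.9 reads 3.0: " + canRead(2, 9, 3, 0)); // false
        System.out.println("4.9 reads 3.0: " + canRead(4, 9, 3, 0)); // true
        System.out.println("4.0 reads 2.9: " + canRead(4, 0, 2, 9)); // false under this model
    }
}
```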
>>>>
>>>>
>>>>       Uwe
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com <mailto:jake.mannix@gmail.com>]
>>>>       Sent: Monday, November 16, 2009 7:15 PM
>>>>       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>>>       Subject: Re: Why release 3.0?
>>>>
>>>>
>>>>
>>>>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily be
>>>> able to read your 2.4 index file formats? I suppose if you've already
>>>> upgraded to 2.9, then all is well because 2.9 is the same format as 3.0,
>>>> but we can't assume all users upgraded from 2.4 to 2.9.
>>>>
>>>>       If you've done that already, then 3.0 might not be necessary, but
>>>> if you're on 2.4 right now, you will be in for a bad surprise if you try
>>>> to upgrade to 3.1.
>>>>
>>>>         -jake
>>>>
>>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>>>> <erickerickson@gmail.com <mailto:erickerickson@gmail.com>> wrote:
>>>>
>>>>       One of my "specialties" is asking obvious questions just to see
>>>> if everyone's assumptions are aligned. So with the discussion about
>>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
>>>> intended for *production*?". And if not, would we save a lot of
>>>> work by just not worrying about retrofitting fixes to a 3.0 branch
>>>> and carrying on with 3.1 as the first *supported* 3.x release?
>>>>
>>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>>>> "beta/snapshot" release to get a head start on cleaning up my code
>>>> does seem worthwhile, if I have the spare time. And having a base
>>>> 3.0 version that's not changing all over the place would be useful
>>>> for that.
>>>>
>>>>       That said, I'm also not terribly comfortable with a "release"
>>>> that's out there and unsupported.
>>>>
>>>>       Apologies if this has already been discussed, but I don't
>>>> remember it. Although my memory isn't what it used to be (but
>>>> some would claim it never was<G>)...
>>>>
>>>>       Erick
>>>>         
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com <mailto:rcmuir@gmail.com>
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     
>
>
>
>
>   







