Date: Mon, 6 Jul 2009 09:26:21 -0400
Message-ID: <8f0ad1f30907060626g19620991u8371f37b725781a1@mail.gmail.com>
Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter
From: Robert Muir
To: java-dev@lucene.apache.org

Uwe, I completely agree. To add the icing on the cake, the entire
analyzer appears to be just a duplication of the contrib/snowball
Russian functionality...!

On Mon, Jul 6, 2009 at 9:19 AM, Uwe Schindler wrote:
> The whole Russian analyzer is very strange and works against all
> charset/Unicode conventions. It defines its own "charsets" (the only valid one
> is UNICODE), which are all applied to standard Java 16-bit chars. The test
> shows how this works: it opens a text file in KOI8 using the "ISO-8859-1"
> charset, just so that the codepoints are not modified when converting to
> 16-bit Java chars (in principle it does a deprecated "new String(byte[],0)").
> These completely wrong Java chars are then handled by the analyzer's internal
> charset conversion (working on the 16-bit chars).
>
> The only correct usage of this package is:
> - open the file with the correct encoding (when instantiating the Reader, so
>   specify KOI8 or windows-1251 to the Reader). The string then consists of
>   correctly UTF-16 encoded Java chars. On this string the "pseudo-charset"
>   UNICODE of this analyzer can be used.
>
> In my opinion, this invalid usage of Java chars should be deprecated, the
> only correct pseudo-charset should be the one specified by UNICODE, and all
> charset conversions should be done using the Reader.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcmuir@gmail.com]
>> Sent: Monday, July 06, 2009 3:08 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
>> excessively in IndexReader and IndexWriter
>>
>> Uwe, I think so too. This way it will not be prone to breakage again.
>>
>> On Mon, Jul 6, 2009 at 8:38 AM, Uwe Schindler wrote:
>> > In my opinion, these files should be converted to UTF-8 and committed again
>> > (and the Reader in the test reconfigured for UTF-8). Then they can be native
>> > EOL style again. The problem is that SVN can only handle the EOL style for
>> > one-byte-per-char and UTF-8 files.
>> >
>> > I'll give it a try here (and I have a converter).
>> >
>> >> -----Original Message-----
>> >> From: Robert Muir [mailto:rcmuir@gmail.com]
>> >> Sent: Monday, July 06, 2009 1:11 PM
>> >> To: java-dev@lucene.apache.org
>> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
>> >> excessively in IndexReader and IndexWriter
>> >>
>> >> yeah, it's fixed now.
>> >>
>> >> On Mon, Jul 6, 2009 at 7:06 AM, Michael McCandless wrote:
>> >> > Is this the native vs LF svn:eol-style that Uwe already fixed?
>> >> >
>> >> > Mike
>> >> >
>> >> > On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera wrote:
>> >> >> Can somebody try to revert the change and test it on Windows?
>> >> >>
>> >> >> On Thu, Jul 2, 2009 at 4:44 PM, Robert Muir wrote:
>> >> >>>
>> >> >>> well then I have no idea why it doesn't fail. Except that perhaps it's
>> >> >>> EOL-related (as Shai said), and that the failure is somehow
>> >> >>> platform-dependent due to newline differences between Windows and Unix
>> >> >>> (and the way these are encoded in UTF-16/stored in SVN)?
>> >> >>>
>> >> >>> I don't really do any work with files in UTF-16 so this is just a theory.
>> >> >>>
>> >> >>> On Thu, Jul 2, 2009 at 9:40 AM, Mark Miller wrote:
>> >> >>> > Hudson runs all the tests and emails java-dev if any of them fail.
>> >> >>> >
>> >> >>> > On Thu, Jul 2, 2009 at 9:37 AM, Robert Muir (JIRA) wrote:
>> >> >>> >>
>> >> >>> >>    [ https://issues.apache.org/jira/browse/LUCENE-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726479#action_12726479 ]
>> >> >>> >>
>> >> >>> >> Robert Muir commented on LUCENE-1707:
>> >> >>> >> -------------------------------------
>> >> >>> >>
>> >> >>> >> bq. Why doesn't Hudson encounter this problem?
>> >> >>> >>
>> >> >>> >> Forgive my ignorance, does Hudson also run tests or just verify the build?
>> >> >>> >> These files are only used in tests!
>> >> >>> >>
>> >> >>> >> I agree we should correct it, and perhaps to prevent other problems
>> >> >>> >> these files should be converted to UTF-8.
>> >> >>> >>
>> >> >>> >> For the record I am still confused about these java-code analyzers that
>> >> >>> >> implement snowball algorithms, why do they exist when the same
>> >> >>> >> functionality is in contrib/snowball?
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> > Don't use ensureOpen() excessively in IndexReader and IndexWriter
>> >> >>> >> > -----------------------------------------------------------------
>> >> >>> >> >
>> >> >>> >> >                 Key: LUCENE-1707
>> >> >>> >> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1707
>> >> >>> >> >             Project: Lucene - Java
>> >> >>> >> >          Issue Type: Improvement
>> >> >>> >> >          Components: Index
>> >> >>> >> >            Reporter: Shai Erera
>> >> >>> >> >             Fix For: 2.9
>> >> >>> >> >
>> >> >>> >> >         Attachments: LUCENE-1707.patch, LUCENE-1707.patch
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > A spin off from here:
>> >> >>> >> > http://www.nabble.com/Excessive-use-of-ensureOpen()-td24127806.html.
>> >> >>> >> > We should stop calling this method when it's not necessary for any
>> >> >>> >> > internal Lucene code. Currently, this code seems to hurt properly
>> >> >>> >> > written apps, unnecessarily.
>> >> >>> >> > Will post a patch soon
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> This message is automatically generated by JIRA.
>> >> >>> >> -
>> >> >>> >> You can reply to this email to add a comment to the issue online.
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> ---------------------------------------------------------------------
>> >> >>> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >> >>> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >> >>> >>
>> >> >>> >
>> >> >>> > --
>> >> >>> > - Mark
>> >> >>> >
>> >> >>> > http://www.lucidimagination.com
>> >> >>>
>> >> >>> --
>> >> >>> Robert Muir
>> >> >>> rcmuir@gmail.com

--
Robert Muir
rcmuir@gmail.com
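
For reference, a minimal sketch of the usage Uwe describes in the quoted thread: give the Reader the file's real encoding so the resulting Java chars are proper UTF-16, rather than reading KOI8 bytes as ISO-8859-1. The class name, the sample word, and the in-memory byte array standing in for the test file are all assumptions made for illustration; this is not code from the Lucene test suite.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Lucene.
class CharsetDecodeSketch {

    // Decodes raw bytes into a String by telling the Reader which charset
    // the bytes are really in (for example "KOI8-R" or "windows-1251").
    static String decode(byte[] raw, String charsetName) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName(charsetName))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A Russian sample word, written with Unicode escapes so the source
        // file encoding does not matter.
        String original = "\u043f\u0440\u0438\u0432\u0435\u0442";

        // Stand-in for the contents of a KOI8-encoded test file.
        byte[] koi8 = original.getBytes(Charset.forName("KOI8-R"));

        // Correct: the Reader is given the real encoding.
        System.out.println(decode(koi8, "KOI8-R").equals(original));

        // The pattern criticised in the thread: the same bytes read as
        // ISO-8859-1 yield Latin-1 chars, not the Cyrillic text.
        System.out.println(decode(koi8, "ISO-8859-1").equals(original));
    }
}
```

With the chars decoded this way, only the analyzer's UNICODE pseudo-charset would be needed, as Uwe suggests.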