lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter
Date Mon, 06 Jul 2009 15:40:20 GMT
Wonderful, and the tests (TestRussianStems) pass?

Thanks,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Monday, July 06, 2009 5:37 PM
> To: java-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> excessively in IndexReader and IndexWriter
> 
> contrib/analyzers/src/test/org/apache/lucene/analysis/ru/stemsUTF8.txt
> looks right on OpenSolaris (unix EOLs).
> 
> Mike
> 
> On Mon, Jul 6, 2009 at 9:53 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> > I fixed the encoding problem by converting the test files to UTF-8 and
> > changing the Reader charset parameter to UTF-8. All files now have
> > eol-style native again. Could somebody check whether on Unix the files
> > have only LF (and on Windows CRLF, which is the state in which I
> > committed them)?
> >
> > The overall strange/incorrect charset conversion is not touched at all,
> > but I strongly agree we should remove it (and keep only UnicodeRussian as
> > an allowed charset parameter for the analyzer) or remove the analyzer
> > altogether.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >> -----Original Message-----
> >> From: Robert Muir [mailto:rcmuir@gmail.com]
> >> Sent: Monday, July 06, 2009 3:26 PM
> >> To: java-dev@lucene.apache.org
> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> >> excessively in IndexReader and IndexWriter
> >>
> >> uwe I completely agree.
> >>
> >> to add the icing on the cake the entire analyzer appears to be just a
> >> duplication of the contrib/snowball Russian functionality...!
> >>
> >> On Mon, Jul 6, 2009 at 9:19 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> >> > The whole Russian analyzer is very strange and works against all
> >> > charset/Unicode conventions. It defines its own "charsets" (the only
> >> > valid one is UNICODE), which are all applied to standard Java 16-bit
> >> > chars. The test shows how this works: it opens a text file in KOI8
> >> > using the "ISO-8859-1" charset, just so the code points are not
> >> > modified when converting to 16-bit Java chars (in principle it does a
> >> > deprecated "new String(byte[],0)"). These completely wrong Java chars
> >> > are then handled by the analyzer's internal charset conversion
> >> > (working on the 16-bit chars).
> >> >
> >> > The only correct usage of this package is:
> >> > - open the file with the correct encoding (when instantiating the
> >> > Reader, so specify KOI8 or windows-1251 to the Reader). The string
> >> > then consists of correctly UTF-16 encoded Java chars. On this string
> >> > the "pseudo-charset" UNICODE of this analyzer can be used.
> >> >
> >> > In my opinion, this invalid usage of Java chars should be deprecated;
> >> > the only correct pseudo-charset should be the one specified by UNICODE,
> >> > and all charset conversions should be done using the Reader.
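
[Editorial illustration, not part of the original message.] The difference between the two ways of opening the file can be sketched in plain Java. This is a minimal sketch under stated assumptions, not code from the Lucene sources: the class name ReaderCharsetSketch, the helper names, and the sample word are all made up for illustration, and it assumes the running JRE ships the KOI8-R charset. It decodes the same KOI8-R bytes once through a Reader with the correct charset and once through an "ISO-8859-1" Reader, which merely copies each byte value into a 16-bit char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ReaderCharsetSketch {

    // Reads every char from the bytes through a Reader that decodes with
    // the named charset, as one would when opening a test file.
    static String readAll(byte[] fileBytes, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), Charset.forName(charsetName))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a KOI8-encoded test file: the Russian word
    // "mir" (U+043C U+0438 U+0440) encoded as KOI8-R bytes.
    static byte[] sampleKoi8Bytes() {
        return "\u043c\u0438\u0440".getBytes(Charset.forName("KOI8-R"));
    }

    public static void main(String[] args) {
        byte[] koi8 = sampleKoi8Bytes();
        String expected = "\u043c\u0438\u0440";

        // Correct: the Reader does the charset conversion, so the String
        // holds real Cyrillic code points.
        String right = readAll(koi8, "KOI8-R");

        // Wrong: ISO-8859-1 maps each byte to the char with the same value,
        // so the String holds Latin-1 characters, not Cyrillic.
        String wrong = readAll(koi8, "ISO-8859-1");

        System.out.println("KOI8-R Reader matches:     " + right.equals(expected));
        System.out.println("ISO-8859-1 Reader matches: " + wrong.equals(expected));
    }
}
```

With the first variant the analyzer only ever needs to understand real Unicode chars, which is the usage recommended above.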
> >> >
> >> > Uwe
> >> >
> >> > -----
> >> > Uwe Schindler
> >> > H.-H.-Meier-Allee 63, D-28213 Bremen
> >> > http://www.thetaphi.de
> >> > eMail: uwe@thetaphi.de
> >> >
> >> >> -----Original Message-----
> >> >> From: Robert Muir [mailto:rcmuir@gmail.com]
> >> >> Sent: Monday, July 06, 2009 3:08 PM
> >> >> To: java-dev@lucene.apache.org
> >> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> >> >> excessively in IndexReader and IndexWriter
> >> >>
> >> >> Uwe, I think so too. This way it will not be prone to breakage again.
> >> >>
> >> >> On Mon, Jul 6, 2009 at 8:38 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> >> >> > In my opinion, these files should be converted to UTF-8 and committed
> >> >> > again (and the Reader in the test reconfigured for UTF-8). Then they
> >> >> > can be native EOL style again. The problem is that SVN can only handle
> >> >> > the EOL style for one-byte-per-char and UTF-8 files.
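
[Editorial illustration, not part of the original message.] One way to see why EOL translation is only safe for single-byte encodings and UTF-8: in UTF-16 a line feed is two bytes, so a tool that rewrites the single byte 0x0A shifts every following code unit. The sketch below is an assumption-laden illustration of that general problem, not a description of how Subversion is implemented; the class name Utf16EolSketch and its helpers are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16EolSketch {

    // Naive byte-level LF -> CRLF translation, as a tool that knows nothing
    // about the file's encoding might perform it.
    static byte[] naiveLfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0x0A) {
                out.write(0x0D); // insert a single CR byte before each LF byte
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Encodes the text, applies the byte-level translation, decodes again.
    static String roundTrip(String text, Charset charset) {
        return new String(naiveLfToCrlf(text.getBytes(charset)), charset);
    }

    public static void main(String[] args) {
        String text = "a\nb";

        // UTF-8: LF is the single byte 0x0A, and no multi-byte sequence
        // contains 0x0A, so the result is the expected "a\r\nb".
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals("a\r\nb"));

        // UTF-16LE: LF is the byte pair 0x0A 0x00; the extra CR byte breaks
        // the two-byte alignment and the decoded text is garbage.
        System.out.println(roundTrip(text, StandardCharsets.UTF_16LE).equals("a\r\nb"));
    }
}
```

Converting the test files to UTF-8 sidesteps this, because the EOL bytes are then the same as in any ASCII-compatible text file.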
> >> >> >
> >> >> > I'll give it a try here (I have a converter).
> >> >> >
> >> >> > -----
> >> >> > Uwe Schindler
> >> >> > H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> > http://www.thetaphi.de
> >> >> > eMail: uwe@thetaphi.de
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: Robert Muir [mailto:rcmuir@gmail.com]
> >> >> >> Sent: Monday, July 06, 2009 1:11 PM
> >> >> >> To: java-dev@lucene.apache.org
> >> >> >> Subject: Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen()
> >> >> >> excessively in IndexReader and IndexWriter
> >> >> >>
> >> >> >> yeah, it's fixed now.
> >> >> >>
> >> >> >> On Mon, Jul 6, 2009 at 7:06 AM, Michael
> >> >> >> McCandless<lucene@mikemccandless.com> wrote:
> >> >> >> > Is this the native vs LF svn:eol-style that Uwe already fixed?
> >> >> >> >
> >> >> >> > Mike
> >> >> >> >
> >> >> >> > On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera<serera@gmail.com> wrote:
> >> >> >> >> Can somebody try to revert the change and test it on Windows?
> >> >> >> >>
> >> >> >> >> On Thu, Jul 2, 2009 at 4:44 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >> >> >> >>>
> >> >> >> >>> well then I have no idea why it doesn't fail. Except that perhaps
> >> >> >> >>> it's EOL-related (as Shai said), and that the failure is somehow
> >> >> >> >>> platform-dependent due to newline differences between windows and
> >> >> >> >>> unix (and the way these are encoded in UTF-16/stored in SVN)?
> >> >> >> >>>
> >> >> >> >>> I don't really do any work with files in UTF-16, so this is just a
> >> >> >> >>> theory.
> >> >> >> >>>
> >> >> >> >>> On Thu, Jul 2, 2009 at 9:40 AM, Mark Miller<markrmiller@gmail.com> wrote:
> >> >> >> >>> > Hudson runs all the tests and emails java-dev if any of them fail.
> >> >> >> >>> >
> >> >> >> >>> > On Thu, Jul 2, 2009 at 9:37 AM, Robert Muir (JIRA) <jira@apache.org>
> >> >> >> >>> > wrote:
> >> >> >> >>> >>
> >> >> >> >>> >>    [
> >> >> >> >>> >>
> >> >> >> >>> >> https://issues.apache.org/jira/browse/LUCENE-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726479#action_12726479
> >> >> >> >>> >> ]
> >> >> >> >>> >>
> >> >> >> >>> >> Robert Muir commented on LUCENE-1707:
> >> >> >> >>> >> -------------------------------------
> >> >> >> >>> >>
> >> >> >> >>> >> bq. Why doesn't Hudson encounter this problem?
> >> >> >> >>> >>
> >> >> >> >>> >> Forgive my ignorance, does hudson also run tests or just verify
> >> >> >> >>> >> build? These files are only used in tests!
> >> >> >> >>> >>
> >> >> >> >>> >> I agree we should correct it, and perhaps to prevent other problems
> >> >> >> >>> >> these files should be converted to UTF-8.
> >> >> >> >>> >>
> >> >> >> >>> >> For the record I am still confused about these java-code analyzers
> >> >> >> >>> >> that implement snowball algorithms, why do they exist when the same
> >> >> >> >>> >> functionality is in contrib/snowball?
> >> >> >> >>> >>
> >> >> >> >>> >>
> >> >> >> >>> >> > Don't use ensureOpen() excessively in IndexReader and IndexWriter
> >> >> >> >>> >> > -----------------------------------------------------------------
> >> >> >> >>> >> >
> >> >> >> >>> >> >                 Key: LUCENE-1707
> >> >> >> >>> >> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1707
> >> >> >> >>> >> >             Project: Lucene - Java
> >> >> >> >>> >> >          Issue Type: Improvement
> >> >> >> >>> >> >          Components: Index
> >> >> >> >>> >> >            Reporter: Shai Erera
> >> >> >> >>> >> >             Fix For: 2.9
> >> >> >> >>> >> >
> >> >> >> >>> >> >         Attachments: LUCENE-1707.patch, LUCENE-1707.patch
> >> >> >> >>> >> >
> >> >> >> >>> >> >
> >> >> >> >>> >> > A spin off from here:
> >> >> >> >>> >> > http://www.nabble.com/Excessive-use-of-ensureOpen()-td24127806.html.
> >> >> >> >>> >> > We should stop calling this method when it's not necessary for any
> >> >> >> >>> >> > internal Lucene code. Currently, this code seems to hurt properly
> >> >> >> >>> >> > written apps, unnecessarily.
> >> >> >> >>> >> > Will post a patch soon
> >> >> >> >>> >>
> >> >> >> >>> >> --
> >> >> >> >>> >> This message is automatically generated by JIRA.
> >> >> >> >>> >> -
> >> >> >> >>> >> You can reply to this email to add a comment to the issue online.
> >> >> >> >>> >>
> >> >> >> >>> >>
> >> >> >> >>> >> ---------------------------------------------------------------------
> >> >> >> >>> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> >> >> >>> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >> >> >> >>> >>
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>> > --
> >> >> >> >>> > --
> >> >> >> >>> > - Mark
> >> >> >> >>> >
> >> >> >> >>> > http://www.lucidimagination.com
> >> >> >> >>> >
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> --
> >> >> >> >>> Robert Muir
> >> >> >> >>> rcmuir@gmail.com
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Robert Muir
> >> >> >> rcmuir@gmail.com
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Robert Muir
> >> >> rcmuir@gmail.com
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Robert Muir
> >> rcmuir@gmail.com
> >>
> >
> >
> >
> >
> >
> 




