From: "Uwe Schindler"
Reply-To: dev@lucene.apache.org
References: <20101011023200.2610D23888EC@eris.apache.org>
In-Reply-To: <20101011023200.2610D23888EC@eris.apache.org>
Subject: RE: svn commit: r1021234 - in
 /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki
 lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar
 lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py
Date: Mon, 11 Oct 2010 14:46:24 +0200
Message-ID: <007b01cb6942$51efa910$f5cefb30$@thetaphi.de>

Hah, thanks. Wanted to do this, too! :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: sarowe@apache.org [mailto:sarowe@apache.org]
> Sent: Monday, October 11, 2010 4:32 AM
> To: commits@lucene.apache.org
> Subject: svn commit: r1021234 - in
> /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki
> lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar
> lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py
>
> Author: sarowe
> Date: Mon Oct 11 02:31:59 2010
> New Revision: 1021234
>
> URL: http://svn.apache.org/viewvc?rev=1021234&view=rev
> Log:
> Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of
> LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for
> XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.
>
> Added:
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar   (with props)
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar   (with props)
> Removed:
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
> Modified:
>     lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
>     lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
>     lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
>
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt Mon Oct 11 02:31:59 2010
> @@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
>
>  The Benchmark contrib package contains code for benchmarking Lucene in a
>  variety of ways.
>
> +10/10/2010
> +  The locally built patched version of the Xerces-J jar introduced
> +  as part of LUCENE-1591 is no longer required, because Xerces
> +  2.10.0, which contains a fix for XERCESJ-1257 (see
> +  http://svn.apache.org/viewvc?view=revision&revision=554069),
> +  was released earlier this year. Upgraded
> +  xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
> +  to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
> +
>  8/2/2010
>    LUCENE-2582: You can now specify the default codec to use for
>    writing new segments by adding default.codec = Pulsing (for
>
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki Mon Oct 11 02:31:59 2010
> @@ -20,50 +20,3 @@ After that, ant enwiki should process th
>  test. Ant targets
>  get-enwiki, expand-enwiki, and extract-enwiki can also be used to download,
>  decompress, and extract (to individual files in work/enwiki) the dataset,
>  respectively.
> -
> -NOTE: This bug in Xerces:
> -
> -  https://issues.apache.org/jira/browse/XERCESJ-1257
> -
> -which is still present as of 2.9.1, causes an exception like this when
> -processing Wikipedia's XML:
> -
> -Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
> -        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
> -        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
> -        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> -        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> -        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> -        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> -        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> -        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> -        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> -        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> -        at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
> -        ... 1 more
> -
> -The original poster in the Xerces bug provided this patch:
> -
> ---- UTF8Reader.java 2006-11-23 00:36:53.000000000 +0100
> -+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java 2008-04-04 00:40:58.000000000 +0200
> -@@ -534,6 +534,16 @@
> -                     invalidByte(4, 4, b2);
> -                 }
> -
> -+                // check if output buffer is large enough to hold 2 surrogate chars
> -+                if( out + 1 >= offset + length ){
> -+                    fBuffer[0] = (byte)b0;
> -+                    fBuffer[1] = (byte)b1;
> -+                    fBuffer[2] = (byte)b2;
> -+                    fBuffer[3] = (byte)b3;
> -+                    fOffset = 4;
> -+                    return out - offset;
> -+                }
> -+
> -                 // decode bytes into surrogate characters
> -                 int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
> -                 if (uuuuu > 0x10) {
> -
> -which I've applied to Xerces 2.9.1 sources, and committed under
> -lib/xerces-2.9.1-patched-XERCESJ-1257.jar. Once XERCESJ-1257 is fixed
> -we can upgrade to a standard Xerces release.
>
> Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1021234&view=auto
> ==============================================================================
> Binary file - no diff available.
>
> Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
> ------------------------------------------------------------------------------
>     svn:mime-type = application/octet-stream
>
> Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1021234&view=auto
> ==============================================================================
> Binary file - no diff available.
>
> Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
> ------------------------------------------------------------------------------
>     svn:mime-type = application/octet-stream
>
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py Mon Oct 11 02:31:59 2010
> @@ -227,7 +227,7 @@ content.source=org.apache.lucene.benchma
>      print ' mkdir %s' % LOG_DIR
>      os.makedirs(LOG_DIR)
>
> -  command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
> +  command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.10.0.jar:lib/xml-apis-2.10.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
>
>    if DEBUG:
>      print 'command=%s' % command
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
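
Background on the XERCESJ-1257 patch quoted in the README.enwiki diff above: the comment it adds says the output buffer must have room for two surrogate chars before a 4-byte UTF-8 sequence is decoded. The Python sketch below illustrates only that idea and is not the Xerces code. The names decode_chunk and decode_all are hypothetical, and the sketch assumes well-formed UTF-8 input. Where the quoted patch stashes the four bytes in fBuffer, this sketch simply leaves the read position unchanged so the next call picks the sequence up again.

```python
def decode_chunk(data, pos, capacity):
    """Decode UTF-8 bytes from data[pos:] into at most `capacity` UTF-16
    code units. Returns (units, new_pos).

    Hypothetical helper for illustration; assumes well-formed input.
    """
    if capacity < 2:
        # A surrogate pair could never fit, so no progress would be possible.
        raise ValueError("capacity must be at least 2")
    units = []
    while pos < len(data) and len(units) < capacity:
        b0 = data[pos]
        if b0 < 0x80:        # 1-byte sequence (ASCII)
            units.append(b0)
            pos += 1
        elif b0 < 0xE0:      # 2-byte sequence
            units.append(((b0 & 0x1F) << 6) | (data[pos + 1] & 0x3F))
            pos += 2
        elif b0 < 0xF0:      # 3-byte sequence
            units.append(((b0 & 0x0F) << 12)
                         | ((data[pos + 1] & 0x3F) << 6)
                         | (data[pos + 2] & 0x3F))
            pos += 3
        else:                # 4-byte sequence: needs TWO output slots
            if len(units) + 2 > capacity:
                # Not enough room for both surrogates: stop here and let
                # the next call decode this sequence into a fresh buffer.
                break
            cp = (((b0 & 0x07) << 18)
                  | ((data[pos + 1] & 0x3F) << 12)
                  | ((data[pos + 2] & 0x3F) << 6)
                  | (data[pos + 3] & 0x3F)) - 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
            pos += 4
    return units, pos


def decode_all(data, capacity=2):
    """Drive decode_chunk repeatedly with a small output buffer."""
    units, pos = [], 0
    while pos < len(data):
        chunk, pos = decode_chunk(data, pos, capacity)
        units.extend(chunk)
    return units
```

With a two-slot buffer, decoding "a" followed by a supplementary character fills one slot, then defers the 4-byte sequence to the next call instead of splitting the surrogate pair.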