lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py
Date Mon, 11 Oct 2010 12:46:24 GMT
Hah, thanks. Wanted to do this, too! :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: sarowe@apache.org [mailto:sarowe@apache.org]
> Sent: Monday, October 11, 2010 4:32 AM
> To: commits@lucene.apache.org
> Subject: svn commit: r1021234 - in /lucene/dev/trunk/lucene/contrib/benchmark: CHANGES.txt README.enwiki lib/xerces-2.9.1-patched-XERCESJ-1257.jar lib/xercesImpl-2.10.0.jar lib/xml-apis-2.10.0.jar lib/xml-apis-2.9.0.jar sortBench.py
> 
> Author: sarowe
> Date: Mon Oct 11 02:31:59 2010
> New Revision: 1021234
> 
> URL: http://svn.apache.org/viewvc?rev=1021234&view=rev
> Log:
> Upgraded xerces-2.9.1-patched-XERCESJ-1257.jar (committed as part of
> LUCENE-1591) to xercesImpl-2.10.0.jar (which contains the fix for
> XERCESJ-1257) and also upgraded xml-apis-2.9.0.jar to xml-apis-2.10.0.jar.
> 
> Added:
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar   (with props)
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar   (with props)
> Removed:
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xerces-2.9.1-patched-XERCESJ-1257.jar
>     lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.9.0.jar
> Modified:
>     lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
>     lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
>     lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
> 
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/CHANGES.txt Mon Oct 11 02:31:59 2010
> @@ -2,6 +2,15 @@ Lucene Benchmark Contrib Change Log
> 
>  The Benchmark contrib package contains code for benchmarking Lucene in a variety of ways.
> 
> +10/10/2010
> +  The locally built patched version of the Xerces-J jar introduced
> +  as part of LUCENE-1591 is no longer required, because Xerces
> +  2.10.0, which contains a fix for XERCESJ-1257 (see
> +  http://svn.apache.org/viewvc?view=revision&revision=554069),
> +  was released earlier this year.  Upgraded
> +  xerces-2.9.1-patched-XERCESJ-1257.jar and xml-apis-2.9.0.jar
> +  to xercesImpl-2.10.0.jar and xml-apis-2.10.0.jar. (Steven Rowe)
> +
>  8/2/2010
>    LUCENE-2582: You can now specify the default codec to use for
>    writing new segments by adding default.codec = Pulsing (for
> 
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/README.enwiki Mon Oct 11 02:31:59 2010
> @@ -20,50 +20,3 @@ After that, ant enwiki should process th
>  test. Ant targets get-enwiki, expand-enwiki, and extract-enwiki can
>  also be used to download, decompress, and extract (to individual files
>  in work/enwiki) the dataset, respectively.
> -
> -NOTE: This bug in Xerces:
> -
> -  https://issues.apache.org/jira/browse/XERCESJ-1257
> -
> -which is still present as of 2.9.1, causes an exception like this when
> -processing Wikipedia's XML:
> -
> -Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
> -	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
> -	at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
> -	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> -	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
> -	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> -	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> -	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> -	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> -	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> -	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> -	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:77)
> -	... 1 more
> -
> -The original poster in the Xerces bug provided this patch:
> -
> ---- UTF8Reader.java	2006-11-23 00:36:53.000000000 +0100
> -+++ /home/rainman/lucene/xerces-2_9_0/src/org/apache/xerces/impl/io/UTF8Reader.java	2008-04-04 00:40:58.000000000 +0200
> -@@ -534,6 +534,16 @@
> -                     invalidByte(4, 4, b2);
> -                 }
> -
> -+                // check if output buffer is large enough to hold 2 surrogate chars
> -+                if( out + 1 >= offset + length ){
> -+                    fBuffer[0] = (byte)b0;
> -+                    fBuffer[1] = (byte)b1;
> -+                    fBuffer[2] = (byte)b2;
> -+                    fBuffer[3] = (byte)b3;
> -+                    fOffset = 4;
> -+                    return out - offset;
> -+		}
> -+
> -                 // decode bytes into surrogate characters
> -                 int uuuuu = ((b0 << 2) & 0x001C) | ((b1 >> 4) & 0x0003);
> -                 if (uuuuu > 0x10) {
> -
> -which I've applied to Xerces 2.9.1 sources, and committed under
> -lib/xerces-2.9.1-patched-XERCESJ-1257.jar.  Once XERCESJ-1257 is fixed
> -we can upgrade to a standard Xerces release.
> 
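The patch quoted in the removed README text guards one step of UTF-8 decoding: a 4-byte UTF-8 sequence encodes a supplementary code point (U+10000 to U+10FFFF), which needs two UTF-16 chars (a surrogate pair), so a reader filling a fixed-size char buffer should not consume the sequence unless both chars fit. Below is a minimal Python 3 sketch of that rule under stated assumptions: it is not Xerces code, the function names `utf8_seq_len`, `to_utf16_units`, `read_chunk` and `decode_all` are hypothetical, and it leans on Python's own UTF-8 codec to validate each sequence rather than reproducing Xerces' byte checks.

```python
from typing import List, Tuple


def utf8_seq_len(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    raise ValueError("invalid UTF-8 lead byte: 0x%02X" % lead)


def to_utf16_units(cp: int) -> List[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def read_chunk(data: bytes, pos: int, capacity: int) -> Tuple[List[int], int]:
    """Decode from data[pos:] into at most `capacity` UTF-16 units.

    Returns (units, new_pos). A sequence is consumed only if all of its
    units fit, so a surrogate pair is never split across two chunks.
    """
    if capacity < 2:
        # With room for only one unit, a 4-byte sequence could never be consumed.
        raise ValueError("capacity must be >= 2")
    out: List[int] = []
    while pos < len(data):
        n = utf8_seq_len(data[pos])
        # Python's codec raises UnicodeDecodeError on malformed or truncated input.
        cp = ord(data[pos:pos + n].decode("utf-8"))
        units = to_utf16_units(cp)
        if len(out) + len(units) > capacity:
            break  # no room for the whole sequence: leave it for the next call
        out.extend(units)
        pos += n
    return out, pos


def decode_all(data: bytes, capacity: int) -> str:
    """Decode `data` chunk by chunk, then reassemble the UTF-16 units."""
    units: List[int] = []
    pos = 0
    while pos < len(data):
        chunk, pos = read_chunk(data, pos, capacity)
        units.extend(chunk)
    return b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")
```

With `capacity=2`, reading `b"a\xf0\x9d\x84\x9e"` (an "a" followed by U+1D11E) returns only the "a" on the first call and the pair 0xD834, 0xDD1E on the second. The quoted Java patch takes the analogous step by stashing the four bytes in `fBuffer` and returning early.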
> Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar?rev=1021234&view=auto
> ==============================================================================
> Binary file - no diff available.
> 
> Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xercesImpl-2.10.0.jar
> ------------------------------------------------------------------------------
>     svn:mime-type = application/octet-stream
> 
> Added: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar?rev=1021234&view=auto
> ==============================================================================
> Binary file - no diff available.
> 
> Propchange: lucene/dev/trunk/lucene/contrib/benchmark/lib/xml-apis-2.10.0.jar
> ------------------------------------------------------------------------------
>     svn:mime-type = application/octet-stream
> 
> Modified: lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py?rev=1021234&r1=1021233&r2=1021234&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py (original)
> +++ lucene/dev/trunk/lucene/contrib/benchmark/sortBench.py Mon Oct 11 02:31:59 2010
> @@ -227,7 +227,7 @@ content.source=org.apache.lucene.benchma
>        print '  mkdir %s' % LOG_DIR
>        os.makedirs(LOG_DIR)
> 
> -    command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.9.0.jar:lib/xml-apis-2.9.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
> +    command = '%s -classpath ../../build/classes/java:../../build/classes/demo:../../build/contrib/highlighter/classes/java:lib/commons-digester-1.7.jar:lib/commons-collections-3.1.jar:lib/commons-compress-1.0.jar:lib/commons-logging-1.0.4.jar:lib/commons-beanutils-1.7.0.jar:lib/xerces-2.10.0.jar:lib/xml-apis-2.10.0.jar:../../build/contrib/benchmark/classes/java org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (JAVA_COMMAND, algFile, fullLogFileName)
> 
>      if DEBUG:
>        print 'command=%s' % command
> 
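The sortBench.py change above edits one long hand-wrapped classpath string, and jar renames are easy to miss in such a string: the new command names lib/xerces-2.10.0.jar, whereas the jar this commit adds is lib/xercesImpl-2.10.0.jar. A minimal Python 3 sketch of an alternative, assuming a hypothetical list `CLASSPATH_ENTRIES` and hypothetical helpers `missing_entries` and `build_command` (none of these are part of sortBench.py): keep the entries in a list, report the ones that are missing on disk, and join them only when building the command.

```python
import os
from typing import Callable, List, Sequence

# Hypothetical entry list mirroring the classpath in the new sortBench.py command.
CLASSPATH_ENTRIES = [
    "../../build/classes/java",
    "../../build/classes/demo",
    "../../build/contrib/highlighter/classes/java",
    "lib/commons-digester-1.7.jar",
    "lib/commons-collections-3.1.jar",
    "lib/commons-compress-1.0.jar",
    "lib/commons-logging-1.0.4.jar",
    "lib/commons-beanutils-1.7.0.jar",
    "lib/xerces-2.10.0.jar",
    "lib/xml-apis-2.10.0.jar",
    "../../build/contrib/benchmark/classes/java",
]


def missing_entries(entries: Sequence[str],
                    exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the classpath entries that `exists` reports as absent."""
    return [e for e in entries if not exists(e)]


def build_command(java_command: str, alg_file: str, log_file: str,
                  entries: Sequence[str], sep: str = os.pathsep) -> str:
    """Build a command string in the same shape as the one in sortBench.py."""
    classpath = sep.join(entries)
    return '%s -classpath %s org.apache.lucene.benchmark.byTask.Benchmark %s > "%s" 2>&1' % (
        java_command, classpath, alg_file, log_file)


if __name__ == "__main__":
    # Warn about entries that do not exist relative to the current directory.
    for entry in missing_entries(CLASSPATH_ENTRIES):
        print("warning: classpath entry not found: %s" % entry)
    print(build_command("java", "sort.alg", "sort.log", CLASSPATH_ENTRIES))
```

`os.pathsep` is ":" on Unix-like systems, matching the separator hard-coded in sortBench.py, and ";" on Windows. Note that sortBench.py itself is Python 2 (it uses `print` statements), so this sketch would need adapting before being dropped into that script.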




