lucene-dev mailing list archives

From Wolfgang Hoschek <whosc...@lbl.gov>
Subject [ANN] Nux-1.3 released
Date Wed, 03 Aug 2005 18:34:37 GMT
The Nux-1.3 release has been uploaded to

     http://dsd.lbl.gov/nux/

Nux is an open-source Java toolkit making efficient and powerful XML  
processing easy.


Changelog:

     •    Upgraded to saxonb-8.5 (saxon-8.4 and 8.3 should continue  
to work as well).

     •    Upgraded to xom-1.1-rc1 (with compatible performance  
patches). Plain xom-1.0 should continue to work as well, albeit less  
efficiently.

     •    Numerous bnux Binary XML performance enhancements for
serialization and deserialization (UTF-8 character encoding, buffer
management, symbol table, pack sorting, cache locality, etc.).
Overall, bnux is now about twice as fast, and, perhaps more
importantly, has a much more uniform performance profile, no matter
what kind of document flavour is thrown at it. It routinely delivers
50-100 MB/sec deserialization performance and 30-70 MB/sec
serialization performance (on a 2004-era commodity PC). It is roughly
5-10 times faster than xom-1.1 with xerces-2.7.1 (which, in turn, is
faster than saxonb-8.5, dom4j-1.6.1 and xerces-2.7.1 DOM). Further,
preliminary measurements indicate bnux deserialization and  
serialization to be consistently 2-3 times faster than Sun's  
FastInfoSet implementation, using XOM. Saxon's PTree could not be  
tested as it is only available in the commercial version. The only  
remaining area with substantial potential for performance improvement  
seems to be complex namespace handling. This might be addressed by  
slightly restructuring private XOM internals in a future version.

     •    BinaryXMLTest now also has command line support for testing  
and benchmarking Saxon, DOM and FastInfoSet (besides bnux and XOM).

     •    Rewrote XQueryCommand. The new nux/bin/fire-xquery is a  
more powerful, flexible and reliable command line test tool that runs  
a given XQuery against a set of files and prints the result sequence.  
In addition, it supports schema validation, XInclude (via XOM), an  
XQuery update facility, malformed HTML parsing (via TagSoup) and much  
more. It's available for Unix and Windows, and works like any other  
decent Unix command line tool.

     •    Removed ValidationCommand (made obsolete by the fire-xquery  
functionality).

     •    Added experimental XQuery in-place update functionality.  
Comments on the usefulness of the current behaviour are especially  
welcome, as are suggestions for potential improvements.

     •    Added nux.xom.xquery.ResultSequenceSerializer, which  
serializes an XQuery/XPath2 result sequence onto a given output  
stream, using various configurable serialization options such as
encoding and indentation. It implements the W3C XQuery/XSLT2
Serialization Draft Spec. Also implements an alternative wrapping  
algorithm that ensures that any arbitrary result sequence can always  
be output as a well-formed XML document.

     •    Added XQueryFactory.createXQuery(File file, URI baseURI)
and XQueryPool.getXQuery(File file, URI baseURI) so that the query
file and the input XML files can reside in different locations.

     •    The default XQuery DocumentURIResolver now recognizes the
".bnux" file extension as binary XML, and parses it accordingly. For
example, a query can be 'doc("samples/data/articles.xml.bnux")/articles/*'.

     •    Added FileUtil.listFiles(). Returns the URIs of all files
whose path matches at least one of the given inclusion patterns
(wildcards or regular expressions) but none of the given exclusion
patterns, starting from the given directory, optionally with
recursive directory traversal, and independent of the underlying
operating system's path conventions.

     •    XOMUtil.Normalizer now uses the XML definition of whitespace
rather than the Java definition.

     •    Added XOMUtil.Normalizer.STRIP, which removes Text nodes that
consist of whitespace only (boundary whitespace) and leaves all other
text unchanged.

     •    Added AnalyzerUtil.getPorterStemmerAnalyzer() for English-language
stemming in full-text search.

     •    Added XOMUtil.toDocument(String xml) convenience method to  
parse a string.

     •    Moved XOMUtil.toByteArray() and XOMUtil.toString() into  
class FileUtil. The old methods remain available but have been  
deprecated.

     •    Added "jar-bnux" ant target to optionally build a minimal  
jar file (20 KB) for binary XML only.

     •    Added more test documents to samples/data directory.

     •    Updated license blurbs to 2005.
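The bnux entry in the changelog above lists a symbol table among the serialization optimizations. The following is a minimal Java sketch of the general idea only, not the actual bnux wire format: each distinct element name is stored once, and every later occurrence is written as a small integer index. The class SymbolTableSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a binary-XML symbol table (not the bnux format).
final class SymbolTableSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Returns the index for a name, assigning the next free index on first use.
    int intern(String name) {
        Integer idx = indexByName.get(name);
        if (idx == null) {
            idx = names.size();
            names.add(name);
            indexByName.put(name, idx);
        }
        return idx;
    }

    // Resolves an index back to its name, as a deserializer would.
    String lookup(int index) {
        return names.get(index);
    }

    int size() {
        return names.size();
    }

    // Encodes a stream of element names as indices into this table.
    int[] encode(List<String> elementNames) {
        int[] out = new int[elementNames.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = intern(elementNames.get(i));
        }
        return out;
    }
}
```

For a document with many repeated element names, only the first occurrence of each name costs its full string length; the rest shrink to an index, which is one reason a binary encoding can be both smaller and faster to decode than textual XML.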
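For the ResultSequenceSerializer entry above, here is a hypothetical sketch of why a wrapping algorithm guarantees well-formed output: a bare result sequence may contain several top-level elements or loose atomic values, neither of which forms a well-formed document, but placing every item inside one root element always does. The element names results and item and the Item type are assumptions for illustration, not Nux's actual wrapping format; element items are assumed to already be well-formed serialized elements.

```java
import java.util.List;

// Hypothetical sketch: wrap an arbitrary result sequence into one well-formed document.
final class ResultWrapperSketch {
    // One item of a result sequence: an atomic value or an already-serialized element.
    static final class Item {
        final boolean isAtomic;
        final String text;

        private Item(boolean isAtomic, String text) {
            this.isAtomic = isAtomic;
            this.text = text;
        }

        static Item atomic(String value) { return new Item(true, value); }

        static Item element(String serializedXml) { return new Item(false, serializedXml); }
    }

    // Escapes markup characters in character data (& and < are mandatory, > is escaped for safety).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps each item in <item> and the whole sequence in a single <results> root,
    // so an empty sequence or a mix of values and elements is still well-formed.
    static String wrap(List<Item> sequence) {
        StringBuilder sb = new StringBuilder("<results>");
        for (Item item : sequence) {
            sb.append("<item>");
            sb.append(item.isAtomic ? escape(item.text) : item.text);
            sb.append("</item>");
        }
        return sb.append("</results>").toString();
    }
}
```

The per-item wrapper also keeps adjacent atomic values apart, which would otherwise run together into a single text node.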
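The separate baseURI parameter of XQueryFactory.createXQuery(File file, URI baseURI), listed in the changelog above, means that relative document paths used inside a query, such as doc("articles.xml"), can resolve against that base URI rather than against the query file's own location. A minimal sketch of that resolution with java.net.URI follows; the class and method names are hypothetical, and Nux's internal resolution may differ in detail.

```java
import java.net.URI;

// Hypothetical sketch: resolve a relative document path from a query against an
// explicitly supplied base URI, independent of where the query file lives.
final class BaseUriSketch {
    static URI resolveDocument(URI baseURI, String relativePath) {
        return baseURI.resolve(relativePath);
    }
}
```

A base URI that denotes a directory should end in a slash: resolving "articles.xml" against file:/data/xml/ yields file:/data/xml/articles.xml, whereas resolving it against file:/data/xml yields file:/data/articles.xml.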
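For the DocumentURIResolver entry above, a resolver that treats ".bnux" as binary XML only has to inspect the file name before choosing a parser. A minimal hypothetical sketch of that dispatch follows; the enum and method are assumptions, and the changelog does not say whether Nux matches the extension case-sensitively.

```java
// Hypothetical sketch: choose a parser from the document URI's file extension.
final class BnuxDispatchSketch {
    enum Format { BINARY_XML, TEXT_XML }

    // ".bnux" selects the binary XML parser; anything else is parsed as textual XML.
    // Case-sensitive matching is an assumption of this sketch.
    static Format formatOf(String uri) {
        return uri.endsWith(".bnux") ? Format.BINARY_XML : Format.TEXT_XML;
    }
}
```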
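The FileUtil.listFiles() entry above describes include/exclude filtering over a directory tree. The sketch below reproduces those semantics using only java.nio.file, assuming every pattern carries the "glob:" or "regex:" prefix understood by FileSystem.getPathMatcher. The method name and parameter order are hypothetical, not the Nux signature, and unlike the Nux method this sketch matches paths in the default file system's own syntax, so it makes no attempt at OS-independent pattern handling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of include/exclude file listing (not the Nux implementation).
final class ListFilesSketch {
    // Returns sorted URIs of regular files under dir whose path relative to dir matches
    // at least one include pattern and no exclude pattern.
    static List<URI> listFiles(Path dir, boolean recursive,
                               List<String> includes, List<String> excludes) {
        FileSystem fs = FileSystems.getDefault();
        List<PathMatcher> inc = toMatchers(fs, includes);
        List<PathMatcher> exc = toMatchers(fs, excludes);
        int maxDepth = recursive ? Integer.MAX_VALUE : 1; // depth 1 = direct children only
        try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        Path rel = dir.relativize(p);
                        return inc.stream().anyMatch(m -> m.matches(rel))
                                && exc.stream().noneMatch(m -> m.matches(rel));
                    })
                    .map(Path::toUri)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each pattern must start with "glob:" or "regex:".
    private static List<PathMatcher> toMatchers(FileSystem fs, List<String> patterns) {
        List<PathMatcher> matchers = new ArrayList<>();
        for (String pattern : patterns) {
            matchers.add(fs.getPathMatcher(pattern));
        }
        return matchers;
    }
}
```

For example, includes of "glob:**.xml" and excludes of "glob:**skip*" select every .xml file in the tree except those whose name contains "skip".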
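Regarding the XOMUtil.Normalizer entry above: XML 1.0 defines whitespace as only space, tab, carriage return and line feed, whereas Java's Character.isWhitespace also accepts characters such as U+000B (vertical tab) and U+001C to U+001F. The small sketch below shows the difference, together with a collapse-style normalization that uses the XML definition. The class is hypothetical, and Nux's exact normalization modes are not reproduced here.

```java
// Hypothetical sketch contrasting XML whitespace with Java's notion of whitespace.
final class XmlWhitespaceSketch {
    // XML 1.0 production S: (#x20 | #x9 | #xD | #xA)+
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Collapses each run of XML whitespace into one space and trims both ends;
    // characters that only Java considers whitespace are kept as-is.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isXmlWhitespace(c)) {
                pendingSpace = sb.length() > 0; // leading whitespace is dropped
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                }
                pendingSpace = false;
                sb.append(c);
            }
        }
        return sb.toString(); // trailing whitespace was never emitted
    }
}
```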
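To illustrate what Normalizer.STRIP, listed above, does, here is a sketch that removes whitespace-only text nodes from a tree while leaving all other text untouched. To stay self-contained it works on the JDK's org.w3c.dom API rather than on XOM, so it is an analogue of the Nux behaviour, not its implementation; the class name is hypothetical.

```java
import org.w3c.dom.Node;

// Hypothetical DOM analogue of stripping boundary (whitespace-only) text nodes.
final class StripWhitespaceSketch {
    // Recursively removes Text children that consist only of XML whitespace.
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && isXmlWhitespaceOnly(child.getNodeValue())) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // True if s contains only XML whitespace (space, tab, CR, LF).
    static boolean isXmlWhitespaceOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }
}
```

Applied to "<a>\n  <b> x </b>\n</a>", this removes the two indentation text nodes around b but keeps " x " intact, including its surrounding spaces.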
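AnalyzerUtil.getPorterStemmerAnalyzer(), listed above, builds on the Porter stemming algorithm. The full algorithm is too long to show here, so the sketch below implements only its first step (1a), which handles plural suffixes, to convey the flavour of the rules. It is not Lucene's implementation, and complete Porter stemming applies several further steps.

```java
// Step 1a of Porter's stemming algorithm only (plural suffixes); hypothetical class name.
final class PorterStep1aSketch {
    // Rules are tried longest suffix first; the first match wins.
    static String step1a(String word) {
        if (word.endsWith("sses")) return word.substring(0, word.length() - 2); // caresses -> caress
        if (word.endsWith("ies")) return word.substring(0, word.length() - 2);  // ponies -> poni
        if (word.endsWith("ss")) return word;                                   // caress -> caress
        if (word.endsWith("s")) return word.substring(0, word.length() - 1);    // cats -> cat
        return word;
    }
}
```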
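XOMUtil.toDocument(String xml), listed above, wraps what is otherwise a small amount of boilerplate. The analogous steps with the JDK's built-in DOM parser look roughly like the sketch below; it is a hypothetical helper, not the Nux method. With XOM itself one would typically pass a java.io.StringReader to nu.xom.Builder.build.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical DOM analogue of a parse-from-string convenience method.
final class ParseStringSketch {
    static Document toDocument(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // InputSource over a StringReader lets the parser read directly from the string.
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML string: " + e.getMessage(), e);
        }
    }
}
```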
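Moving a public method while keeping the old one as a deprecated delegate, as the changelog describes for XOMUtil.toByteArray(), typically follows the pattern sketched below. The class names NewHomeSketch and OldHomeSketch and the InputStream parameter are assumptions for illustration; the changelog does not give the actual signatures.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical new home of the method; reads the stream fully (the caller closes it).
final class NewHomeSketch {
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}

// Hypothetical old home, kept for source compatibility.
final class OldHomeSketch {
    /** @deprecated use {@link NewHomeSketch#toByteArray(InputStream)} instead. */
    @Deprecated
    static byte[] toByteArray(InputStream in) throws IOException {
        return NewHomeSketch.toByteArray(in); // delegate so both entry points behave identically
    }
}
```

Delegating rather than duplicating the body ensures that existing callers keep working and automatically pick up any later fixes made in the new location.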
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

