subversion-commits mailing list archives

From br...@apache.org
Subject svn commit: r1405742 [1/4] - in /subversion/upstream/utf8proc: ./ pgsql/ ruby/ ruby/gem/
Date Mon, 05 Nov 2012 10:54:57 GMT
Author: brane
Date: Mon Nov  5 10:54:56 2012
New Revision: 1405742

URL: http://svn.apache.org/viewvc?rev=1405742&view=rev
Log:
Import upstream utf8proc-v1.1.5

Added:
    subversion/upstream/utf8proc/
    subversion/upstream/utf8proc/Changelog
    subversion/upstream/utf8proc/LICENSE
    subversion/upstream/utf8proc/Makefile
    subversion/upstream/utf8proc/README
    subversion/upstream/utf8proc/data_generator.rb
    subversion/upstream/utf8proc/lump.txt
    subversion/upstream/utf8proc/pgsql/
    subversion/upstream/utf8proc/pgsql/Makefile
    subversion/upstream/utf8proc/pgsql/utf8proc.sql
    subversion/upstream/utf8proc/pgsql/utf8proc_pgsql.c
    subversion/upstream/utf8proc/ruby/
    subversion/upstream/utf8proc/ruby/extconf.rb
    subversion/upstream/utf8proc/ruby/gem/
    subversion/upstream/utf8proc/ruby/gem/LICENSE
    subversion/upstream/utf8proc/ruby/gem/utf8proc.gemspec
    subversion/upstream/utf8proc/ruby/utf8proc.rb
    subversion/upstream/utf8proc/ruby/utf8proc_native.c
    subversion/upstream/utf8proc/utf8proc.c
    subversion/upstream/utf8proc/utf8proc.h
    subversion/upstream/utf8proc/utf8proc_data.c

Added: subversion/upstream/utf8proc/Changelog
URL: http://svn.apache.org/viewvc/subversion/upstream/utf8proc/Changelog?rev=1405742&view=auto
==============================================================================
--- subversion/upstream/utf8proc/Changelog (added)
+++ subversion/upstream/utf8proc/Changelog Mon Nov  5 10:54:56 2012
@@ -0,0 +1,128 @@
+Changelog
+
+2006-06-02:
+- initial release of version 0.1
+
+2006-06-05:
+- changed behaviour of PostgreSQL function to return NULL in case of
+  invalid input, rather than raising an exceptional condition
+- improved efficiency of PostgreSQL function (no transformation to C string
+  is done)
+
+2006-06-20:
+- added -fpic compiler flag in Makefile
+- fixed bug in the C code for the ruby library (usage of non-existent
+  function)
+
+Release of version 0.2
+
+
+2006-07-18:
+- changed normalization from NFC to NFKC for postgresql unifold function
+
+2006-08-04:
+- added support to mark the beginning of a grapheme cluster with 0xFF
+  (option: CHARBOUND)
+- added the ruby method String#chars, which is returning an array of UTF-8
+  encoded grapheme clusters
+- added NLF2LF transformation in postgresql unifold function
+- added the DECOMPOSE option, if you neither use COMPOSE or DECOMPOSE, no
+  normalization will be performed (different from previous versions)
+- using integer constants rather than C-strings for character properties
+- fixed (hopefully) a problem with the ruby library on Mac OS X, which
+  occurred when compiler optimization was switched on
+
+Release of version 0.3
+
+
+2006-09-17:
+- added the LUMP option, which lumps certain characters together
+  (see lump.txt) (also used for the PostgreSQL "unifold" function)
+- added the STRIPMARK option, which strips marking characters
+  (or marks of composed characters)
+- deprecated ruby method String#char_ary in favour of String#utf8chars
+
+Release of version 1.0
+
+
+2006-09-20:
+- included a gem file for the ruby version of the library
+
+Release of version 1.0.1
+
+
+2006-09-21:
+- included a check in Integer#utf8, which raises an exception, if the given
+  code-point is invalid because of being too high (this was missing yet)
+
+2006-12-26:
+- added support for PostgreSQL version 8.2
+
+Release of version 1.0.2
+
+
+2007-03-16:
+- Fixed a bug in the ruby library, which caused an error, when splitting an
+  empty string at grapheme cluster boundaries (method String#utf8chars).
+
+Release of version 1.0.3
+
+
+2007-06-25:
+- Added a new PostgreSQL function 'unistrip', which behaves like 'unifold',
+  but also removes all character marks (e.g. accents).
+
+2007-07-22:
+- Changed license from BSD to MIT style.
+- Added a new function 'utf8proc_codepoint_valid' to the C library.
+- Changed compiler flags in Makefile from -g -O0 to -O2
+- The ruby script, which was used to build the utf8proc_data.c file, is now
+  included in the distribution.
+
+Release of version 1.1.1
+
+
+2007-07-25:
+- Fixed a serious bug in the data file generator, which caused characters
+  being treated incorrectly, when stripping default ignorable characters or
+  calculating grapheme cluster boundaries.
+
+Release of version 1.1.2
+
+
+2008-10-04:
+- Added a function utf8proc_version returning a string containing the version
+  number of the library.
+- Included a target libutf8proc.dylib for MacOSX.
+
+2009-05-01:
+- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
+
+Release of version 1.1.3
+
+
+2009-06-14:
+- replaced C++ style comments for compatibility reasons
+- added typecasts to suppress compiler warnings
+- removed redundant source files for ruby-gemfile generation
+
+2009-08-19:
+- Changed copyright notice for Public Software Group e. V.
+- Minor changes in the README file
+- Release of version 1.1.4
+
+2009-08-20:
+- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and
+  RSTRING()->len for ruby1.9 compatibility (and #define them, if not
+  existent)
+
+2009-10-02:
+- Patches for compatibility with Microsoft Visual Studio
+
+2009-10-08:
+- Fixes to make utf8proc usable in C++ programs
+
+2009-10-16:
+- Release of version 1.1.5
+
+2009-10-08:

Added: subversion/upstream/utf8proc/LICENSE
URL: http://svn.apache.org/viewvc/subversion/upstream/utf8proc/LICENSE?rev=1405742&view=auto
==============================================================================
--- subversion/upstream/utf8proc/LICENSE (added)
+++ subversion/upstream/utf8proc/LICENSE Mon Nov  5 10:54:56 2012
@@ -0,0 +1,64 @@
+
+Copyright (c) 2009 Public Software Group e. V., Berlin, Germany
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the "Software"),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
+
+
+This software distribution contains derived data from a modified version of
+the Unicode data files. The following license applies to that data:
+
+COPYRIGHT AND PERMISSION NOTICE
+
+Copyright (c) 1991-2007 Unicode, Inc. All rights reserved. Distributed
+under the Terms of Use in http://www.unicode.org/copyright.html.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of the Unicode data files and any associated documentation (the "Data
+Files") or Unicode software and any associated documentation (the
+"Software") to deal in the Data Files or Software without restriction,
+including without limitation the rights to use, copy, modify, merge,
+publish, distribute, and/or sell copies of the Data Files or Software, and
+to permit persons to whom the Data Files or Software are furnished to do
+so, provided that (a) the above copyright notice(s) and this permission
+notice appear with all copies of the Data Files or Software, (b) both the
+above copyright notice(s) and this permission notice appear in associated
+documentation, and (c) there is clear notice in each modified Data File or
+in the Software as well as in the documentation associated with the Data
+File(s) or Software that the data or software has been modified.
+
+THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
+KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
+THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS
+INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR
+CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
+USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
+TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+PERFORMANCE OF THE DATA FILES OR SOFTWARE.
+
+Except as contained in this notice, the name of a copyright holder shall
+not be used in advertising or otherwise to promote the sale, use or other
+dealings in these Data Files or Software without prior written
+authorization of the copyright holder.
+
+
+Unicode and the Unicode logo are trademarks of Unicode, Inc., and may be
+registered in some jurisdictions. All other trademarks and registered
+trademarks mentioned herein are the property of their respective owners.
+

Added: subversion/upstream/utf8proc/Makefile
URL: http://svn.apache.org/viewvc/subversion/upstream/utf8proc/Makefile?rev=1405742&view=auto
==============================================================================
--- subversion/upstream/utf8proc/Makefile (added)
+++ subversion/upstream/utf8proc/Makefile Mon Nov  5 10:54:56 2012
@@ -0,0 +1,68 @@
+# libutf8proc Makefile
+
+
+# settings
+
+cflags = -O2 -std=c99 -pedantic -Wall -fpic $(CFLAGS)
+cc = $(CC) $(cflags)
+
+
+# meta targets
+
+c-library: libutf8proc.a libutf8proc.so
+
+ruby-library: ruby/utf8proc_native.so
+
+pgsql-library: pgsql/utf8proc_pgsql.so
+
+all: c-library ruby-library ruby-gem pgsql-library
+
+clean::
+	rm -f utf8proc.o libutf8proc.a libutf8proc.so
+	cd ruby/ && test -e Makefile && (make clean && rm -f Makefile) || true
+	rm -Rf ruby/gem/lib ruby/gem/ext
+	rm -f ruby/gem/utf8proc-*.gem
+	cd pgsql/ && make clean
+
+# real targets
+
+utf8proc.o: utf8proc.h utf8proc.c utf8proc_data.c
+	$(cc) -c -o utf8proc.o utf8proc.c
+
+libutf8proc.a: utf8proc.o
+	rm -f libutf8proc.a
+	ar rs libutf8proc.a utf8proc.o
+
+libutf8proc.so: utf8proc.o
+	$(cc) -shared -o libutf8proc.so utf8proc.o
+	chmod a-x libutf8proc.so
+
+libutf8proc.dylib: utf8proc.o
+	$(cc) -dynamiclib -o $@ $^ -install_name $(libdir)/$@
+
+ruby/Makefile: ruby/extconf.rb
+	cd ruby && ruby extconf.rb
+
+ruby/utf8proc_native.so: utf8proc.h utf8proc.c utf8proc_data.c \
+		ruby/utf8proc_native.c ruby/Makefile
+	cd ruby && make
+
+ruby/gem/lib/utf8proc.rb: ruby/utf8proc.rb
+	test -e ruby/gem/lib || mkdir ruby/gem/lib
+	cp ruby/utf8proc.rb ruby/gem/lib/
+
+ruby/gem/ext/extconf.rb: ruby/extconf.rb
+	test -e ruby/gem/ext || mkdir ruby/gem/ext
+	cp ruby/extconf.rb ruby/gem/ext/
+
+ruby/gem/ext/utf8proc_native.c: utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c
+	test -e ruby/gem/ext || mkdir ruby/gem/ext
+	cat utf8proc.h utf8proc_data.c utf8proc.c ruby/utf8proc_native.c | grep -v '#include "utf8proc.h"' | grep -v '#include "utf8proc_data.c"' | grep -v '#include "../utf8proc.c"' > ruby/gem/ext/utf8proc_native.c
+
+ruby-gem:: ruby/gem/lib/utf8proc.rb ruby/gem/ext/extconf.rb ruby/gem/ext/utf8proc_native.c
+	cd ruby/gem && gem build utf8proc.gemspec
+
+pgsql/utf8proc_pgsql.so: utf8proc.h utf8proc.c utf8proc_data.c \
+		pgsql/utf8proc_pgsql.c
+	cd pgsql && make
+

Added: subversion/upstream/utf8proc/README
URL: http://svn.apache.org/viewvc/subversion/upstream/utf8proc/README?rev=1405742&view=auto
==============================================================================
--- subversion/upstream/utf8proc/README (added)
+++ subversion/upstream/utf8proc/README Mon Nov  5 10:54:56 2012
@@ -0,0 +1,116 @@
+
+Please read the LICENSE file, which is shipping with this software.
+
+
+*** QUICK START ***
+
+For compilation of the C library call "make c-library", for compilation of
+the ruby library call "make ruby-library" and for compilation of the
+PostgreSQL extension call "make pgsql-library".
+
+For ruby you can also create a gem-file by calling "make ruby-gem".
+
+"make all" can be used to build everything, but both ruby and PostgreSQL
+installations are required in this case.
+
+
+*** GENERAL INFORMATION ***
+
+The C library is found in this directory after successful compilation and
+is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
+the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
+subdirectory "ruby/". If you chose to create a gem-file it is placed in the
+"ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so"
+and resides in the "pgsql/" directory.
+
+Both the ruby library and the PostgreSQL extension are built as stand-alone
+libraries and are therefore not dependent on the dynamic version of the
+C library files, but this behaviour might change in future releases.
+
+The Unicode version being supported is 5.0.0.
+Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
+      version 5.0.0 had not been available at the time of implementation.
+
+For Unicode normalizations, the following options have to be used:
+Normalization Form C:  STABLE, COMPOSE
+Normalization Form D:  STABLE, DECOMPOSE
+Normalization Form KC: STABLE, COMPOSE, COMPAT
+Normalization Form KD: STABLE, DECOMPOSE, COMPAT
+
+
+*** C LIBRARY ***
+
+The documentation for the C library is found in the utf8proc.h header file.
+"utf8proc_map" is most likely the function you will be using for mapping UTF-8
+strings, unless you want to allocate memory yourself.
+
+
+*** RUBY API ***
+
+The ruby library adds the methods "utf8map" and "utf8map!" to the String
+class, and the method "utf8" to the Integer class.
+
+The String#utf8map method does the same as the "utf8proc_map" C function.
+Options for the mapping procedure are passed as symbols, i.e:
+"Hello".utf8map(:casefold) => "hello"
+
+The descriptions of all options are found in the C header file
+"utf8proc.h". Please notice that the according symbols in ruby are all
+lowercase.
+
+String#utf8map! is the destructive function in the meaning that the string
+is replaced by the result.
+
+There are shortcuts for the 4 normalization forms specified by Unicode:
+String#utf8nfd,  String#utf8nfd!,
+String#utf8nfc,  String#utf8nfc!,
+String#utf8nfkd, String#utf8nfkd!,
+String#utf8nfkc, String#utf8nfkc!
+
+The method Integer#utf8 returns a UTF-8 string, which is containing the
+unicode char given by the code point.
+0x000A.utf8 => "\n"
+0x2028.utf8 => "\342\200\250"
+
+
+*** POSTGRESQL API ***
+
+For PostgreSQL there are two SQL functions supplied named "unifold" and
+"unistrip". These functions can be used to prepare index fields in
+order to be folded in a way where string-comparisons make more sense, e.g.
+where "bathtub" == "bath<soft hyphen>tub"
+or "Hello World" == "hello world".
+
+CREATE TABLE people (
+  id    serial8 primary key,
+  name  text,
+  CHECK (unifold(name) NOTNULL)
+);
+CREATE INDEX name_idx ON people (unifold(name));
+SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
+
+The function "unistrip" removes character marks like accents or diaeresis,
+while "unifold" keeps them.
+
+NOTICE: The outputs of the function can change between releases, as
+        utf8proc does not follow a versioning stability policy. You have to
+        rebuild your database indices, if you upgrade to a newer version
+        of utf8proc.
+
+
+*** TODO ***
+
+- detect stable code points and process segments independently in order to
+  save memory
+- do a quick check before normalizing strings to optimize speed
+- support stream processing
+
+
+*** CONTACT ***
+
+If you find any bugs or experience difficulties in compiling this software,
+please contact us:
+
+Project page: http://www.public-software-group.org/utf8proc
+
+
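
Illustration (not part of the imported files): the README above lists four
Unicode normalization forms and shows Integer#utf8 producing the UTF-8 bytes
of a code point. The sketch below shows what those forms do to a short
string. It assumes Ruby 2.2 or later and uses core Ruby only
(String#unicode_normalize and Array#pack), not the utf8proc bindings added in
this commit, so results from utf8proc itself may differ where its Unicode
data version differs. The helper name codepoints_by_form and the sample
string are hypothetical.

```ruby
# Illustration only: core Ruby, not the utf8proc gem from this commit.

# Return a hash mapping each normalization form to the code points
# that result from normalizing +str+ to that form.
def codepoints_by_form(str)
  forms = [:nfc, :nfd, :nfkc, :nfkd]
  forms.map { |form| [form, str.unicode_normalize(form).codepoints] }.to_h
end

# Sample input: precomposed "e with acute" (U+00E9) followed by the
# "fi" ligature (U+FB01), which only has a compatibility decomposition.
sample = "\u00E9\uFB01"

result = codepoints_by_form(sample)
result.each do |form, cps|
  puts format("%-5s %s", form, cps.map { |cp| format("U+%04X", cp) }.join(" "))
end

# Core-Ruby counterpart of the README's `0x2028.utf8` example:
# encode a single code point as a UTF-8 string.
line_separator = [0x2028].pack("U")
puts line_separator.bytes.map { |b| format("\\%o", b) }.join
```

The canonical forms (NFC, NFD) leave the ligature alone, while the
compatibility forms (NFKC, NFKD) replace it with plain "f" and "i", which is
the distinction the README's COMPAT option expresses.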


