From: benoitc@apache.org
To: commits@couchdb.apache.org
Date: Thu, 13 Feb 2014 18:12:04 -0000
Subject: [06/57] [abbrv] remove couch_collate

http://git-wip-us.apache.org/repos/asf/couchdb/blob/81332b78/apps/couch_collate/platform/osx/icu/unicode/usearch.h
----------------------------------------------------------------------
diff --git a/apps/couch_collate/platform/osx/icu/unicode/usearch.h b/apps/couch_collate/platform/osx/icu/unicode/usearch.h
deleted file mode 100644
index deaa78f..0000000
--- a/apps/couch_collate/platform/osx/icu/unicode/usearch.h
+++ /dev/null
@@ -1,766 +0,0 @@
-/*
-**********************************************************************
-* 
Copyright (C) 2001-2008 IBM and others. All rights reserved. -********************************************************************** -* Date Name Description -* 06/28/2001 synwee Creation. -********************************************************************** -*/ -#ifndef USEARCH_H -#define USEARCH_H - -#include "unicode/utypes.h" - -#if !UCONFIG_NO_COLLATION && !UCONFIG_NO_BREAK_ITERATION - -#include "unicode/ucol.h" -#include "unicode/ucoleitr.h" -#include "unicode/ubrk.h" - -/** - * \file - * \brief C API: StringSearch - * - * C Apis for an engine that provides language-sensitive text searching based - * on the comparison rules defined in a UCollator data struct, - * see ucol.h. This ensures that language eccentricity can be - * handled, e.g. for the German collator, characters ß and SS will be matched - * if case is chosen to be ignored. - * See the - * "ICU Collation Design Document" for more information. - *

- * The algorithm implemented is a modified form of the Boyer-Moore search. - * For more information on the algorithm, see - * - * "Efficient Text Searching in Java", published in Java Report - * in February, 1999. - *

- * There are 2 match options for selection:
- * Let S' be the sub-string of a text string S between the offsets start and - * end <start, end>. - *
- * A pattern string P matches a text string S at the offsets <start, end> - * if - *

 
- * option 1. Some canonical equivalent of P matches some canonical equivalent 
- *           of S'
- * option 2. P matches S' and if P starts or ends with a combining mark, 
- *           there exists no non-ignorable combining mark before or after S' 
- *           in S respectively. 
- * 
- * Option 2. will be the default. - *

- * This search has APIs similar to those of other text iteration mechanisms - * such as the break iterators in ubrk.h. Using these - * APIs, it is easy to scan through text looking for all occurrences of - * a given pattern. This search iterator allows changing of direction by - * calling a reset followed by a next or previous. - * Though a direction change can occur without calling reset first, - * this operation comes with some speed penalty. - * Generally, match results in the forward direction will match the result - * matches in the backwards direction in the reverse order. - *

- * usearch.h provides APIs to specify the starting position - * within the text string to be searched, e.g. usearch_setOffset, - * usearch_preceding and usearch_following. Since the - * starting position is set exactly as specified, note that - * there are some dangerous positions from which the search may return incorrect - * results: - *

    - *
  • The midst of a substring that requires normalization. - *
  • If the following match is to be found, the position should not be a - * second character that must be swapped with the preceding - * character. Conversely, if the preceding match is to be found, the - * position to search from should not be a first character that - * must be swapped with the next character. E.g. certain Thai and - * Lao characters require swapping. - *
  • If a following pattern match is to be found, any position within a - * contracting sequence except the first will fail. Conversely, if a - * preceding pattern match is to be found, an invalid starting point - * would be any character within a contracting sequence except the last. - *
- *

- * A breakiterator can be used if only matches at logical breaks are desired. - * Using a breakiterator will only give you results that exactly match the - * boundaries given by the breakiterator. For instance the pattern "e" will - * not be found in the string "\u00e9" if a character break iterator is used. - *

- * Options are provided to handle overlapping matches. - * E.g. in English, overlapping matches produce the results 0 and 2 - * for the pattern "abab" in the text "ababab", whereas mutually - * exclusive matches produce only the result 0. - *

- * Though collator attributes will be taken into consideration while - * performing matches, there are no APIs here for setting and getting the - * attributes. These attributes can be set by getting the collator - * from usearch_getCollator and using the APIs in ucol.h. - * Lastly to update String Search to the new collator attributes, - * usearch_reset() has to be called. - *

- * Restriction:
- * Currently there are no composite characters that consist of a - * character with combining class > 0 before a character with combining - * class == 0. However, if such a character exists in the future, the - * search mechanism does not guarantee the results for option 1. - * - *

- * Example of use:
- *


- * char *tgtstr = "The quick brown fox jumped over the lazy fox";
- * char *patstr = "fox";
- * UChar target[64];
- * UChar pattern[16];
- * UErrorCode status = U_ZERO_ERROR;
- * u_uastrcpy(target, tgtstr);
- * u_uastrcpy(pattern, patstr);
- *
- * UStringSearch *search = usearch_open(pattern, -1, target, -1, "en_US", 
- *                                  NULL, &status);
- * if (U_SUCCESS(status)) {
- *     for (int pos = usearch_first(search, &status); 
- *          pos != USEARCH_DONE; 
- *          pos = usearch_next(search, &status))
- *     {
- *         printf("Found match at %d pos, length is %d\n", pos, 
- *                                        usearch_getMatchedLength(search));
- *     }
- * }
- *
- * usearch_close(search);
- * 
- * @stable ICU 2.4 - */ - -/** -* DONE is returned by previous() and next() after all valid matches have -* been returned, and by first() and last() if there are no matches at all. -* @stable ICU 2.4 -*/ -#define USEARCH_DONE -1 - -/** -* Data structure for searching -* @stable ICU 2.4 -*/ -struct UStringSearch; -/** -* Data structure for searching -* @stable ICU 2.4 -*/ -typedef struct UStringSearch UStringSearch; - -/** -* @stable ICU 2.4 -*/ -typedef enum { - /** Option for overlapping matches */ - USEARCH_OVERLAP, - /** - Option for canonical matches. option 1 in header documentation. - The default value will be USEARCH_OFF - */ - USEARCH_CANONICAL_MATCH, - USEARCH_ATTRIBUTE_COUNT -} USearchAttribute; - -/** -* @stable ICU 2.4 -*/ -typedef enum { - /** default value for any USearchAttribute */ - USEARCH_DEFAULT = -1, - /** value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH */ - USEARCH_OFF, - /** value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH */ - USEARCH_ON, - USEARCH_ATTRIBUTE_VALUE_COUNT -} USearchAttributeValue; - -/* open and close ------------------------------------------------------ */ - -/** -* Creating a search iterator data struct using the argument locale language -* rule set. A collator will be created in the process, which will be owned by -* this search and will be deleted in usearch_close. -* @param pattern for matching -* @param patternlength length of the pattern, -1 for null-termination -* @param text text string -* @param textlength length of the text string, -1 for null-termination -* @param locale name of locale for the rules to be used -* @param breakiter A BreakIterator that will be used to restrict the points -* at which matches are detected. If a match is found, but -* the match's start or end index is not a boundary as -* determined by the BreakIterator, the match will -* be rejected and another will be searched for. -* If this parameter is NULL, no break detection is -* attempted. -* @param status for errors if it occurs. 
If pattern or text is NULL, or if -* patternlength or textlength is 0 then an -* U_ILLEGAL_ARGUMENT_ERROR is returned. -* @return search iterator data structure, or NULL if there is an error. -* @stable ICU 2.4 -*/ -U_STABLE UStringSearch * U_EXPORT2 usearch_open(const UChar *pattern, - int32_t patternlength, - const UChar *text, - int32_t textlength, - const char *locale, - UBreakIterator *breakiter, - UErrorCode *status); - -/** -* Creating a search iterator data struct using the argument collator language -* rule set. Note, user retains the ownership of this collator, thus the -* responsibility of deletion lies with the user. -* NOTE: string search cannot be instantiated from a collator that has -* collate digits as numbers (CODAN) turned on. -* @param pattern for matching -* @param patternlength length of the pattern, -1 for null-termination -* @param text text string -* @param textlength length of the text string, -1 for null-termination -* @param collator used for the language rules -* @param breakiter A BreakIterator that will be used to restrict the points -* at which matches are detected. If a match is found, but -* the match's start or end index is not a boundary as -* determined by the BreakIterator, the match will -* be rejected and another will be searched for. -* If this parameter is NULL, no break detection is -* attempted. -* @param status for errors if it occurs. If collator, pattern or text is NULL, -* or if patternlength or textlength is 0 then an -* U_ILLEGAL_ARGUMENT_ERROR is returned. -* @return search iterator data structure, or NULL if there is an error. -* @stable ICU 2.4 -*/ -U_STABLE UStringSearch * U_EXPORT2 usearch_openFromCollator( - const UChar *pattern, - int32_t patternlength, - const UChar *text, - int32_t textlength, - const UCollator *collator, - UBreakIterator *breakiter, - UErrorCode *status); - -/** -* Destroying and cleaning up the search iterator data struct. 
-* If a collator is created in usearch_open, it will be destroyed here. -* @param searchiter data struct to clean up -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_close(UStringSearch *searchiter); - -/* get and set methods -------------------------------------------------- */ - -/** -* Sets the current position in the text string which the next search will -* start from. Clears previous states. -* This method takes the argument index and sets the position in the text -* string accordingly without checking if the index is pointing to a -* valid starting point to begin searching. -* Search positions that may render incorrect results are highlighted in the -* header comments -* @param strsrch search iterator data struct -* @param position position to start next search from. If position is less -* than or greater than the text range for searching, -* an U_INDEX_OUTOFBOUNDS_ERROR will be returned -* @param status error status if any. -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setOffset(UStringSearch *strsrch, - int32_t position, - UErrorCode *status); - -/** -* Return the current index in the string text being searched. -* If the iteration has gone past the end of the text (or past the beginning -* for a backwards search), USEARCH_DONE is returned. -* @param strsrch search iterator data struct -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_getOffset(const UStringSearch *strsrch); - -/** -* Sets the text searching attributes located in the enum USearchAttribute -* with values from the enum USearchAttributeValue. -* USEARCH_DEFAULT can be used for all attributes for resetting. 
-* @param strsrch search iterator data struct -* @param attribute text attribute to be set -* @param value text attribute value -* @param status for errors if it occurs -* @see #usearch_getAttribute -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setAttribute(UStringSearch *strsrch, - USearchAttribute attribute, - USearchAttributeValue value, - UErrorCode *status); - -/** -* Gets the text searching attributes. -* @param strsrch search iterator data struct -* @param attribute text attribute to be retrieve -* @return text attribute value -* @see #usearch_setAttribute -* @stable ICU 2.4 -*/ -U_STABLE USearchAttributeValue U_EXPORT2 usearch_getAttribute( - const UStringSearch *strsrch, - USearchAttribute attribute); - -/** -* Returns the index to the match in the text string that was searched. -* This call returns a valid result only after a successful call to -* usearch_first, usearch_next, usearch_previous, -* or usearch_last. -* Just after construction, or after a searching method returns -* USEARCH_DONE, this method will return USEARCH_DONE. -*

-* Use usearch_getMatchedLength to get the matched string length. -* @param strsrch search iterator data struct -* @return index to a substring within the text string that is being -* searched. -* @see #usearch_first -* @see #usearch_next -* @see #usearch_previous -* @see #usearch_last -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_getMatchedStart( - const UStringSearch *strsrch); - -/** -* Returns the length of text in the string which matches the search pattern. -* This call returns a valid result only after a successful call to -* usearch_first, usearch_next, usearch_previous, -* or usearch_last. -* Just after construction, or after a searching method returns -* USEARCH_DONE, this method will return 0. -* @param strsrch search iterator data struct -* @return The length of the match in the string text, or 0 if there is no -* match currently. -* @see #usearch_first -* @see #usearch_next -* @see #usearch_previous -* @see #usearch_last -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_getMatchedLength( - const UStringSearch *strsrch); - -/** -* Returns the text that was matched by the most recent call to -* usearch_first, usearch_next, usearch_previous, -* or usearch_last. -* If the iterator is not pointing at a valid match (e.g. just after -* construction or after USEARCH_DONE has been returned, returns -* an empty string. If result is not large enough to store the matched text, -* result will be filled with the partial text and an U_BUFFER_OVERFLOW_ERROR -* will be returned in status. result will be null-terminated whenever -* possible. If the buffer fits the matched text exactly, a null-termination -* is not possible, then a U_STRING_NOT_TERMINATED_ERROR set in status. -* Pre-flighting can be either done with length = 0 or the API -* usearch_getMatchLength. 
-* @param strsrch search iterator data struct -* @param result UChar buffer to store the matched string -* @param resultCapacity length of the result buffer -* @param status error returned if result is not large enough -* @return exact length of the matched text, not counting the null-termination -* @see #usearch_first -* @see #usearch_next -* @see #usearch_previous -* @see #usearch_last -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_getMatchedText(const UStringSearch *strsrch, - UChar *result, - int32_t resultCapacity, - UErrorCode *status); - -#if !UCONFIG_NO_BREAK_ITERATION - -/** -* Set the BreakIterator that will be used to restrict the points at which -* matches are detected. -* @param strsrch search iterator data struct -* @param breakiter A BreakIterator that will be used to restrict the points -* at which matches are detected. If a match is found, but -* the match's start or end index is not a boundary as -* determined by the BreakIterator, the match will -* be rejected and another will be searched for. -* If this parameter is NULL, no break detection is -* attempted. -* @param status for errors if it occurs -* @see #usearch_getBreakIterator -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setBreakIterator(UStringSearch *strsrch, - UBreakIterator *breakiter, - UErrorCode *status); - -/** -* Returns the BreakIterator that is used to restrict the points at which -* matches are detected. This will be the same object that was passed to the -* constructor or to usearch_setBreakIterator. Note that -* NULL -* is a legal value; it means that break detection should not be attempted. -* @param strsrch search iterator data struct -* @return break iterator used -* @see #usearch_setBreakIterator -* @stable ICU 2.4 -*/ -U_STABLE const UBreakIterator * U_EXPORT2 usearch_getBreakIterator( - const UStringSearch *strsrch); - -#endif - -/** -* Set the string text to be searched. 
Text iteration will hence begin at the -* start of the text string. This method is useful if you want to re-use an -* iterator to search for the same pattern within a different body of text. -* @param strsrch search iterator data struct -* @param text new string to look for match -* @param textlength length of the new string, -1 for null-termination -* @param status for errors if it occurs. If text is NULL, or textlength is 0 -* then an U_ILLEGAL_ARGUMENT_ERROR is returned with no change -* done to strsrch. -* @see #usearch_getText -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setText( UStringSearch *strsrch, - const UChar *text, - int32_t textlength, - UErrorCode *status); - -/** -* Return the string text to be searched. -* @param strsrch search iterator data struct -* @param length returned string text length -* @return string text -* @see #usearch_setText -* @stable ICU 2.4 -*/ -U_STABLE const UChar * U_EXPORT2 usearch_getText(const UStringSearch *strsrch, - int32_t *length); - -/** -* Gets the collator used for the language rules. -*

-* Deleting the returned UCollator before calling -* usearch_close would cause the string search to fail. -* usearch_close will delete the collator if this search owns it. -* @param strsrch search iterator data struct -* @return collator -* @stable ICU 2.4 -*/ -U_STABLE UCollator * U_EXPORT2 usearch_getCollator( - const UStringSearch *strsrch); - -/** -* Sets the collator used for the language rules. User retains the ownership -* of this collator, thus the responsibility of deletion lies with the user. -* This method causes internal data such as Boyer-Moore shift tables to -* be recalculated, but the iterator's position is unchanged. -* @param strsrch search iterator data struct -* @param collator to be used -* @param status for errors if it occurs -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setCollator( UStringSearch *strsrch, - const UCollator *collator, - UErrorCode *status); - -/** -* Sets the pattern used for matching. -* Internal data like the Boyer Moore table will be recalculated, but the -* iterator's position is unchanged. -* @param strsrch search iterator data struct -* @param pattern string -* @param patternlength pattern length, -1 for null-terminated string -* @param status for errors if it occurs. If text is NULL, or textlength is 0 -* then an U_ILLEGAL_ARGUMENT_ERROR is returned with no change -* done to strsrch. 
-* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_setPattern( UStringSearch *strsrch, - const UChar *pattern, - int32_t patternlength, - UErrorCode *status); - -/** -* Gets the search pattern -* @param strsrch search iterator data struct -* @param length return length of the pattern, -1 indicates that the pattern -* is null-terminated -* @return pattern string -* @stable ICU 2.4 -*/ -U_STABLE const UChar * U_EXPORT2 usearch_getPattern( - const UStringSearch *strsrch, - int32_t *length); - -/* methods ------------------------------------------------------------- */ - -/** -* Returns the first index at which the string text matches the search -* pattern. -* The iterator is adjusted so that its current index (as returned by -* usearch_getOffset) is the match position if one was found. -* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE. -* @param strsrch search iterator data struct -* @param status for errors if it occurs -* @return The character index of the first match, or -* USEARCH_DONE if there are no matches. -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_first(UStringSearch *strsrch, - UErrorCode *status); - -/** -* Returns the first index greater than position at which the string -* text -* matches the search pattern. The iterator is adjusted so that its current -* index (as returned by usearch_getOffset) is the match position if -* one was found. -* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE -*

-* Search positions that may render incorrect results are highlighted in the -* header comments. If position is less than or greater than the text range -* for searching, an U_INDEX_OUTOFBOUNDS_ERROR will be returned -* @param strsrch search iterator data struct -* @param position to start the search at -* @param status for errors if it occurs -* @return The character index of the first match following pos, -* or USEARCH_DONE if there are no matches. -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_following(UStringSearch *strsrch, - int32_t position, - UErrorCode *status); - -/** -* Returns the last index in the target text at which it matches the search -* pattern. The iterator is adjusted so that its current -* index (as returned by usearch_getOffset) is the match position if -* one was found. -* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE. -* @param strsrch search iterator data struct -* @param status for errors if it occurs -* @return The index of the first match, or USEARCH_DONE if there -* are no matches. -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_last(UStringSearch *strsrch, - UErrorCode *status); - -/** -* Returns the first index less than position at which the string text -* matches the search pattern. The iterator is adjusted so that its current -* index (as returned by usearch_getOffset) is the match position if -* one was found. -* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE -*

-* Search positions that may render incorrect results are highlighted in the -* header comments. If position is less than or greater than the text range -* for searching, an U_INDEX_OUTOFBOUNDS_ERROR will be returned -* @param strsrch search iterator data struct -* @param position index position the search is to begin at -* @param status for errors if it occurs -* @return The character index of the first match preceding pos, -* or USEARCH_DONE if there are no matches. -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_preceding(UStringSearch *strsrch, - int32_t position, - UErrorCode *status); - -/** -* Returns the index of the next point at which the string text matches the -* search pattern, starting from the current position. -* The iterator is adjusted so that its current -* index (as returned by usearch_getOffset) is the match position if -* one was found. -* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE -* @param strsrch search iterator data struct -* @param status for errors if it occurs -* @return The index of the next match after the current position, or -* USEARCH_DONE if there are no more matches. -* @see #usearch_first -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_next(UStringSearch *strsrch, - UErrorCode *status); - -/** -* Returns the index of the previous point at which the string text matches -* the search pattern, starting at the current position. -* The iterator is adjusted so that its current -* index (as returned by usearch_getOffset) is the match position if -* one was found. 
-* If a match is not found, USEARCH_DONE will be returned and -* the iterator will be adjusted to the index USEARCH_DONE -* @param strsrch search iterator data struct -* @param status for errors if it occurs -* @return The index of the previous match before the current position, -* or USEARCH_DONE if there are no more matches. -* @see #usearch_last -* @see #usearch_getOffset -* @see #USEARCH_DONE -* @stable ICU 2.4 -*/ -U_STABLE int32_t U_EXPORT2 usearch_previous(UStringSearch *strsrch, - UErrorCode *status); - -/** -* Reset the iteration. -* Search will begin at the start of the text string if a forward iteration -* is initiated before a backwards iteration. Otherwise if a backwards -* iteration is initiated before a forwards iteration, the search will begin -* at the end of the text string. -* @param strsrch search iterator data struct -* @see #usearch_first -* @stable ICU 2.4 -*/ -U_STABLE void U_EXPORT2 usearch_reset(UStringSearch *strsrch); - -/** - * Simple forward search for the pattern, starting at a specified index, - * and using using a default set search options. - * - * This is an experimental function, and is not an official part of the - * ICU API. - * - * The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZTION, are honored. - * - * The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and - * any Break Iterator are ignored. - * - * Matches obey the following constraints: - * - * Characters at the start or end positions of a match that are ignorable - * for collation are not included as part of the match, unless they - * are part of a combining sequence, as described below. - * - * A match will not include a partial combining sequence. Combining - * character sequences are considered to be inseperable units, - * and either match the pattern completely, or are considered to not match - * at all. Thus, for example, an A followed a combining accent mark will - * not be found when searching for a plain (unaccented) A. 
(unless - * the collation strength has been set to ignore all accents). - * - * When beginning a search, the initial starting position, startIdx, - * is assumed to be an acceptable match boundary with respect to - * combining characters. A combining sequence that spans across the - * starting point will not supress a match beginning at startIdx. - * - * Characters that expand to multiple collation elements - * (German sharp-S becoming 'ss', or the composed forms of accented - * characters, for example) also must match completely. - * Searching for a single 's' in a string containing only a sharp-s will - * find no match. - * - * - * @param strsrch the UStringSearch struct, which references both - * the text to be searched and the pattern being sought. - * @param startIdx The index into the text to begin the search. - * @param matchStart An out parameter, the starting index of the matched text. - * This parameter may be NULL. - * A value of -1 will be returned if no match was found. - * @param matchLimit Out parameter, the index of the first position following the matched text. - * The matchLimit will be at a suitable position for beginning a subsequent search - * in the input text. - * This parameter may be NULL. - * A value of -1 will be returned if no match was found. - * - * @param status Report any errors. Note that no match found is not an error. - * @return TRUE if a match was found, FALSE otherwise. - * - * @internal - */ -U_INTERNAL UBool U_EXPORT2 usearch_search(UStringSearch *strsrch, - int32_t startIdx, - int32_t *matchStart, - int32_t *matchLimit, - UErrorCode *status); - -/** - * Simple backwards search for the pattern, starting at a specified index, - * and using using a default set search options. - * - * This is an experimental function, and is not an official part of the - * ICU API. - * - * The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZTION, are honored. 
- * - * The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and - * any Break Iterator are ignored. - * - * Matches obey the following constraints: - * - * Characters at the start or end positions of a match that are ignorable - * for collation are not included as part of the match, unless they - * are part of a combining sequence, as described below. - * - * A match will not include a partial combining sequence. Combining - * character sequences are considered to be inseperable units, - * and either match the pattern completely, or are considered to not match - * at all. Thus, for example, an A followed a combining accent mark will - * not be found when searching for a plain (unaccented) A. (unless - * the collation strength has been set to ignore all accents). - * - * When beginning a search, the initial starting position, startIdx, - * is assumed to be an acceptable match boundary with respect to - * combining characters. A combining sequence that spans across the - * starting point will not supress a match beginning at startIdx. - * - * Characters that expand to multiple collation elements - * (German sharp-S becoming 'ss', or the composed forms of accented - * characters, for example) also must match completely. - * Searching for a single 's' in a string containing only a sharp-s will - * find no match. - * - * - * @param strsrch the UStringSearch struct, which references both - * the text to be searched and the pattern being sought. - * @param startIdx The index into the text to begin the search. - * @param matchStart An out parameter, the starting index of the matched text. - * This parameter may be NULL. - * A value of -1 will be returned if no match was found. - * @param matchLimit Out parameter, the index of the first position following the matched text. - * The matchLimit will be at a suitable position for beginning a subsequent search - * in the input text. - * This parameter may be NULL. 
- * A value of -1 will be returned if no match was found. - * - * @param status Report any errors. Note that no match found is not an error. - * @return TRUE if a match was found, FALSE otherwise. - * - * @internal - */ -U_INTERNAL UBool U_EXPORT2 usearch_searchBackwards(UStringSearch *strsrch, - int32_t startIdx, - int32_t *matchStart, - int32_t *matchLimit, - UErrorCode *status); - -#endif /* #if !UCONFIG_NO_COLLATION && !UCONFIG_NO_BREAK_ITERATION */ - -#endif http://git-wip-us.apache.org/repos/asf/couchdb/blob/81332b78/apps/couch_collate/platform/osx/icu/unicode/uset.h ---------------------------------------------------------------------- diff --git a/apps/couch_collate/platform/osx/icu/unicode/uset.h b/apps/couch_collate/platform/osx/icu/unicode/uset.h deleted file mode 100644 index 2bbfd7a..0000000 --- a/apps/couch_collate/platform/osx/icu/unicode/uset.h +++ /dev/null @@ -1,1052 +0,0 @@ -/* -******************************************************************************* -* -* Copyright (C) 2002-2008, International Business Machines -* Corporation and others. All Rights Reserved. -* -******************************************************************************* -* file name: uset.h -* encoding: US-ASCII -* tab size: 8 (not used) -* indentation:4 -* -* created on: 2002mar07 -* created by: Markus W. Scherer -* -* C version of UnicodeSet. -*/ - - -/** - * \file - * \brief C API: Unicode Set - * - *

This is a C wrapper around the C++ UnicodeSet class.

- */ - -#ifndef __USET_H__ -#define __USET_H__ - -#include "unicode/utypes.h" -#include "unicode/uchar.h" - -#ifndef UCNV_H -struct USet; -/** - * A UnicodeSet. Use the uset_* API to manipulate. Create with - * uset_open*, and destroy with uset_close. - * @stable ICU 2.4 - */ -typedef struct USet USet; -#endif - -/** - * Bitmask values to be passed to uset_openPatternOptions() or - * uset_applyPattern() taking an option parameter. - * @stable ICU 2.4 - */ -enum { - /** - * Ignore white space within patterns unless quoted or escaped. - * @stable ICU 2.4 - */ - USET_IGNORE_SPACE = 1, - - /** - * Enable case insensitive matching. E.g., "[ab]" with this flag - * will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will - * match all except 'a', 'A', 'b', and 'B'. This performs a full - * closure over case mappings, e.g. U+017F for s. - * - * The resulting set is a superset of the input for the code points but - * not for the strings. - * It performs a case mapping closure of the code points and adds - * full case folding strings for the code points, and reduces strings of - * the original set to their full case folding equivalents. - * - * This is designed for case-insensitive matches, for example - * in regular expressions. The full code point case closure allows checking of - * an input character directly against the closure set. - * Strings are matched by comparing the case-folded form from the closure - * set with an incremental case folding of the string in question. - * - * The closure set will also contain single code points if the original - * set contained case-equivalent strings (like U+00DF for "ss" or "Ss" etc.). - * This is not necessary (that is, redundant) for the above matching method - * but results in the same closure sets regardless of whether the original - * set contained the code point or a string. - * - * @stable ICU 2.4 - */ - USET_CASE_INSENSITIVE = 2, - - /** - * Enable case insensitive matching. 
E.g., "[ab]" with this flag - * will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will - * match all except 'a', 'A', 'b', and 'B'. This adds the lower-, - * title-, and uppercase mappings as well as the case folding - * of each existing element in the set. - * @stable ICU 3.2 - */ - USET_ADD_CASE_MAPPINGS = 4, - - /** - * Enough for any single-code point set - * @internal - */ - USET_SERIALIZED_STATIC_ARRAY_CAPACITY=8 -}; - -/** - * Argument values for whether span() and similar functions continue while - * the current character is contained vs. not contained in the set. - * - * The functionality is straightforward for sets with only single code points, - * without strings (which is the common case): - * - USET_SPAN_CONTAINED and USET_SPAN_SIMPLE - * work the same. - * - span() and spanBack() partition any string the same way when - * alternating between span(USET_SPAN_NOT_CONTAINED) and - * span(either "contained" condition). - * - Using a complemented (inverted) set and the opposite span conditions - * yields the same results. - * - * When a set contains multi-code point strings, then these statements may not - * be true, depending on the strings in the set (for example, whether they - * overlap with each other) and the string that is processed. - * For a set with strings: - * - The complement of the set contains the opposite set of code points, - * but the same set of strings. - * Therefore, complementing both the set and the span conditions - * may yield different results. - * - When starting spans at different positions in a string - * (span(s, ...) vs. span(s+1, ...)) the ends of the spans may be different - * because a set string may start before the later position. - * - span(USET_SPAN_SIMPLE) may be shorter than - * span(USET_SPAN_CONTAINED) because it will not recursively try - * all possible paths. 
- * For example, with a set which contains the three strings "xy", "xya" and "ax", - * span("xyax", USET_SPAN_CONTAINED) will return 4 but - * span("xyax", USET_SPAN_SIMPLE) will return 3. - * span(USET_SPAN_SIMPLE) will never be longer than - * span(USET_SPAN_CONTAINED). - * - With either "contained" condition, span() and spanBack() may partition - * a string in different ways. - * For example, with a set which contains the two strings "ab" and "ba", - * and when processing the string "aba", - * span() will yield contained/not-contained boundaries of { 0, 2, 3 } - * while spanBack() will yield boundaries of { 0, 1, 3 }. - * - * Note: If it is important to get the same boundaries whether iterating forward - * or backward through a string, then either only span() should be used and - * the boundaries cached for backward operation, or an ICU BreakIterator - * could be used. - * - * Note: Unpaired surrogates are treated like surrogate code points. - * Similarly, set strings match only on code point boundaries, - * never in the middle of a surrogate pair. - * Illegal UTF-8 sequences are treated like U+FFFD. - * When processing UTF-8 strings, malformed set strings - * (strings with unpaired surrogates which cannot be converted to UTF-8) - * are ignored. - * - * @stable ICU 4.0 - */ -typedef enum USetSpanCondition { - /** - * Continue a span() while there is no set element at the current position. - * Stops before the first set element (character or string). - * (For code points only, this is like while contains(current)==FALSE). - * - * When span() returns, the substring between where it started and the position - * it returned consists only of characters that are not in the set, - * and none of its strings overlap with the span. - * - * @stable ICU 4.0 - */ - USET_SPAN_NOT_CONTAINED = 0, - /** - * Continue a span() while there is a set element at the current position. - * (For characters only, this is like while contains(current)==TRUE). 
- * - * When span() returns, the substring between where it started and the position - * it returned consists only of set elements (characters or strings) that are in the set. - * - * If a set contains strings, then the span will be the longest substring - * matching any of the possible concatenations of set elements (characters or strings). - * (There must be a single, non-overlapping concatenation of characters or strings.) - * This is equivalent to a POSIX regular expression for (OR of each set element)*. - * - * @stable ICU 4.0 - */ - USET_SPAN_CONTAINED = 1, - /** - * Continue a span() while there is a set element at the current position. - * (For characters only, this is like while contains(current)==TRUE). - * - * When span() returns, the substring between where it started and the position - * it returned consists only of set elements (characters or strings) that are in the set. - * - * If a set only contains single characters, then this is the same - * as USET_SPAN_CONTAINED. - * - * If a set contains strings, then the span will be the longest substring - * with a match at each position with the longest single set element (character or string). - * - * Use this span condition together with other longest-match algorithms, - * such as ICU converters (ucnv_getUnicodeSet()). - * - * @stable ICU 4.0 - */ - USET_SPAN_SIMPLE = 2, - /** - * One more than the last span condition. - * @stable ICU 4.0 - */ - USET_SPAN_CONDITION_COUNT -} USetSpanCondition; - -/** - * A serialized form of a Unicode set. Limited manipulations are - * possible directly on a serialized set. See below. - * @stable ICU 2.4 - */ -typedef struct USerializedSet { - /** - * The serialized Unicode Set. - * @stable ICU 2.4 - */ - const uint16_t *array; - /** - * The length of the array that contains BMP characters. - * @stable ICU 2.4 - */ - int32_t bmpLength; - /** - * The total length of the array. 
- * @stable ICU 2.4 - */ - int32_t length; - /** - * A small buffer for the array to reduce memory allocations. - * @stable ICU 2.4 - */ - uint16_t staticArray[USET_SERIALIZED_STATIC_ARRAY_CAPACITY]; -} USerializedSet; - -/********************************************************************* - * USet API - *********************************************************************/ - -/** - * Creates a USet object that contains the range of characters - * start..end, inclusive. If start > end - * then an empty set is created. - * @param start first character of the range, inclusive - * @param end last character of the range, inclusive - * @return a newly created USet. The caller must call uset_close() on - * it when done. - * @stable ICU 2.4 - */ -U_STABLE USet* U_EXPORT2 -uset_open(UChar32 start, UChar32 end); - -/** - * Creates a set from the given pattern. See the UnicodeSet class - * description for the syntax of the pattern language. - * @param pattern a string specifying what characters are in the set - * @param patternLength the length of the pattern, or -1 if null - * terminated - * @param ec the error code - * @stable ICU 2.4 - */ -U_STABLE USet* U_EXPORT2 -uset_openPattern(const UChar* pattern, int32_t patternLength, - UErrorCode* ec); - -/** - * Creates a set from the given pattern. See the UnicodeSet class - * description for the syntax of the pattern language. - * @param pattern a string specifying what characters are in the set - * @param patternLength the length of the pattern, or -1 if null - * terminated - * @param options bitmask for options to apply to the pattern. - * Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE. - * @param ec the error code - * @stable ICU 2.4 - */ -U_STABLE USet* U_EXPORT2 -uset_openPatternOptions(const UChar* pattern, int32_t patternLength, - uint32_t options, - UErrorCode* ec); - -/** - * Disposes of the storage used by a USet object. 
This function should - * be called exactly once for objects returned by uset_open(). - * @param set the object to dispose of - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_close(USet* set); - -/** - * Returns a copy of this object. - * If this set is frozen, then the clone will be frozen as well. - * Use uset_cloneAsThawed() for a mutable clone of a frozen set. - * @param set the original set - * @return the newly allocated copy of the set - * @see uset_cloneAsThawed - * @stable ICU 4.0 - */ -U_DRAFT USet * U_EXPORT2 -uset_clone(const USet *set); - -/** - * Determines whether the set has been frozen (made immutable) or not. - * See the ICU4J Freezable interface for details. - * @param set the set - * @return TRUE/FALSE for whether the set has been frozen - * @see uset_freeze - * @see uset_cloneAsThawed - * @stable ICU 4.0 - */ -U_DRAFT UBool U_EXPORT2 -uset_isFrozen(const USet *set); - -/** - * Freeze the set (make it immutable). - * Once frozen, it cannot be unfrozen and is therefore thread-safe - * until it is deleted. - * See the ICU4J Freezable interface for details. - * Freezing the set may also make some operations faster, for example - * uset_contains() and uset_span(). - * A frozen set will not be modified. (It remains frozen.) - * @param set the set - * @return the same set, now frozen - * @see uset_isFrozen - * @see uset_cloneAsThawed - * @stable ICU 4.0 - */ -U_DRAFT void U_EXPORT2 -uset_freeze(USet *set); - -/** - * Clone the set and make the clone mutable. - * See the ICU4J Freezable interface for details. - * @param set the set - * @return the mutable clone - * @see uset_freeze - * @see uset_isFrozen - * @see uset_clone - * @stable ICU 4.0 - */ -U_DRAFT USet * U_EXPORT2 -uset_cloneAsThawed(const USet *set); - -/** - * Causes the USet object to represent the range start - end. - * If start > end then this USet is set to an empty range. - * A frozen set will not be modified. 
- * @param set the object to set to the given range - * @param start first character in the set, inclusive - * @param end last character in the set, inclusive - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_set(USet* set, - UChar32 start, UChar32 end); - -/** - * Modifies the set to represent the set specified by the given - * pattern. See the UnicodeSet class description for the syntax of - * the pattern language. See also the User Guide chapter about UnicodeSet. - * Empties the set passed before applying the pattern. - * A frozen set will not be modified. - * @param set The set to which the pattern is to be applied. - * @param pattern A pointer to UChar string specifying what characters are in the set. - * The character at pattern[0] must be a '['. - * @param patternLength The length of the UChar string. -1 if NUL terminated. - * @param options A bitmask for options to apply to the pattern. - * Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE. - * @param status Returns an error if the pattern cannot be parsed. - * @return Upon successful parse, the value is either - * the index of the character after the closing ']' - * of the parsed pattern. - * If the status code indicates failure, then the return value - * is the index of the error in the source. - * - * @stable ICU 2.8 - */ -U_STABLE int32_t U_EXPORT2 -uset_applyPattern(USet *set, - const UChar *pattern, int32_t patternLength, - uint32_t options, - UErrorCode *status); - -/** - * Modifies the set to contain those code points which have the given value - * for the given binary or enumerated property, as returned by - * u_getIntPropertyValue. Prior contents of this set are lost. - * A frozen set will not be modified. - * - * @param set the object to contain the code points defined by the property - * - * @param prop a property in the range UCHAR_BIN_START..UCHAR_BIN_LIMIT-1 - * or UCHAR_INT_START..UCHAR_INT_LIMIT-1 - * or UCHAR_MASK_START..UCHAR_MASK_LIMIT-1. 
- * - * @param value a value in the range u_getIntPropertyMinValue(prop).. - * u_getIntPropertyMaxValue(prop), with one exception. If prop is - * UCHAR_GENERAL_CATEGORY_MASK, then value should not be a UCharCategory, but - * rather a mask value produced by U_GET_GC_MASK(). This allows grouped - * categories such as [:L:] to be represented. - * - * @param ec error code input/output parameter - * - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_applyIntPropertyValue(USet* set, - UProperty prop, int32_t value, UErrorCode* ec); - -/** - * Modifies the set to contain those code points which have the - * given value for the given property. Prior contents of this - * set are lost. - * A frozen set will not be modified. - * - * @param set the object to contain the code points defined by the given - * property and value alias - * - * @param prop a string specifying a property alias, either short or long. - * The name is matched loosely. See PropertyAliases.txt for names and a - * description of loose matching. If the value string is empty, then this - * string is interpreted as either a General_Category value alias, a Script - * value alias, a binary property alias, or a special ID. Special IDs are - * matched loosely and correspond to the following sets: - * - * "ANY" = [\\u0000-\\U0010FFFF], - * "ASCII" = [\\u0000-\\u007F], - * "Assigned" = [:^Cn:]. - * - * @param propLength the length of the prop, or -1 if NULL - * - * @param value a string specifying a value alias, either short or long. - * The name is matched loosely. See PropertyValueAliases.txt for names - * and a description of loose matching. In addition to aliases listed, - * numeric values and canonical combining classes may be expressed - * numerically, e.g., ("nv", "0.5") or ("ccc", "220"). The value string - * may also be empty. 
- * - * @param valueLength the length of the value, or -1 if NULL - * - * @param ec error code input/output parameter - * - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_applyPropertyAlias(USet* set, - const UChar *prop, int32_t propLength, - const UChar *value, int32_t valueLength, - UErrorCode* ec); - -/** - * Return true if the given position, in the given pattern, appears - * to be the start of a UnicodeSet pattern. - * - * @param pattern a string specifying the pattern - * @param patternLength the length of the pattern, or -1 if NULL - * @param pos the given position - * @stable ICU 3.2 - */ -U_STABLE UBool U_EXPORT2 -uset_resemblesPattern(const UChar *pattern, int32_t patternLength, - int32_t pos); - -/** - * Returns a string representation of this set. If the result of - * calling this function is passed to a uset_openPattern(), it - * will produce another set that is equal to this one. - * @param set the set - * @param result the string to receive the rules, may be NULL - * @param resultCapacity the capacity of result, may be 0 if result is NULL - * @param escapeUnprintable if TRUE then convert unprintable - * character to their hex escape representations, \\uxxxx or - * \\Uxxxxxxxx. Unprintable characters are those other than - * U+000A, U+0020..U+007E. - * @param ec error code. - * @return length of string, possibly larger than resultCapacity - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_toPattern(const USet* set, - UChar* result, int32_t resultCapacity, - UBool escapeUnprintable, - UErrorCode* ec); - -/** - * Adds the given character to the given USet. After this call, - * uset_contains(set, c) will return TRUE. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param c the character to add - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_add(USet* set, UChar32 c); - -/** - * Adds all of the elements in the specified set to this set if - * they're not already present. 
This operation effectively - * modifies this set so that its value is the union of the two - * sets. The behavior of this operation is unspecified if the specified - * collection is modified while the operation is in progress. - * A frozen set will not be modified. - * - * @param set the object to which to add the set - * @param additionalSet the source set whose elements are to be added to this set. - * @stable ICU 2.6 - */ -U_STABLE void U_EXPORT2 -uset_addAll(USet* set, const USet *additionalSet); - -/** - * Adds the given range of characters to the given USet. After this call, - * uset_contains(set, start, end) will return TRUE. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param start the first character of the range to add, inclusive - * @param end the last character of the range to add, inclusive - * @stable ICU 2.2 - */ -U_STABLE void U_EXPORT2 -uset_addRange(USet* set, UChar32 start, UChar32 end); - -/** - * Adds the given string to the given USet. After this call, - * uset_containsString(set, str, strLen) will return TRUE. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param str the string to add - * @param strLen the length of the string or -1 if null terminated. - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_addString(USet* set, const UChar* str, int32_t strLen); - -/** - * Adds each of the characters in this string to the set. Thus "ch" => {"c", "h"} - * If this set already any particular character, it has no effect on that character. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param str the source string - * @param strLen the length of the string or -1 if null terminated. - * @stable ICU 3.4 - */ -U_STABLE void U_EXPORT2 -uset_addAllCodePoints(USet* set, const UChar *str, int32_t strLen); - -/** - * Removes the given character from the given USet. 
After this call, - * uset_contains(set, c) will return FALSE. - * A frozen set will not be modified. - * @param set the object from which to remove the character - * @param c the character to remove - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_remove(USet* set, UChar32 c); - -/** - * Removes the given range of characters from the given USet. After this call, - * uset_contains(set, start, end) will return FALSE. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param start the first character of the range to remove, inclusive - * @param end the last character of the range to remove, inclusive - * @stable ICU 2.2 - */ -U_STABLE void U_EXPORT2 -uset_removeRange(USet* set, UChar32 start, UChar32 end); - -/** - * Removes the given string to the given USet. After this call, - * uset_containsString(set, str, strLen) will return FALSE. - * A frozen set will not be modified. - * @param set the object to which to add the character - * @param str the string to remove - * @param strLen the length of the string or -1 if null terminated. - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_removeString(USet* set, const UChar* str, int32_t strLen); - -/** - * Removes from this set all of its elements that are contained in the - * specified set. This operation effectively modifies this - * set so that its value is the asymmetric set difference of - * the two sets. - * A frozen set will not be modified. - * @param set the object from which the elements are to be removed - * @param removeSet the object that defines which elements will be - * removed from this set - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_removeAll(USet* set, const USet* removeSet); - -/** - * Retain only the elements in this set that are contained in the - * specified range. If start > end then an empty range is - * retained, leaving the set empty. This is equivalent to - * a boolean logic AND, or a set INTERSECTION. 
- * A frozen set will not be modified. - * - * @param set the object for which to retain only the specified range - * @param start first character, inclusive, of range to be retained - * to this set. - * @param end last character, inclusive, of range to be retained - * to this set. - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_retain(USet* set, UChar32 start, UChar32 end); - -/** - * Retains only the elements in this set that are contained in the - * specified set. In other words, removes from this set all of - * its elements that are not contained in the specified set. This - * operation effectively modifies this set so that its value is - * the intersection of the two sets. - * A frozen set will not be modified. - * - * @param set the object on which to perform the retain - * @param retain set that defines which elements this set will retain - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_retainAll(USet* set, const USet* retain); - -/** - * Reallocate this objects internal structures to take up the least - * possible space, without changing this object's value. - * A frozen set will not be modified. - * - * @param set the object on which to perfrom the compact - * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_compact(USet* set); - -/** - * Inverts this set. This operation modifies this set so that - * its value is its complement. This operation does not affect - * the multicharacter strings, if any. - * A frozen set will not be modified. - * @param set the set - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_complement(USet* set); - -/** - * Complements in this set all elements contained in the specified - * set. Any character in the other set will be removed if it is - * in this set, or will be added if it is not in this set. - * A frozen set will not be modified. - * - * @param set the set with which to complement - * @param complement set that defines which elements will be xor'ed - * from this set. 
- * @stable ICU 3.2 - */ -U_STABLE void U_EXPORT2 -uset_complementAll(USet* set, const USet* complement); - -/** - * Removes all of the elements from this set. This set will be - * empty after this call returns. - * A frozen set will not be modified. - * @param set the set - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_clear(USet* set); - -/** - * Returns TRUE if the given USet contains no characters and no - * strings. - * @param set the set - * @return true if set is empty - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_isEmpty(const USet* set); - -/** - * Returns TRUE if the given USet contains the given character. - * This function works faster with a frozen set. - * @param set the set - * @param c The codepoint to check for within the set - * @return true if set contains c - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_contains(const USet* set, UChar32 c); - -/** - * Returns TRUE if the given USet contains all characters c - * where start <= c && c <= end. - * @param set the set - * @param start the first character of the range to test, inclusive - * @param end the last character of the range to test, inclusive - * @return TRUE if set contains the range - * @stable ICU 2.2 - */ -U_STABLE UBool U_EXPORT2 -uset_containsRange(const USet* set, UChar32 start, UChar32 end); - -/** - * Returns TRUE if the given USet contains the given string. - * @param set the set - * @param str the string - * @param strLen the length of the string or -1 if null terminated. - * @return true if set contains str - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_containsString(const USet* set, const UChar* str, int32_t strLen); - -/** - * Returns the index of the given character within this set, where - * the set is ordered by ascending code point. If the character - * is not in this set, return -1. The inverse of this method is - * charAt(). 
- * @param set the set - * @param c the character to obtain the index for - * @return an index from 0..size()-1, or -1 - * @stable ICU 3.2 - */ -U_STABLE int32_t U_EXPORT2 -uset_indexOf(const USet* set, UChar32 c); - -/** - * Returns the character at the given index within this set, where - * the set is ordered by ascending code point. If the index is - * out of range, return (UChar32)-1. The inverse of this method is - * indexOf(). - * @param set the set - * @param index an index from 0..size()-1 to obtain the char for - * @return the character at the given index, or (UChar32)-1. - * @stable ICU 3.2 - */ -U_STABLE UChar32 U_EXPORT2 -uset_charAt(const USet* set, int32_t index); - -/** - * Returns the number of characters and strings contained in the given - * USet. - * @param set the set - * @return a non-negative integer counting the characters and strings - * contained in set - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_size(const USet* set); - -/** - * Returns the number of items in this set. An item is either a range - * of characters or a single multicharacter string. - * @param set the set - * @return a non-negative integer counting the character ranges - * and/or strings contained in set - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_getItemCount(const USet* set); - -/** - * Returns an item of this set. An item is either a range of - * characters or a single multicharacter string. - * @param set the set - * @param itemIndex a non-negative integer in the range 0.. 
- * uset_getItemCount(set)-1 - * @param start pointer to variable to receive first character - * in range, inclusive - * @param end pointer to variable to receive last character in range, - * inclusive - * @param str buffer to receive the string, may be NULL - * @param strCapacity capacity of str, or 0 if str is NULL - * @param ec error code - * @return the length of the string (>= 2), or 0 if the item is a - * range, in which case it is the range *start..*end, or -1 if - * itemIndex is out of range - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_getItem(const USet* set, int32_t itemIndex, - UChar32* start, UChar32* end, - UChar* str, int32_t strCapacity, - UErrorCode* ec); - -/** - * Returns true if set1 contains all the characters and strings - * of set2. It answers the question, 'Is set1 a superset of set2?' - * @param set1 set to be checked for containment - * @param set2 set to be checked for containment - * @return true if the test condition is met - * @stable ICU 3.2 - */ -U_STABLE UBool U_EXPORT2 -uset_containsAll(const USet* set1, const USet* set2); - -/** - * Returns true if this set contains all the characters - * of the given string. This is does not check containment of grapheme - * clusters, like uset_containsString. - * @param set set of characters to be checked for containment - * @param str string containing codepoints to be checked for containment - * @param strLen the length of the string or -1 if null terminated. - * @return true if the test condition is met - * @stable ICU 3.4 - */ -U_STABLE UBool U_EXPORT2 -uset_containsAllCodePoints(const USet* set, const UChar *str, int32_t strLen); - -/** - * Returns true if set1 contains none of the characters and strings - * of set2. It answers the question, 'Is set1 a disjoint set of set2?' 
- * @param set1 set to be checked for containment - * @param set2 set to be checked for containment - * @return true if the test condition is met - * @stable ICU 3.2 - */ -U_STABLE UBool U_EXPORT2 -uset_containsNone(const USet* set1, const USet* set2); - -/** - * Returns true if set1 contains some of the characters and strings - * of set2. It answers the question, 'Does set1 and set2 have an intersection?' - * @param set1 set to be checked for containment - * @param set2 set to be checked for containment - * @return true if the test condition is met - * @stable ICU 3.2 - */ -U_STABLE UBool U_EXPORT2 -uset_containsSome(const USet* set1, const USet* set2); - -/** - * Returns the length of the initial substring of the input string which - * consists only of characters and strings that are contained in this set - * (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), - * or only of characters and strings that are not contained - * in this set (USET_SPAN_NOT_CONTAINED). - * See USetSpanCondition for details. - * Similar to the strspn() C library function. - * Unpaired surrogates are treated according to contains() of their surrogate code points. - * This function works faster with a frozen set and with a non-negative string length argument. 
- * @param set the set - * @param s start of the string - * @param length of the string; can be -1 for NUL-terminated - * @param spanCondition specifies the containment condition - * @return the length of the initial substring according to the spanCondition; - * 0 if the start of the string does not fit the spanCondition - * @stable ICU 4.0 - * @see USetSpanCondition - */ -U_DRAFT int32_t U_EXPORT2 -uset_span(const USet *set, const UChar *s, int32_t length, USetSpanCondition spanCondition); - -/** - * Returns the start of the trailing substring of the input string which - * consists only of characters and strings that are contained in this set - * (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), - * or only of characters and strings that are not contained - * in this set (USET_SPAN_NOT_CONTAINED). - * See USetSpanCondition for details. - * Unpaired surrogates are treated according to contains() of their surrogate code points. - * This function works faster with a frozen set and with a non-negative string length argument. - * @param set the set - * @param s start of the string - * @param length of the string; can be -1 for NUL-terminated - * @param spanCondition specifies the containment condition - * @return the start of the trailing substring according to the spanCondition; - * the string length if the end of the string does not fit the spanCondition - * @stable ICU 4.0 - * @see USetSpanCondition - */ -U_DRAFT int32_t U_EXPORT2 -uset_spanBack(const USet *set, const UChar *s, int32_t length, USetSpanCondition spanCondition); - -/** - * Returns the length of the initial substring of the input string which - * consists only of characters and strings that are contained in this set - * (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), - * or only of characters and strings that are not contained - * in this set (USET_SPAN_NOT_CONTAINED). - * See USetSpanCondition for details. - * Similar to the strspn() C library function. 
- * Malformed byte sequences are treated according to contains(0xfffd). - * This function works faster with a frozen set and with a non-negative string length argument. - * @param set the set - * @param s start of the string (UTF-8) - * @param length of the string; can be -1 for NUL-terminated - * @param spanCondition specifies the containment condition - * @return the length of the initial substring according to the spanCondition; - * 0 if the start of the string does not fit the spanCondition - * @stable ICU 4.0 - * @see USetSpanCondition - */ -U_DRAFT int32_t U_EXPORT2 -uset_spanUTF8(const USet *set, const char *s, int32_t length, USetSpanCondition spanCondition); - -/** - * Returns the start of the trailing substring of the input string which - * consists only of characters and strings that are contained in this set - * (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), - * or only of characters and strings that are not contained - * in this set (USET_SPAN_NOT_CONTAINED). - * See USetSpanCondition for details. - * Malformed byte sequences are treated according to contains(0xfffd). - * This function works faster with a frozen set and with a non-negative string length argument. - * @param set the set - * @param s start of the string (UTF-8) - * @param length of the string; can be -1 for NUL-terminated - * @param spanCondition specifies the containment condition - * @return the start of the trailing substring according to the spanCondition; - * the string length if the end of the string does not fit the spanCondition - * @stable ICU 4.0 - * @see USetSpanCondition - */ -U_DRAFT int32_t U_EXPORT2 -uset_spanBackUTF8(const USet *set, const char *s, int32_t length, USetSpanCondition spanCondition); - -/** - * Returns true if set1 contains all of the characters and strings - * of set2, and vis versa. It answers the question, 'Is set1 equal to set2?' 
- * @param set1 set to be checked for containment - * @param set2 set to be checked for containment - * @return true if the test condition is met - * @stable ICU 3.2 - */ -U_STABLE UBool U_EXPORT2 -uset_equals(const USet* set1, const USet* set2); - -/********************************************************************* - * Serialized set API - *********************************************************************/ - -/** - * Serializes this set into an array of 16-bit integers. Serialization - * (currently) only records the characters in the set; multicharacter - * strings are ignored. - * - * The array - * has following format (each line is one 16-bit integer): - * - * length = (n+2*m) | (m!=0?0x8000:0) - * bmpLength = n; present if m!=0 - * bmp[0] - * bmp[1] - * ... - * bmp[n-1] - * supp-high[0] - * supp-low[0] - * supp-high[1] - * supp-low[1] - * ... - * supp-high[m-1] - * supp-low[m-1] - * - * The array starts with a header. After the header are n bmp - * code points, then m supplementary code points. Either n or m - * or both may be zero. n+2*m is always <= 0x7FFF. - * - * If there are no supplementary characters (if m==0) then the - * header is one 16-bit integer, 'length', with value n. - * - * If there are supplementary characters (if m!=0) then the header - * is two 16-bit integers. The first, 'length', has value - * (n+2*m)|0x8000. The second, 'bmpLength', has value n. - * - * After the header the code points are stored in ascending order. - * Supplementary code points are stored as most significant 16 - * bits followed by least significant 16 bits. - * - * @param set the set - * @param dest pointer to buffer of destCapacity 16-bit integers. - * May be NULL only if destCapacity is zero. - * @param destCapacity size of dest, or zero. Must not be negative. - * @param pErrorCode pointer to the error code. Will be set to - * U_INDEX_OUTOFBOUNDS_ERROR if n+2*m > 0x7FFF. Will be set to - * U_BUFFER_OVERFLOW_ERROR if n+2*m+(m!=0?2:1) > destCapacity. 
- * @return the total length of the serialized format, including - * the header, that is, n+2*m+(m!=0?2:1), or 0 on error other - * than U_BUFFER_OVERFLOW_ERROR. - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_serialize(const USet* set, uint16_t* dest, int32_t destCapacity, UErrorCode* pErrorCode); - -/** - * Given a serialized array, fill in the given serialized set object. - * @param fillSet pointer to result - * @param src pointer to start of array - * @param srcLength length of array - * @return true if the given array is valid, otherwise false - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_getSerializedSet(USerializedSet* fillSet, const uint16_t* src, int32_t srcLength); - -/** - * Set the USerializedSet to contain the given character (and nothing - * else). - * @param fillSet pointer to result - * @param c The codepoint to set - * @stable ICU 2.4 - */ -U_STABLE void U_EXPORT2 -uset_setSerializedToOne(USerializedSet* fillSet, UChar32 c); - -/** - * Returns TRUE if the given USerializedSet contains the given - * character. - * @param set the serialized set - * @param c The codepoint to check for within the set - * @return true if set contains c - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_serializedContains(const USerializedSet* set, UChar32 c); - -/** - * Returns the number of disjoint ranges of characters contained in - * the given serialized set. Ignores any strings contained in the - * set. - * @param set the serialized set - * @return a non-negative integer counting the character ranges - * contained in set - * @stable ICU 2.4 - */ -U_STABLE int32_t U_EXPORT2 -uset_getSerializedRangeCount(const USerializedSet* set); - -/** - * Returns a range of characters contained in the given serialized - * set. - * @param set the serialized set - * @param rangeIndex a non-negative integer in the range 0.. 
- * uset_getSerializedRangeCount(set)-1 - * @param pStart pointer to variable to receive first character - * in range, inclusive - * @param pEnd pointer to variable to receive last character in range, - * inclusive - * @return true if rangeIndex is valid, otherwise false - * @stable ICU 2.4 - */ -U_STABLE UBool U_EXPORT2 -uset_getSerializedRange(const USerializedSet* set, int32_t rangeIndex, - UChar32* pStart, UChar32* pEnd); - -#endif http://git-wip-us.apache.org/repos/asf/couchdb/blob/81332b78/apps/couch_collate/platform/osx/icu/unicode/usetiter.h ---------------------------------------------------------------------- diff --git a/apps/couch_collate/platform/osx/icu/unicode/usetiter.h b/apps/couch_collate/platform/osx/icu/unicode/usetiter.h deleted file mode 100644 index defa75c..0000000 --- a/apps/couch_collate/platform/osx/icu/unicode/usetiter.h +++ /dev/null @@ -1,318 +0,0 @@ -/* -********************************************************************** -* Copyright (c) 2002-2006, International Business Machines -* Corporation and others. All Rights Reserved. -********************************************************************** -*/ -#ifndef USETITER_H -#define USETITER_H - -#include "unicode/utypes.h" -#include "unicode/uobject.h" -#include "unicode/unistr.h" - -/** - * \file - * \brief C++ API: UnicodeSetIterator iterates over the contents of a UnicodeSet. - */ - -U_NAMESPACE_BEGIN - -class UnicodeSet; -class UnicodeString; - -/** - * - * UnicodeSetIterator iterates over the contents of a UnicodeSet. It - * iterates over either code points or code point ranges. After all - * code points or ranges have been returned, it returns the - * multicharacter strings of the UnicodeSet, if any. - * - * This class is not intended to be subclassed. Consider any fields - * or methods declared as "protected" to be private. The use of - * protected in this class is an artifact of history. - * - *

To iterate over code points and strings, use a loop like this: - *

- * UnicodeSetIterator it(set);
- * while (it.next()) {
- *     processItem(it.getString());
- * }
- * 
- *

Each item in the set is accessed as a string. Set elements - * consisting of single code points are returned as strings containing - * just the one code point. - * - *

To iterate over code point ranges, instead of individual code points, - * use a loop like this: - *

- * UnicodeSetIterator it(set);
- * while (it.nextRange()) {
- *   if (it.isString()) {
- *     processString(it.getString());
- *   } else {
- *     processCodepointRange(it.getCodepoint(), it.getCodepointEnd());
- *   }
- * }
- * 
- * @author M. Davis - * @stable ICU 2.4 - */ -class U_COMMON_API UnicodeSetIterator : public UObject { - - protected: - - /** - * Value of codepoint if the iterator points to a string. - * If codepoint == IS_STRING, then examine - * string for the current iteration result. - * @stable ICU 2.4 - */ - enum { IS_STRING = -1 }; - - /** - * Current code point, or the special value IS_STRING, if - * the iterator points to a string. - * @stable ICU 2.4 - */ - UChar32 codepoint; - - /** - * When iterating over ranges using nextRange(), - * codepointEnd contains the inclusive end of the - * iteration range, if codepoint != IS_STRING. If - * iterating over code points using next(), or if - * codepoint == IS_STRING, then the value of - * codepointEnd is undefined. - * @stable ICU 2.4 - */ - UChar32 codepointEnd; - - /** - * If codepoint == IS_STRING, then string points - * to the current string. If codepoint != IS_STRING, the - * value of string is undefined. - * @stable ICU 2.4 - */ - const UnicodeString* string; - - public: - - /** - * Create an iterator over the given set. The iterator is valid - * only so long as set is valid. - * @param set set to iterate over - * @stable ICU 2.4 - */ - UnicodeSetIterator(const UnicodeSet& set); - - /** - * Create an iterator over nothing. next() and - * nextRange() return false. This is a convenience - * constructor allowing the target to be set later. - * @stable ICU 2.4 - */ - UnicodeSetIterator(); - - /** - * Destructor. - * @stable ICU 2.4 - */ - virtual ~UnicodeSetIterator(); - - /** - * Returns true if the current element is a string. If so, the - * caller can retrieve it with getString(). If this - * method returns false, the current element is a code point or - * code point range, depending on whether next() or - * nextRange() was called. - * Elements of types string and codepoint can both be retrieved - * with the function getString(). - * Elements of type codepoint can also be retrieved with - * getCodepoint(). 
- * For ranges, getCodepoint() returns the starting codepoint - * of the range, and getCodepointEnd() returns the end - * of the range. - * @stable ICU 2.4 - */ - inline UBool isString() const; - - /** - * Returns the current code point, if isString() returned - * false. Otherwise returns an undefined result. - * @stable ICU 2.4 - */ - inline UChar32 getCodepoint() const; - - /** - * Returns the end of the current code point range, if - * isString() returned false and nextRange() was - * called. Otherwise returns an undefined result. - * @stable ICU 2.4 - */ - inline UChar32 getCodepointEnd() const; - - /** - * Returns the current string, if isString() returned - * true. If the current iteration item is a code point, a UnicodeString - * containing that single code point is returned. - * - * Ownership of the returned string remains with the iterator. - * The string is guaranteed to remain valid only until the iterator is - * advanced to the next item, or until the iterator is deleted. - * - * @stable ICU 2.4 - */ - const UnicodeString& getString(); - - /** - * Advances the iteration position to the next element in the set, - * which can be either a single code point or a string. - * If there are no more elements in the set, return false. - * - *

- * If isString() == TRUE, the value is a - * string, otherwise the value is a - * single code point. Elements of either type can be retrieved - * with the function getString(), while elements - * consisting of a single code point can be retrieved with - * getCodepoint() - * - *

The order of iteration is all code points in sorted order, - * followed by all strings in sorted order. Do not mix - * calls to next() and nextRange() without - * calling reset() between them. The results of doing so - * are undefined. - * - * @return true if there was another element in the set. - * @stable ICU 2.4 - */ - UBool next(); - - /** - * Returns the next element in the set, either a code point range - * or a string. If there are no more elements in the set, return - * false. If isString() == TRUE, the value is a - * string and can be accessed with getString(). Otherwise the value is a - * range of one or more code points from getCodepoint() to - * getCodepointEnd() inclusive. - * - *

The order of iteration is all code point ranges in sorted - * order, followed by all strings in sorted order. Ranges are - * disjoint and non-contiguous. The value returned from getString() - * is undefined unless isString() == TRUE. Do not mix calls to - * next() and nextRange() without calling - * reset() between them. The results of doing so are - * undefined. - * - * @return true if there was another element in the set. - * @stable ICU 2.4 - */ - UBool nextRange(); - - /** - * Sets this iterator to visit the elements of the given set and - * resets it to the start of that set. The iterator is valid only - * so long as set is valid. - * @param set the set to iterate over. - * @stable ICU 2.4 - */ - void reset(const UnicodeSet& set); - - /** - * Resets this iterator to the start of the set. - * @stable ICU 2.4 - */ - void reset(); - - /** - * ICU "poor man's RTTI", returns a UClassID for this class. - * - * @stable ICU 2.4 - */ - static UClassID U_EXPORT2 getStaticClassID(); - - /** - * ICU "poor man's RTTI", returns a UClassID for the actual class. - * - * @stable ICU 2.4 - */ - virtual UClassID getDynamicClassID() const; - - // ======================= PRIVATES =========================== - - protected: - - // endElement and nextElement are really UChar32's, but we keep - // them as signed int32_t's so we can do comparisons with - // endElement set to -1. Leave them as int32_t's.
- /** The set - * @stable ICU 2.4 - */ - const UnicodeSet* set; - /** End range - * @stable ICU 2.4 - */ - int32_t endRange; - /** Range - * @stable ICU 2.4 - */ - int32_t range; - /** End element - * @stable ICU 2.4 - */ - int32_t endElement; - /** Next element - * @stable ICU 2.4 - */ - int32_t nextElement; - //UBool abbreviated; - /** Next string - * @stable ICU 2.4 - */ - int32_t nextString; - /** String count - * @stable ICU 2.4 - */ - int32_t stringCount; - - /** - * Points to the string to use when the caller asks for a - * string and the current iteration item is a code point, not a string. - * @internal - */ - UnicodeString *cpString; - - /** Copy constructor. Disallowed. - * @stable ICU 2.4 - */ - UnicodeSetIterator(const UnicodeSetIterator&); // disallow - - /** Assignment operator. Disallowed. - * @stable ICU 2.4 - */ - UnicodeSetIterator& operator=(const UnicodeSetIterator&); // disallow - - /** Load range - * @stable ICU 2.4 - */ - virtual void loadRange(int32_t range); - -}; - -inline UBool UnicodeSetIterator::isString() const { - return codepoint == (UChar32)IS_STRING; -} - -inline UChar32 UnicodeSetIterator::getCodepoint() const { - return codepoint; -} - -inline UChar32 UnicodeSetIterator::getCodepointEnd() const { - return codepointEnd; -} - - -U_NAMESPACE_END - -#endif
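
Note on the serialized set layout documented in the removed uset.h: the header rule (one 16-bit unit holding n when m==0, or two units holding (n+2*m)|0x8000 and n when m!=0) can be illustrated with a small standalone C sketch. This is not ICU code and does not call ICU; the struct SerializedHeader and the function parse_serialized_header are hypothetical names introduced only for illustration, and the sketch assumes nothing beyond the layout as the comment describes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of ICU: the decoded header fields. */
typedef struct {
    int32_t bmpLength;   /* n: number of 16-bit BMP entries */
    int32_t suppCount;   /* m: number of supplementary entries (2 units each) */
    int32_t headerUnits; /* 1 if m == 0, otherwise 2 */
    int32_t totalUnits;  /* n + 2*m + headerUnits */
} SerializedHeader;

/*
 * Decode the header of a serialized array laid out as described in the
 * uset_serialize() comment. Returns 1 if the header is consistent and the
 * array is long enough to hold the whole payload, otherwise 0.
 */
static int parse_serialized_header(const uint16_t *src, int32_t srcLength,
                                   SerializedHeader *out) {
    assert(out != NULL);
    if (src == NULL || srcLength < 1) {
        return 0;
    }
    int32_t length = src[0];
    if (length & 0x8000) {
        /* Supplementary entries present: a second header unit holds n. */
        if (srcLength < 2) {
            return 0;
        }
        length &= 0x7FFF;               /* length is now n + 2*m */
        int32_t n = src[1];
        if (n > length || ((length - n) % 2) != 0) {
            return 0;                   /* n and 2*m cannot add up */
        }
        out->bmpLength = n;
        out->suppCount = (length - n) / 2;
        out->headerUnits = 2;
    } else {
        /* No supplementary entries: the single header unit is n. */
        out->bmpLength = length;
        out->suppCount = 0;
        out->headerUnits = 1;
    }
    out->totalUnits = length + out->headerUnits;
    return srcLength >= out->totalUnits;
}
```

For example, the array {3, 0x41, 0x42, 0x43} has a one-unit header with n=3 and a total of 4 units, while {0x8003, 1, 0x41, 0x0001, 0x0000} has a two-unit header with n=1, m=1 and a total of 5 units, matching the n+2*m+(m!=0?2:1) formula given for the return value of uset_serialize().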