lucene-java-user mailing list archives

From "Mohammad Norouzi" <mnr...@gmail.com>
Subject Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]
Date Tue, 29 May 2007 05:39:03 GMT
Hi Chris,

>
>     * It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
>       PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
>       '\u2007', '\u202F').
>     * It is '\u0009', HORIZONTAL TABULATION.
>     * It is '\u000A', LINE FEED.
>     * It is '\u000B', VERTICAL TABULATION.
>     * It is '\u000C', FORM FEED.
>     * It is '\u000D', CARRIAGE RETURN.
>     * It is '\u001C', FILE SEPARATOR.
>     * It is '\u001D', GROUP SEPARATOR.
>     * It is '\u001E', RECORD SEPARATOR.
>     * It is '\u001F', UNIT SEPARATOR.

> ...are there Persian characters with a category type of SPACE_SEPARATOR,
> LINE_SEPARATOR, or PARAGRAPH_SEPARATOR?
>
How can I know that?
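
One way to check, as a rough sketch: loop over the code points in question and ask Character.getType() for each one's category. The class name SeparatorScan and the helper names below are made up for illustration, and the range U+0600..U+06FF (the Unicode "Arabic" block) is only an assumption about where the Persian letters of interest live; adjust it to whatever the documents actually contain.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorScan {

    // True when the code point's Unicode general category is
    // SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR.
    static boolean isSeparatorCategory(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
    }

    // Collects every code point in [start, end] that has a separator category.
    static List<Integer> findSeparators(int start, int end) {
        List<Integer> found = new ArrayList<>();
        for (int cp = start; cp <= end; cp++) {
            if (isSeparatorCategory(cp)) {
                found.add(cp);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: U+0600..U+06FF is the range of interest for Persian text.
        List<Integer> hits = findSeparators(0x0600, 0x06FF);
        System.out.println("separator code points found: " + hits.size());
        for (int cp : hits) {
            System.out.printf("U+%04X whitespace=%b%n", cp, Character.isWhitespace(cp));
        }
        // U+200C ZERO WIDTH NON-JOINER is common in Persian text but lies
        // outside the block scanned above, so it is checked separately.
        System.out.printf("U+200C type=%d whitespace=%b%n",
                Character.getType(0x200C), Character.isWhitespace(0x200C));
    }
}
```

The numeric type printed for U+200C can be compared against the constants in java.lang.Character (Character.FORMAT, Character.SPACE_SEPARATOR, and so on). If a character is reported with whitespace=false, a tokenizer that relies on Character.isWhitespace() should not split on it.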

-- 
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
