From: Ola Berg
To: commons-dev@jakarta.apache.org
Subject: [lang] charset strings char utils etc
Date: Thu, 22 Aug 2002 23:01:37 +0200

Before you elaborate too much on the miniature RE thing, let me briefly explain what I have:

/**
 * An IntClass defines a possibly infinite set of ints, e.g. the primes
 * or the even numbers. It is a kind of predicate.
 */
public interface IntClass {
    /** Evaluates to true if the int i belongs to the set. */
    public boolean isA(int i);
}

/**
 * Per contract, the int must in fact be an unsigned byte, encoded as an
 * int as in the input streams.
 */
public interface UnsignedByteClass extends IntClass {}

/**
 * Per contract, the int must in fact be a char, encoded as an int as in
 * the readers.
 */
public interface CharClass extends IntClass {}

CharClassUtils contains a lot of constants:

/** All white space */
public static final CharClass WS;

/** All horizontal white space */
public static final CharClass HWS;

plus things like SEPARATORS, STRING_DELIMITERS and other common western character groups, plus constants for the ISO-8859-1 entity names of all characters (COMMA, AMPERSAND etc.), plus boolean decorators like those in PredicateUtils.

Then I have a CharStream interface that isolates the stream view of a Reader. (I used Reader before, but ran into trouble since Reader isn't an interface and has both a stream view and a block view.)

public interface CharStream {
    /** -1 indicates end of stream; use CharClassUtils.END_OF_STREAM */
    public int read() throws IOException;
    public void close() throws IOException;
}

CharStreamUtils provides adapters to and from Readers, Strings and char[], plus the ability to merge streams in different ways. CharStreamUtils also provides the methods:

public static String readTo(CharClass cc) throws IOException;
public static String readWhile(CharClass cc) throws IOException;
public static String readToInc(String endString) throws IOException;

for easy scanning of a char stream.

These are located in the nu.viggo.text package. There is also a parser package that builds on top of this, allowing for the creation of very efficient parsers. The parser tools deal with buffered implementations of CharStream.

I find this simple architecture very useful and clear when it comes to string and byte handling (such as when implementing network protocols). Yes, there is a ByteStream that works in a similar way.

Is this something for lang's char and byte utility functionality? Small footprint, easy, small public interface?

/O

--------------------
ola.berg@arkitema.se
0733 - 99 99 17
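
PS. To make the idea concrete, here is a minimal sketch of how the pieces described above might fit together. It is not the nu.viggo.text code. The class name CharStreamSketch, the not() decorator, the definition of HWS as space or tab, the extra CharStream parameter on readTo, and the choice to consume and drop the terminating char are all assumptions made for illustration. It also uses lambdas, so it needs a newer Java than the original code targets.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Interfaces as described in the message.
interface IntClass {
    boolean isA(int i);
}

interface CharClass extends IntClass {}

interface CharStream {
    int read() throws IOException;   // -1 indicates end of stream
    void close() throws IOException;
}

// Hypothetical helper class, standing in for CharClassUtils/CharStreamUtils.
final class CharStreamSketch {
    static final int END_OF_STREAM = -1;

    // Assumed definition: horizontal white space is a space or a tab.
    static final CharClass HWS = i -> i == ' ' || i == '\t';

    // Boolean decorator (hypothetical name): matches whatever cc does not.
    static CharClass not(CharClass cc) {
        return i -> !cc.isA(i);
    }

    // Adapter exposing only the stream view of a Reader.
    static CharStream fromReader(Reader r) {
        return new CharStream() {
            public int read() throws IOException { return r.read(); }
            public void close() throws IOException { r.close(); }
        };
    }

    // Reads up to the first char that belongs to cc, or to end of stream.
    // Assumption: the terminating char is consumed and not returned,
    // since an unbuffered stream cannot push it back.
    static String readTo(CharStream in, CharClass cc) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != END_OF_STREAM && !cc.isA(c)) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        CharStream in = fromReader(new StringReader("GET /index.html"));
        System.out.println(readTo(in, HWS)); // prints "GET"
        System.out.println(readTo(in, HWS)); // prints "/index.html"
        in.close();
    }
}
```

Since a CharClass is just a predicate on ints, the same readTo works unchanged with any other character group, and a ByteStream variant would only swap CharClass for UnsignedByteClass.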