commons-dev mailing list archives

From Ola Berg <>
Subject [lang] charset strings char utils etc
Date Thu, 22 Aug 2002 21:01:37 GMT
Before you elaborate too much on the miniature RE thing, let me briefly explain what I have:

An IntClass defines a possibly infinite set of ints,
such as the primes or the even numbers. This is a kind of
public interface IntClass {
    /** Evaluates to true if the int i belongs to the class. */
    public boolean isA( int i);
}
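
As an illustration of the idea, here is a minimal sketch of two IntClass implementations for the examples mentioned above (even numbers and primes). The class names EvenInts and PrimeInts are hypothetical and not part of the code described in this message; the interface is repeated only so that the sketch compiles on its own.

```java
// Sketch only: IntClass mirrors the interface from the message; the two
// implementations below are hypothetical examples, not the author's code.
interface IntClass {
    /** Evaluates to true if the int i belongs to the class. */
    boolean isA(int i);
}

/** The (infinite) set of even ints. */
final class EvenInts implements IntClass {
    @Override
    public boolean isA(int i) {
        return (i & 1) == 0; // lowest bit clear; also correct for negatives
    }
}

/** The set of primes, tested by simple trial division. */
final class PrimeInts implements IntClass {
    @Override
    public boolean isA(int i) {
        if (i < 2) {
            return false;
        }
        // long divisor avoids overflow of d * d near Integer.MAX_VALUE
        for (long d = 2; d * d <= i; d++) {
            if (i % d == 0) {
                return false;
            }
        }
        return true;
    }
}
```

The set is never materialized; membership is computed on demand, which is what lets an IntClass describe an infinite set.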

By contract, the int must in fact be an unsigned byte, encoded as an int as in
the input streams.
public interface UnsignedByteClass extends IntClass{}

By contract, the int must in fact be a char, encoded as an int as in the readers.
public interface CharClass extends IntClass{}

CharClassUtils contains a lot of constants:

All white space
public static final CharClass WS;

All horizontal white space
public static final CharClass HWS;

plus things like SEPARATORS, STRING_DELIMITERS and other common western character groups

plus constants for the ISO-8859-1 entity names of all characters (COMMA, AMPERSAND etc)

plus boolean decorators like in PredicateUtils
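
The boolean decorators could look roughly like the sketch below. Everything here is an assumption made for illustration: the holder class CharClassDecorators, the method names not/and/or, and the definitions chosen for WS and HWS are hypothetical, and the lambdas require Java 8 or later (the original code would have used named or anonymous classes).

```java
// Sketch only: interfaces repeated from the message so the example compiles
// on its own. All other names are hypothetical.
interface IntClass { boolean isA(int i); }
interface CharClass extends IntClass {}

final class CharClassDecorators {
    private CharClassDecorators() {}

    /** Complement: matches exactly the ints that a rejects. */
    static CharClass not(final CharClass a) {
        return i -> !a.isA(i);
    }

    /** Intersection: matches ints accepted by both a and b. */
    static CharClass and(final CharClass a, final CharClass b) {
        return i -> a.isA(i) && b.isA(i);
    }

    /** Union: matches ints accepted by a or b. */
    static CharClass or(final CharClass a, final CharClass b) {
        return i -> a.isA(i) || b.isA(i);
    }

    // Assumed definitions for the constants named in the message.
    static final CharClass WS = i -> i >= 0 && Character.isWhitespace(i);
    static final CharClass HWS = i -> i == ' ' || i == '\t';

    /** Derived class: white space that is not horizontal white space. */
    static final CharClass VERTICAL_WS = and(WS, not(HWS));
}
```

Because each decorator returns another CharClass, derived groups such as VERTICAL_WS can be built from the existing constants without writing new implementations.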


Then I have a CharStream interface that isolates the stream view of a Reader (I used the Reader
before, but ran into trouble since Reader isn't an interface and has both a stream view
and a block view).

public interface CharStream {
    /** Reads the next char; -1 indicates end of stream. */
    public int read() throws IOException;

    public void close() throws IOException;
}

CharStreamUtils provides adapters to and from Readers, Strings and char[], plus the ability
to merge streams in different ways.
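
A minimal sketch of what such adapters might look like follows. The class name CharStreamAdapters and the methods fromReader, fromString, toReader and concat are hypothetical names chosen for this example; the message does not give the actual CharStreamUtils signatures. Concatenation is shown as one possible way of merging streams.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Sketch only: CharStream repeated from the message so the example compiles.
interface CharStream {
    int read() throws IOException;   // -1 indicates end of stream
    void close() throws IOException;
}

final class CharStreamAdapters {
    private CharStreamAdapters() {}

    /** Exposes only the stream view of a Reader. */
    static CharStream fromReader(final Reader r) {
        return new CharStream() {
            @Override public int read() throws IOException { return r.read(); }
            @Override public void close() throws IOException { r.close(); }
        };
    }

    static CharStream fromString(String s) {
        return fromReader(new StringReader(s));
    }

    /** Wraps a CharStream as a Reader by filling the block one char at a time. */
    static Reader toReader(final CharStream cs) {
        return new Reader() {
            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                int n = 0;
                while (n < len) {
                    int c = cs.read();
                    if (c == -1) {
                        break;
                    }
                    buf[off + n++] = (char) c;
                }
                return n == 0 ? -1 : n;
            }
            @Override public void close() throws IOException { cs.close(); }
        };
    }

    /** One way to merge two streams: read first to its end, then second. */
    static CharStream concat(final CharStream first, final CharStream second) {
        return new CharStream() {
            private boolean firstDone = false;
            @Override
            public int read() throws IOException {
                if (!firstDone) {
                    int c = first.read();
                    if (c != -1) {
                        return c;
                    }
                    firstDone = true;
                }
                return second.read();
            }
            @Override
            public void close() throws IOException {
                try { first.close(); } finally { second.close(); }
            }
        };
    }
}
```

Since CharStream is an interface with only two methods, each adapter is a few lines long, which is the benefit the message attributes to separating the stream view from Reader.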

CharStreamUtils also provides the methods:

public static String readTo( CharClass cc) throws IOException;
public static String readWhile( CharClass cc) throws IOException;
public static String readToInc( String endString) throws IOException; 

for easy scanning of a char stream. These are located in the nu.viggo.text package. There
is also a parser package that builds on top of this, allowing the creation of very efficient
parsers. The parser tools deal with buffered implementations of the CharStream.
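
The scanning methods could be sketched as below, under several assumptions that are not in the message: the stream is passed as an explicit first argument, readTo stops before the matching char without consuming it, and a hypothetical one-char lookahead wrapper (PeekableCharStream) stands in for the buffered implementations mentioned above. The helper class name Scanning is also made up, and readToInc assumes a non-empty endString.

```java
import java.io.IOException;
import java.io.StringReader;

// Sketch only: interfaces repeated from the message so the example compiles.
interface IntClass { boolean isA(int i); }
interface CharClass extends IntClass {}
interface CharStream { int read() throws IOException; void close() throws IOException; }

/** Hypothetical wrapper adding one char of lookahead to any CharStream. */
final class PeekableCharStream implements CharStream {
    private final CharStream in;
    private int peeked;
    private boolean hasPeeked;

    PeekableCharStream(CharStream in) { this.in = in; }

    static PeekableCharStream of(String s) {
        final StringReader r = new StringReader(s);
        return new PeekableCharStream(new CharStream() {
            @Override public int read() throws IOException { return r.read(); }
            @Override public void close() { r.close(); }
        });
    }

    /** Returns the next char without consuming it, or -1 at end of stream. */
    int peek() throws IOException {
        if (!hasPeeked) {
            peeked = in.read();
            hasPeeked = true;
        }
        return peeked;
    }

    @Override public int read() throws IOException {
        int c = peek();
        hasPeeked = false;
        return c;
    }

    @Override public void close() throws IOException { in.close(); }
}

final class Scanning {
    private Scanning() {}

    /** Reads chars as long as they belong to cc. */
    static String readWhile(PeekableCharStream in, CharClass cc) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = in.peek(); c != -1 && cc.isA(c); c = in.peek()) {
            sb.append((char) in.read());
        }
        return sb.toString();
    }

    /** Reads up to, but not including, the first char that belongs to cc. */
    static String readTo(PeekableCharStream in, CharClass cc) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = in.peek(); c != -1 && !cc.isA(c); c = in.peek()) {
            sb.append((char) in.read());
        }
        return sb.toString();
    }

    /** Reads up to and including endString, or to end of stream if absent. */
    static String readToInc(PeekableCharStream in, String endString) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
            int start = sb.length() - endString.length();
            if (start >= 0 && sb.indexOf(endString, start) == start) {
                break;
            }
        }
        return sb.toString();
    }
}
```

The lookahead is what allows readWhile and readTo to stop at a boundary without swallowing the char that ended the match, so consecutive calls can tokenize a stream.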

I find this simple architecture very useful and clear when it comes to string and byte handling
(such as when implementing network protocols etc). Yes, there is a ByteStream working in a
similar way.

Is this something for lang's char and byte utility functionality? Small footprint, easy,
small public interface?


0733 - 99 99 17
