commons-dev mailing list archives

From "Jason Horman"
Subject [functor] generators
Date Wed, 23 Apr 2003 08:28:57 GMT
Sorry if this is long-winded...

I find the CollectionAlgorithms very useful, particularly collect (map),
reject, and select (filter). The problem is that right now
CollectionAlgorithms only works with Collections or Iterators, hence
"Collection"Algorithms. I would like to be able to use these algorithms when
dealing with files, readers, sockets, directories, whatever. The problem
with just creating Iterators for files, sockets, etc. is that iterators
are not particularly good at managing state. Example:

CollectionAlgorithms.collect(new FileIterator(file), function);

When will the file pointer be closed, when !hasNext()? How could we stop the
iterator early and still close the file? Should CollectionAlgorithms know
this is a closeable iterator and call close() at the end of collect? The
other option would be:

FileIterator iter = new FileIterator(file);
CollectionAlgorithms.collect(iter, function);
iter.close();

The whole point of functors (for me) is to reduce the amount of code
written; this example triples it. Also, in true functional style, the first
example has fewer side effects.

Supporting "closeable" in iterators seems strange anyway, and becomes messy
when CollectionAlgorithms has to do instanceof checks to see if iterators
have to close.

I think a much more elegant solution would be to support what I am calling
generators (kind of like Python). A generator is an iterator flipped upside
down. It generates records to be consumed. A generator is "run" and given a
function to execute. The generator can then maintain its own state. For
example, a file line generator might have this run method:

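public void run(UnaryProcedure proc) {
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader(file));
        String line = null;
        while (!isStopped() && (line = reader.readLine()) != null) {
            proc.run(line);
        }
    } catch (IOException e) {
        throw new GeneratorException(e);
    } finally {
        if (reader != null) {  // close even when stopped early or on error
            try { reader.close(); } catch (IOException ignored) { }
        }
    }
}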

The generator opens the file, performs the looping, and closes the file.
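
To make the state-management point concrete, here is a minimal sketch in
modern Java (generics, lambdas, try-with-resources), not the code from the
tarball. LineGenerator and TrackingReader are hypothetical names invented for
this illustration, and a Reader is used instead of a File so the example runs
without touching the disk. It shows that the source is closed even when the
consumer stops the generator early.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical line generator: it owns the reader for the whole run.
class LineGenerator {
    private final Reader source;
    private boolean stopped = false;

    LineGenerator(Reader source) { this.source = source; }

    // Ask the generator to stop before the next line is read.
    void stop() { stopped = true; }

    void run(Consumer<String> proc) {
        // try-with-resources closes the reader on normal exit, early stop, or error.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while (!stopped && (line = reader.readLine()) != null) {
                proc.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Test helper (assumption, for illustration only): records that close() was called.
class TrackingReader extends StringReader {
    boolean closed = false;
    TrackingReader(String text) { super(text); }
    @Override public void close() { closed = true; super.close(); }
}
```

The caller never sees the reader's lifecycle: it hands over a procedure and
the generator does the opening, looping, and closing.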

Now, for convenience's sake, Generators might as well implement each of the
CollectionAlgorithms internally as well. This would allow you to do things
like this:

new EachLine(file).select(predicate);

I also think that these "algorithms" should be "higher order" and not return
Collections, but return Generators. This provides a few benefits.

new EachLine(file)
                .to(new ArrayList()); (lispy huh)

Notice the to() method is what turns the final generator into a collection.
Collections are supported, as well as generic transformers for turning
generators into whatever is needed.

The other benefit is that "generation" doesn't start until the final
generator is actually run. Each additional "algorithm" just returns a new
generator, wrapping the previous one, that does the right thing when
executed. In the code example above, with the current framework, 3
collections would have been created, but only 1 was created here.
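
The deferred execution described above can be sketched as follows. This is an
illustration in modern Java under stated assumptions, not the proposed API:
LazyGen, each(), and the traversals counter are hypothetical names, while
select, collect, and to mirror the method names used in this message. Each
algorithm only wraps the previous generator; the source is traversed once,
when to() runs the chain, and only one collection is ever allocated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for the Generator idea (not the real class).
abstract class LazyGen<T> {
    // Counts how often a source list is actually traversed (demo only).
    static int traversals = 0;

    // Push every element to proc; this is the only place work happens.
    abstract void run(Consumer<? super T> proc);

    static <T> LazyGen<T> each(List<T> items) {
        return new LazyGen<T>() {
            void run(Consumer<? super T> proc) {
                traversals++;
                for (T item : items) proc.accept(item);
            }
        };
    }

    // select (filter): returns a wrapper, runs nothing yet.
    LazyGen<T> select(Predicate<? super T> pred) {
        LazyGen<T> source = this;
        return new LazyGen<T>() {
            void run(Consumer<? super T> proc) {
                source.run(x -> { if (pred.test(x)) proc.accept(x); });
            }
        };
    }

    // collect (map): returns a wrapper, runs nothing yet.
    <R> LazyGen<R> collect(Function<? super T, ? extends R> fn) {
        LazyGen<T> source = this;
        return new LazyGen<R>() {
            void run(Consumer<? super R> proc) {
                source.run(x -> proc.accept(fn.apply(x)));
            }
        };
    }

    // to(): runs the whole chain once, filling the single target collection.
    <C extends Collection<T>> C to(C target) {
        run(target::add);
        return target;
    }
}

class LazyGenDemo {
    public static void main(String[] args) {
        LazyGen<Integer> lengths = LazyGen.each(Arrays.asList("alpha", "", "be"))
            .select(s -> !s.isEmpty())
            .collect(String::length);
        // Building the chain has not traversed the list yet.
        System.out.println("traversals before to(): " + LazyGen.traversals);
        List<Integer> result = lengths.to(new ArrayList<Integer>());
        System.out.println(result + ", traversals after to(): " + LazyGen.traversals);
    }
}
```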

Generators can be stopped. For example, I might want to do this:

new EachLine(file).until(new MaxIterations(10)).collect(function);

Meaning only collect 10 lines from the file. until() is defined as such:

// Return a new generator that wraps the passed-in generator and, when run,
// stops as soon as the predicate becomes true.
public static final Generator until(final Generator gen, final
UnaryPredicate pred) {
    return new Generator(gen) {
        public void run(final UnaryProcedure proc) {
            gen.run(new UnaryProcedure() {
                public void run(Object obj) {
                    if (pred.test(obj)) {
                        // assumed: stop() on the wrapper also stops gen
                        stop();
                    } else {
                        proc.run(obj);
                    }
                }
            });
        }
    };
}
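
For anyone who wants to try the stopping behaviour without the tarball, here
is a self-contained sketch in modern Java. Everything in it is an assumption
made for illustration: StoppableGen and Naturals are hypothetical classes,
stop() is assumed to propagate from a wrapper to the generator it wraps, and
MaxIterations is assumed to be a stateful predicate that becomes true once
more than the given number of elements have been tested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical generator base class with a stop flag.
abstract class StoppableGen<T> {
    private boolean stopped = false;
    private final StoppableGen<?> wrapped;  // null for a source generator

    StoppableGen() { this(null); }
    StoppableGen(StoppableGen<?> wrapped) { this.wrapped = wrapped; }

    abstract void run(Consumer<? super T> proc);

    // Assumption: stopping a wrapper also stops whatever it wraps.
    void stop() {
        stopped = true;
        if (wrapped != null) wrapped.stop();
    }

    boolean isStopped() { return stopped; }

    // Wraps this generator; the element that trips the predicate is not passed on.
    StoppableGen<T> until(Predicate<? super T> pred) {
        StoppableGen<T> source = this;
        return new StoppableGen<T>(source) {
            void run(Consumer<? super T> proc) {
                source.run(x -> {
                    if (pred.test(x)) stop(); else proc.accept(x);
                });
            }
        };
    }
}

// Hypothetical predicate: true once more than max elements have been tested.
class MaxIterations<T> implements Predicate<T> {
    private final int max;
    private int seen = 0;
    MaxIterations(int max) { this.max = max; }
    public boolean test(T ignored) { return ++seen > max; }
}

// Endless source 1, 2, 3, ... that only ends when stopped.
class Naturals extends StoppableGen<Integer> {
    int produced = 0;
    void run(Consumer<? super Integer> proc) {
        for (int i = 1; !isStopped(); i++) {
            produced++;
            proc.accept(i);
        }
    }
}
```

With this reading, new Naturals().until(new MaxIterations<Integer>(3)) hands
three elements downstream; the source produces a fourth only to discover that
it should stop.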

I have written all of this already and want to find out what people think.
The tar listing looks like this:


I have changed CollectionAlgorithms quite a bit, so I actually copied it to:


Iterators and generators are supported. Iterators are just wrapped with the

EachLine is the most useful generator right now. It can generate the lines
of a file or reader, and optionally close the streams at the end of

EachElement is for generating from Collections, Maps, or Object[].

NumberRange will generate numbers within a range. (more python)
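
As a rough idea of what a range generator looks like in this push style, here
is a small sketch. It is not the NumberRange from the tarball: the class name
RangeSketch is hypothetical, and the half-open interval (from inclusive, to
exclusive, as in Python's range) is an assumption, since the bounds are not
specified in this message.

```java
import java.util.function.IntConsumer;

// Hypothetical range generator: pushes from, from+1, ..., to-1 to the procedure.
class RangeSketch {
    private final int from;
    private final int to;  // exclusive (assumption)

    RangeSketch(int from, int to) {
        this.from = from;
        this.to = to;
    }

    void run(IntConsumer proc) {
        for (int i = from; i < to; i++) {
            proc.accept(i);
        }
    }
}
```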

Everything is unit tested, mostly :) I wasn't too sure how to unit test the
EachLine(file) constructor and achieve 100% coverage without actually
opening a file. I suppose I could just open the maven project file or
something. All of the original CollectionAlgorithms unit tests pass for

I also added an additional non-generator method to Algorithms called
recurse. It is a simple method for supporting tail recursion. BinarySearch
is an example of a RecursiveFunction.
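
One way such a recurse method can work is a simple loop (a trampoline): keep
evaluating as long as the result is itself another step of the same kind, so
the "recursion" never grows the call stack. The sketch below is a guess at
the shape, not the actual signatures: Step, Recursion.recurse, and this
BinarySearchStep are hypothetical names written in modern Java.

```java
import java.util.function.Supplier;

// Hypothetical marker for "a function whose result may be another step".
interface Step extends Supplier<Object> { }

final class Recursion {
    // Loop instead of recursing: evaluate until the result is no longer a Step.
    static Object recurse(Step first) {
        Object result = first;
        while (result instanceof Step) {
            result = ((Step) result).get();
        }
        return result;
    }
}

// Binary search over a sorted int[]; each step returns the index found, -1,
// or a new step covering the remaining half.
final class BinarySearchStep implements Step {
    private final int[] sorted;
    private final int key;
    private final int lo;
    private final int hi;

    BinarySearchStep(int[] sorted, int key) {
        this(sorted, key, 0, sorted.length - 1);
    }

    private BinarySearchStep(int[] sorted, int key, int lo, int hi) {
        this.sorted = sorted;
        this.key = key;
        this.lo = lo;
        this.hi = hi;
    }

    public Object get() {
        if (lo > hi) return -1;               // not found
        int mid = (lo + hi) >>> 1;            // overflow-safe midpoint
        if (sorted[mid] == key) return mid;   // found
        return sorted[mid] < key
            ? new BinarySearchStep(sorted, key, mid + 1, hi)
            : new BinarySearchStep(sorted, key, lo, mid - 1);
    }
}
```

Usage would look like
Recursion.recurse(new BinarySearchStep(new int[] {1, 3, 5, 7, 9}, 7)).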

I am not sure if commons protocol would have me attach the tar.gz file to
this email, so I won't. Let me know if you're interested and I can either
email it to commons or to you directly.

-jason horman
