commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rich Dougherty" <r...@rd.gen.nz>
Subject RE: [Collections] [SUBMIT] Trie
Date Fri, 06 Dec 2002 22:04:59 GMT
> I do think there is room for negotiation about the exact naming and
> behaviour of the methods in the interface.  Returning a Set of
> Map.Entrys seems to be useful, but returning a Trie would be tie closer
> with the SortedMap interface.  I haven't got a strong opinion here but
> it would be nice to hear the pros and cons.

This is simple to implement and I cannot think of any
disadvantages. It sounds like a good idea to me.

> And regarding comments from Pranas - I can't think where /I/ would want
> to use a non-string based trie, but I don't see what's gained by
> restricting the interface to Strings.
>
> <thinking-out-loud>
> Maybe Gene sequences would be a useful nonstring example... Sure they
> can be represented as Strings of A,G,C,Ts but they must be better served
> using a 2 bit encoding rather than 16?
> </thinking-out-loud>

Any information which is indexed by a sequence would be
appropriate. Strings, genes and numbers are examples where the
sequence is made up of a finite "alphabet". Other examples which are a
sequence with an infinite alphabet are URIs and file names.

> > I agree that we need to focus on the interface and
> > implementation to make sure that they're the best we can come up with.
>
> +1 I think this is probably the best aim.  However it's pretty difficult
> to work out the best interface without using it a bit.  Maybe we should
> commit the current code with a strong "THIS IS ALPHA" warning and let
> people start to use the code and submit patches to get the abstraction
> right?  We already have a precedent in the primitives subpackage of code
> that is in CVS but isn't considered ready for release so I don't see
> that concerns over back compat should be an issue.

Ok, I'll jump in and level a few criticisms at the interface and
implementation I submitted. :-)

--- Trie - is it the right name? ---

I must admit, the more I've looked at this, the more uncomfortable I
am with the name. In my reading, I've only ever heard 'trie' applied
to a data structure indexed by strings. However, a trie does seem to
fit into a more general class of data structures. It looks like these
structures are called prefix trees--although I can't find a solid
definition. However, it may be a very elementary data structure:

  http://proteacher.net/archive/posts/2001/09/26/21017.html

So conceptually, we have this sort of breakdown:

Prefix trees
 - Prefix trees for handling strings
    - Trie
    - The data structure used in suffix trees
 - Prefix trees for handling binary data
    - Patricia trees
 - Prefix trees for handling sequences of anything
    - AbstractObjectArrayTrie (which should be renamed)

--- Separating the interface from the implementation ---

Perhaps the interface doesn't need to be named after the
implementation at all. There may be other data structures which
provide efficient operations on the prefix of keys, but aren't
implemented as a prefix tree.

--- Suggestion ---

Here's a rough suggestion.

/**
 * A Map which provides several operations based on the
 * prefix of keys.
 */
public interface PrefixMap extends Map {

    public PrefixMap subPrefixMap(Object keyPrefix);

    public Map.Entry getLongest(Object candidateKey);

}

There is a nice symmetry in providing a SortedPrefixMap, although
actual implementation would involve a reasonable amount of work. (But
I'll do it if there is consensus about it.)

public interface SortedPrefixMap extends PrefixMap, SortedMap {

    // This would return a SortedPrefixMap. I can't override the
    // return type, can I?
    public PrefixMap subPrefixMap(Object keyPrefix);

}

/**
 * An implementation of PrefixMap. Subclasses must provide an
 * implementation for translating keys to and from object arrays
 * and a mechanism for storing outgoing references.
 */
public abstract class AbstractObjectArrayPrefixTree
        implements PrefixMap {
    ...
}

public abstract class AbstractObjectArraySortedPrefixTree
        extends AbstractObjectArrayPrefixTree
        implements SortedPrefixMap {

    // extra SortedMap methods

    // requires subclasses to return items in order with
    // outgoingTransitionIterator().

    ...
}

/**
 * Stores outgoing references in a Map, subclasses must
 * still provide an implementation for translating keys to
 * and from object arrays.
 */
public abstract class AbstractOutgoingTransitionPrefixMap
        extends AbstractObjectArrayPrefixTree {
    ...
}

public abstract class AbstractOutgoingTransitionSortedPrefixMap
        extends AbstractObjectArraySortedPrefixTree {
    // needs to be passed in comparator
    ...
}

/**
 * A SortedPrefixMap for strings.
 */
public class Trie implements SortedPrefixMap
        extends AbstractObjectArraySortedPrefixTree {
    // Future implementations may not extend this
    // class. They may be more efficient, and based
    // on native chars or ints.
    ...
}

Any comments on this? Particularly:

- The hierarchy of abstract classes. Should it be flattened? At the
  moment I'm separating out storage of the transitions. This is nice,
  but perhaps it clutters the package too much?

- Stephen has mentioned some sort of TrieUtils? Perhaps this should be
  called PrefixMapUtils, or go into MapUtils.

Rich



--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message