commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <>
Subject Re: [lang] Longest common substring / Suffix Tree
Date Mon, 12 Mar 2012 20:36:39 GMT
Am 12. März 2012 21:20 schrieb Thomas Neidhart <>:
> Hi,
> on the weekend, I started to work on issue LANG-680
> (, which is about adding
> support for finding the longest common substring of a set of Strings.
> Suffix Trees are a standard data structure to efficiently solve this
> problem, and I created a variant of Ukkonen's algorithm to construct
> such a tree in linear time.
> After some research to create a generalized version of it (to support
> multiple strings), I found a very nice implementation from Alessandro
> Bahgat (, which is very similar to
> my own, and contacted the author if he is interested to put his code
> under apache license (which he did already).
> So the question basically is how to proceed, as adding a data structure
> like a Suffix Tree to commons-lang may be out-of-scope. On the other
> hand there are several string related problems that can be efficiently
> solved with a Suffix Tree:
>  * longest common substring
>  * longest repeated substring
>  * exact string set matching
>  * palindrom finding
>  * finding frequent substrings

Maybe it would make sense to create an new class "SubstringUtils"
instead of adding all that logic to StringUtils (which already has a
fair amount of funtionality).

> Implementing a longest common substring for two strings is trivial
> (using dynamic programming ~20 lines of code), but for a set of strings
> a suffix tree would be needed.
> Henri already outlined a possible API as comment to the issue, which I
> like, so what do other people think about adding a suffix tree to
> commons-lang?
> Thomas
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message