commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <benerit...@googlemail.com>
Subject Re: [lang] Longest common substring / Suffix Tree
Date Mon, 12 Mar 2012 20:36:39 GMT
Am 12. März 2012 21:20 schrieb Thomas Neidhart <thomas.neidhart@gmail.com>:
> Hi,
>
> on the weekend, I started to work on issue LANG-680
> (https://issues.apache.org/jira/browse/LANG-680), which is about adding
> support for finding the longest common substring of a set of Strings.
>
> Suffix Trees are a standard data structure to efficiently solve this
> problem, and I created a variant of Ukkonen's algorithm to construct
> such a tree in linear time.
>
> After some research to create a generalized version of it (to support
> multiple strings), I found a very nice implementation from Alessandro
> Bahgat (https://github.com/abahgat/suffixtree), which is very similar to
> my own, and contacted the author if he is interested to put his code
> under apache license (which he did already).
>
> So the question basically is how to proceed, as adding a data structure
> like a Suffix Tree to commons-lang may be out-of-scope. On the other
> hand there are several string related problems that can be efficiently
> solved with a Suffix Tree:
>
>  * longest common substring
>  * longest repeated substring
>  * exact string set matching
>  * palindrom finding
>  * finding frequent substrings
>

Maybe it would make sense to create an new class "SubstringUtils"
instead of adding all that logic to StringUtils (which already has a
fair amount of funtionality).

> Implementing a longest common substring for two strings is trivial
> (using dynamic programming ~20 lines of code), but for a set of strings
> a suffix tree would be needed.
>
> Henri already outlined a possible API as comment to the issue, which I
> like, so what do other people think about adding a suffix tree to
> commons-lang?
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message