commons-dev mailing list archives

From Thomas Neidhart <thomas.neidh...@gmail.com>
Subject Re: [lang] Longest common substring / Suffix Tree
Date Tue, 13 Mar 2012 23:08:21 GMT
On 03/13/2012 08:55 AM, Luc Maisonobe wrote:
> On 13/03/2012 00:53, James Carman wrote:
>> A lot of bioinformaticians would love us if we added this!

I picked this topic up because I find it interesting myself, and I guess
it would be a useful addition for many other people too. From what I have
seen so far, though, bioinformaticians wouldn't necessarily be impressed
by it ;-). AFAIK they have pretty good tools, and there are special
algorithms for computing suffix trees for really large strings on clusters
or on disk, as they won't fit in memory anymore.

> In the same spirit, I know an implementation of the Myers difference
> algorithm that runs on any object implementing equals and also provides
> an API for browsing the "edit script" resulting from the comparison.
> This makes it possible, for example, to retrieve only the shared
> elements, or only the ones in the first or the second sequence, or to
> "run" the script, or whatever.
> 
> If you consider this could be a good addition to [lang] or another
> component ([graph] ?) I can ask for a grant for this.

This would be a perfect companion for the longest common substring
problem. The o.a.c.l.text package looks like a good fit for these things,
imho.
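
For readers unfamiliar with the problem, here is a minimal sketch of
longest common substring in Java. It uses plain dynamic programming
(O(n*m) time, two rows of memory) rather than a suffix tree, so it only
illustrates the problem and is not the approach discussed in this thread.
The class and method names are hypothetical and not part of [lang].

```java
/** Hypothetical helper; illustrative only, not an existing [lang] class. */
final class LongestCommonSubstring {

    private LongestCommonSubstring() { }

    /**
     * Returns the longest substring shared by a and b, or "" if they
     * have no character in common. On ties, the match that ends first
     * in a is returned.
     */
    static String of(CharSequence a, CharSequence b) {
        // prev[j] / curr[j]: length of the common suffix of
        // a[0..i-1) / a[0..i) and b[0..j)
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best match in a

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // extend the match that ended at (i-1, j-1)
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > bestLen) {
                        bestLen = curr[j];
                        bestEnd = i;
                    }
                } else {
                    curr[j] = 0;
                }
            }
            // reuse the two rows; every curr[j], j >= 1, is overwritten
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return a.subSequence(bestEnd - bestLen, bestEnd).toString();
    }
}
```

A suffix tree would bring this down to linear time in the combined
length of the inputs, at the cost of a considerably more involved
implementation and higher memory use per character.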

Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org