commons-dev mailing list archives

From Gary Gregory
Subject Re: [codec] getting the bmpm code out there
Date Thu, 11 Aug 2011 19:56:40 GMT
Hello All!

Topic 1: Housekeeping: package name and POM.

The next codec release out of trunk will be a major release, labeled
2.0; the current release is 1.5.

In trunk, I've removed deprecated methods and the project now requires
Java 5. This means 2.0 will not be a drop-in, binary-compatible
replacement for 1.5.

I'd like to confirm or deny that this means the package name will
change to o.a.c.codec2 and that the POM groupId will have to change
from commons-codec to org.apache.commons. 2.0 and 1.5 would be able to
live side by side.

I'd like to get this out of the way first hence topic 1.

Topic 2: Beider-Morse (BM) Encoder API

BM is a new codec for 2.0.

The encode API returns a set of encodings.

In trunk, this is currently a String in the format "s1|s2|s3".

I think this is not the best design: a set should be a Set, in this
case an ordered set, or a List. Generally, it should be a Collection
of Strings.

There was concern about call sites that generically use a [codec]
Encoder with the signature "Object encode(Object)" and call
toString() on the result.

If we set the API to "Set<CharSequence> encode(CharSequence)" or
"Set<String> encode(String)", doing a toString() on the returned
HashSet will yield a usable String similar to what trunk does now. For
example, for a HashSet of Strings "a", "b" and "c", HashSet.toString()
returns "[a, b, c]", which is no worse than "a|b|c" IMO. At least it is
a documented and stable format.
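
To make the toString() point concrete, here is a minimal sketch. It
is not the BM encoder: toOrderedSet and genericCallSite are
hypothetical helpers made up for illustration, and a LinkedHashSet is
assumed so that the iteration order is the insertion order rather
than whatever a plain HashSet happens to produce.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class EncodingSetSketch {

    // Hypothetical helper: turns the pipe-delimited form used in trunk
    // ("s1|s2|s3") into an insertion-ordered set of encodings.
    static Set<String> toOrderedSet(String pipeDelimited) {
        return new LinkedHashSet<String>(Arrays.asList(pipeDelimited.split("\\|")));
    }

    // Mimics a generic call site that only sees an Object result
    // and calls toString() on it.
    static String genericCallSite(Object encoded) {
        return encoded.toString();
    }

    public static void main(String[] args) {
        Set<String> encodings = toOrderedSet("a|b|c");
        System.out.println(genericCallSite(encodings)); // prints [a, b, c]
        System.out.println(genericCallSite("a|b|c"));   // prints a|b|c
    }
}
```

The "[a, b, c]" form comes from AbstractCollection.toString(), which
the standard collection classes inherit, so a call site that only
calls toString() still gets a readable, documented format.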

Topic 3: Generics

This will be in a separate thread but I'd like to get this in 2.0
because this will likely break the API and I only want to break things
once and not have to do a codec3 for generics.

Thank you all,

On Thu, Aug 11, 2011 at 2:38 PM, Matthew Pocock wrote:
> Hi,
> As those of you who've been following the CODEC-125 ticket will know, with
> Greg's help I've got a port of the Beider-Morse phonetic
> matching (bmpm) algorithm in as a string encoder. As far as I can tell, it's
> ready for people to use and abuse. It ideally needs more test-case words,
> but to the best of my knowledge it doesn't have any horrendous bugs or
> performance issues.
> The discussion on the ticket started to stray off bmpm and on to policy for
> releases and changing APIs, and Sebb said we should discuss it on the list.
> So, here we are.
> Ideally, I'd like there to be a release of commons-codec some time soon so
> that users can start to try out bmpm right away, and so that we can start
> the process of adding it to the list of supported indexing methods in Solr.
> What do people think?
> Matthew
> --
> Dr Matthew Pocock
> Visitor, School of Computing Science, Newcastle University
> mailto:
> gchat:
> msn:
> drdozer
> tel: (0191) 2566550
> mob: +447535664143

Thank you,
