commons-dev mailing list archives

From Eyal Allweil <eyal_allw...@yahoo.com.INVALID>
Subject Re: [LANG] Add alphabet conversion API
Date Wed, 07 Sep 2016 14:27:43 GMT
Hi Simo,
I'm not sure I understand how BitSets would be used in this case. An example with chars might
look like this:
AlphabetConverter ac = new AlphabetConverter(['a','b','c','d'], ['a','e','f','g'], ['a']); // 'a' is not encoded

and the mapping would become a -> a, b -> e, c -> f, d -> g,
so encode("abc") would return "aef".
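A minimal Java sketch of the char example above, assuming the simplest case: every letter outside doNotEncode gets exactly one encoding letter, and the remaining letters of both alphabets are paired in the order given. The class CharMappingSketch and its methods are hypothetical illustrations, not the proposed AlphabetConverter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical one-to-one converter for the char example; not the proposed class.
final class CharMappingSketch {
    private final Map<Character, Character> encodeMap = new HashMap<>();
    private final Map<Character, Character> decodeMap = new HashMap<>();

    CharMappingSketch(char[] original, char[] encoding, char[] doNotEncode) {
        Set<Character> keep = new HashSet<>();
        for (char c : doNotEncode) {
            keep.add(c);
            encodeMap.put(c, c); // left in its original state
            decodeMap.put(c, c);
        }
        // Pair the remaining letters of both alphabets in the order given.
        List<Character> from = remaining(original, keep);
        List<Character> to = remaining(encoding, keep);
        if (to.size() < from.size()) {
            throw new IllegalArgumentException("encoding alphabet too small for a one-to-one mapping");
        }
        for (int i = 0; i < from.size(); i++) {
            encodeMap.put(from.get(i), to.get(i));
            decodeMap.put(to.get(i), from.get(i));
        }
    }

    private static List<Character> remaining(char[] alphabet, Set<Character> keep) {
        List<Character> result = new ArrayList<>();
        for (char c : alphabet) {
            if (!keep.contains(c)) {
                result.add(c);
            }
        }
        return result;
    }

    String encode(String s) { return translate(s, encodeMap); }

    String decode(String s) { return translate(s, decodeMap); }

    private static String translate(String s, Map<Character, Character> map) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            Character mapped = map.get(c);
            if (mapped == null) {
                throw new IllegalArgumentException("unmapped char: " + c);
            }
            sb.append(mapped.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharMappingSketch ac = new CharMappingSketch(
                new char[] {'a', 'b', 'c', 'd'}, new char[] {'a', 'e', 'f', 'g'}, new char[] {'a'});
        System.out.println(ac.encode("abc")); // aef
        System.out.println(ac.decode("aef")); // abc
    }
}
```

Pairing by position after removing doNotEncode letters reproduces the mapping above (b -> e, c -> f, d -> g) without requiring 'a' to sit at the same index in both arrays.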
Ints can be used instead of chars to support Unicode code points that don't fit in a single
char (which was our case; if that seems like overkill, a char-based implementation is much
more direct).
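The reason for ints: a code point outside the Basic Multilingual Plane is stored in a Java String as two chars (a surrogate pair), so a char-based converter would treat it as two separate letters. A small illustration using only standard String methods; the class CodePointDemo and the helpers toCodePoints and fromCodePoints are made up for this example.

```java
// Illustration: code points vs chars in Java strings.
final class CodePointDemo {
    // Splits a string into Unicode code points (a surrogate pair becomes one int).
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Rebuilds a string from code points, re-creating surrogate pairs as needed.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600, outside the BMP
        System.out.println(s.length());                               // 3 (chars)
        System.out.println(toCodePoints(s).length);                   // 2 (code points)
        System.out.println(Integer.toHexString(toCodePoints(s)[1]));  // 1f600
    }
}
```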
How did you mean the BitSet to be used?
Regards,
Eyal

    On Thursday, September 1, 2016 12:26 PM, Simone Tripodi <simonetripodi@apache.org> wrote:

Hi,
I personally think it would be a very "nice to have" feature. I have faced similar issues in
the past, and if this feature had been available it would have saved me development time.
I just have a small request/suggestion: since int and char can be cast to each other, I would
use BitSets rather than Sets.
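One possible reading of the BitSet suggestion, offered as an assumption rather than what Simo necessarily meant: each alphabet becomes a java.util.BitSet whose set bits are the code points it contains. Because a char widens to an int, the same BitSet works for chars and for code points, and BitSet operations such as andNot take the place of Set.removeAll. BitSetAlphabet is a hypothetical helper.

```java
import java.util.BitSet;

// Sketch: an alphabet stored as a BitSet indexed by code point.
final class BitSetAlphabet {
    // One bit per code point; a char widens to the same int index.
    static BitSet of(String letters) {
        BitSet bits = new BitSet();
        letters.codePoints().forEach(bits::set);
        return bits;
    }

    // Set bits come back in ascending order, which gives a stable pairing order.
    static int[] toSortedArray(BitSet bits) {
        return bits.stream().toArray();
    }

    public static void main(String[] args) {
        BitSet original = of("abcd");
        BitSet doNotEncode = of("a");
        System.out.println(original.get('b')); // true
        BitSet toEncode = (BitSet) original.clone();
        toEncode.andNot(doNotEncode); // letters that actually need an encoding
        System.out.println(toEncode); // {98, 99, 100}
    }
}
```

A BitSet is compact for small, dense ranges such as Latin letters, but its size follows the largest code point stored; covering the whole Unicode range (up to U+10FFFF) takes roughly 136 KB per alphabet.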
Good luck!
-Simo

http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi
On Thu, Sep 1, 2016 at 10:53 AM, Eyal Allweil <eyal_allweil@yahoo.com.invalid> wrote:

Hi guys,
Would you be interested in adding a utility class that creates alphabet converters, perhaps
using a helper method available from StringUtils? It doesn't have to stay the way it is now,
but the API for the class - AlphabetConverter - is currently:
/**
 * The input is integers representing code points, but we can make it accept chars as well.
 *
 * doNotEncode represents chars we want to leave in their original state (not encoded
 * using the chars in encoding).
 */
public AlphabetConverter(Set<Integer> original, Set<Integer> encoding, Set<Integer> doNotEncode);

public String encode(String original);

public String decode(String encoded);
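The listing above does not say what happens when the encoding alphabet is smaller than the original one, as when converting Unicode to Latin letters. A minimal sketch of the same three signatures, assuming each encoded letter becomes a fixed-width group of encoding letters (its index written in base k, where k is the number of usable encoding letters) while doNotEncode letters pass through unchanged. FixedWidthConverter and its helper codePoints are hypothetical; this is one possible scheme, not the implementation being proposed.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of the proposed Set<Integer> API, assuming fixed-width codes.
final class FixedWidthConverter {
    private final Set<Integer> doNotEncode;
    private final Map<Integer, String> encodeMap = new HashMap<>();
    private final Map<String, Integer> decodeMap = new HashMap<>();
    private final int width;

    public FixedWidthConverter(Set<Integer> original, Set<Integer> encoding, Set<Integer> doNotEncode) {
        this.doNotEncode = new HashSet<>(doNotEncode);
        // Code symbols never include doNotEncode letters, so decoding is unambiguous.
        List<Integer> symbols = new ArrayList<>(new TreeSet<>(encoding));
        symbols.removeAll(this.doNotEncode);
        List<Integer> toEncode = new ArrayList<>(new TreeSet<>(original));
        toEncode.removeAll(this.doNotEncode);
        int k = symbols.size();
        int n = toEncode.size();
        if (n > 0 && (k == 0 || (k == 1 && n > 1))) {
            throw new IllegalArgumentException("encoding alphabet is too small");
        }
        // Smallest width w >= 1 with k^w >= n.
        int w = 1;
        for (long capacity = k; capacity < n; capacity *= k) {
            w++;
        }
        this.width = w;
        for (int i = 0; i < n; i++) {
            String code = toCode(i, symbols, w);
            encodeMap.put(toEncode.get(i), code);
            decodeMap.put(code, toEncode.get(i));
        }
    }

    // Writes index as a base-k number of exactly `width` symbols, most significant first.
    private static String toCode(int index, List<Integer> symbols, int width) {
        int k = symbols.size();
        int[] digits = new int[width];
        for (int pos = width - 1; pos >= 0; pos--) {
            digits[pos] = symbols.get(index % k);
            index /= k;
        }
        return new String(digits, 0, width);
    }

    public String encode(String original) {
        StringBuilder sb = new StringBuilder();
        for (int cp : original.codePoints().toArray()) {
            if (doNotEncode.contains(cp)) {
                sb.appendCodePoint(cp); // left as-is, a single letter
            } else if (encodeMap.containsKey(cp)) {
                sb.append(encodeMap.get(cp));
            } else {
                throw new IllegalArgumentException("not in original alphabet: " + cp);
            }
        }
        return sb.toString();
    }

    public String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int[] cps = encoded.codePoints().toArray();
        for (int i = 0; i < cps.length; ) {
            if (doNotEncode.contains(cps[i])) {
                sb.appendCodePoint(cps[i++]);
                continue;
            }
            Integer cp = i + width <= cps.length ? decodeMap.get(new String(cps, i, width)) : null;
            if (cp == null) {
                throw new IllegalArgumentException("invalid code at code point index " + i);
            }
            sb.appendCodePoint(cp);
            i += width;
        }
        return sb.toString();
    }

    // Convenience: the set of code points in a string.
    static Set<Integer> codePoints(String s) {
        return s.codePoints().boxed().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        FixedWidthConverter c = new FixedWidthConverter(
                codePoints("abcde"), codePoints("axy"), codePoints("a"));
        System.out.println(c.encode("abe"));   // axxyy
        System.out.println(c.decode("axxyy")); // abe
    }
}
```

Keeping doNotEncode letters out of the code symbols is what lets decode tell the two apart: a doNotEncode letter can never start a code group. Sorting both alphabets (via TreeSet) makes the mapping deterministic for the same inputs.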
In StringUtils, we could add:

public AlphabetConverter getAlphabetConverter(Set<Integer> original, Set<Integer> encoding, Set<Integer> doNotEncode);
I used it to convert from Unicode to Latin letters without using any of the chars I wanted to
reserve as delimiters, while preserving the English alphabet as-is for readability. If you'd
like to add it, I'll clean up the code and prepare it for a pull request so you can review it.

It makes sense to me to add a method that returns the HashMaps used internally for the mappings,
so they can be serialized (and deserialized) to preserve the mapping.
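For preserving the mapping, a minimal sketch using standard Java serialization, assuming a hypothetical accessor that exposes the internal HashMap<Integer, String> (code point to encoded string); here the map is filled by hand. MappingRoundTrip, save and load are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Sketch: round-tripping a code point -> code mapping through Java serialization.
final class MappingRoundTrip {
    static byte[] save(HashMap<Integer, String> mapping) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(mapping); // HashMap, Integer and String are all Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<Integer, String> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<Integer, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<Integer, String> mapping = new HashMap<>();
        mapping.put((int) 'b', "xx");
        mapping.put(0x1F600, "xy");
        HashMap<Integer, String> restored = load(save(mapping));
        System.out.println(restored.equals(mapping)); // true
    }
}
```

Java serialization is the shortest route since HashMap is Serializable, but it ties the stored format to Java and should not be used on untrusted input; a plain text format (one code point and its code per line) would be more portable.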
Regards,
Eyal Allweil (PayPal)