commons-issues mailing list archives

From "Eyal Allweil (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LANG-1266) Add alphabet converter
Date Tue, 13 Sep 2016 08:09:20 GMT
Eyal Allweil created LANG-1266:
----------------------------------

             Summary: Add alphabet converter
                 Key: LANG-1266
                 URL: https://issues.apache.org/jira/browse/LANG-1266
             Project: Commons Lang
          Issue Type: New Feature
          Components: lang.text.*
            Reporter: Eyal Allweil


(as described in [the mailing list|http://mail-archives.apache.org/mod_mbox/commons-dev/201609.mbox/%3c289983494.3057706.1472720010277@mail.yahoo.com%3e])

This is a utility class I wrote for converting text from one alphabet to another, for example
from Unicode to Latin while avoiding some of the Latin chars. The usage looks like this:

{code}
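import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

Set<Character> originals = new LinkedHashSet<>(Arrays.asList('a', 'b', 'c', 'd'));
Set<Character> encoding = new LinkedHashSet<>(Arrays.asList('0', '1', 'd'));
Set<Character> doNotEncode = new LinkedHashSet<>(Arrays.asList('d'));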

AlphabetConverter ac = AlphabetConverter.createConverter(originals, encoding, doNotEncode);

ac.encode("a"); // 00
ac.encode("b"); // 01
ac.encode("c"); // 0d
ac.encode("d"); // d
ac.encode("abcd"); // 00010dd

{code}

Of course, x.equals(ac.decode(ac.encode(x))) should always be true for any string x made up
of chars from the original alphabet.

The provided implementation produces encodings of fixed length, except for the "do not encode"
chars, which are left as they are (length one).
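
One way such a fixed-length scheme could work is sketched below. This is not the implementation
attached to this issue: the class name FixedLengthSketch, its constructor taking ordered lists,
and the rule that a code never starts with a "do not encode" char are all assumptions made for
illustration. It also assumes the encoding list has no duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the AlphabetConverter attached to this issue.
final class FixedLengthSketch {
    private final Map<Character, String> originalToEncoded = new HashMap<>();
    private final Map<String, Character> encodedToOriginal = new HashMap<>();
    private final List<Character> doNotEncode;
    private final int codeLength;

    FixedLengthSketch(List<Character> originals, List<Character> encoding,
                      List<Character> doNotEncode) {
        this.doNotEncode = new ArrayList<>(doNotEncode);
        List<Character> toEncode = new ArrayList<>(originals);
        toEncode.removeAll(doNotEncode);
        // Assumption: a code never starts with a "do not encode" char, so the
        // decoder can tell a pass-through char from the start of a code.
        List<Character> leading = new ArrayList<>(encoding);
        leading.removeAll(doNotEncode);
        if (!toEncode.isEmpty() && (leading.isEmpty()
                || (encoding.size() < 2 && toEncode.size() > leading.size()))) {
            throw new IllegalArgumentException("encoding alphabet is too small");
        }
        // Smallest length whose code space covers every char that needs a code.
        int length = 1;
        long capacity = leading.size();
        while (capacity < toEncode.size()) {
            capacity *= encoding.size();
            length++;
        }
        this.codeLength = length;
        // Assign codes in counting order: first position over `leading`,
        // remaining positions over the full `encoding` list.
        for (int i = 0; i < toEncode.size(); i++) {
            char[] code = new char[length];
            int rest = i;
            for (int pos = length - 1; pos >= 1; pos--) {
                code[pos] = encoding.get(rest % encoding.size());
                rest /= encoding.size();
            }
            code[0] = leading.get(rest);
            String encoded = new String(code);
            originalToEncoded.put(toEncode.get(i), encoded);
            encodedToOriginal.put(encoded, toEncode.get(i));
        }
    }

    String encode(String original) {
        StringBuilder sb = new StringBuilder();
        for (char c : original.toCharArray()) {
            if (doNotEncode.contains(c)) {
                sb.append(c); // pass through unchanged, length one
            } else {
                String code = originalToEncoded.get(c);
                if (code == null) {
                    throw new IllegalArgumentException("no encoding for '" + c + "'");
                }
                sb.append(code);
            }
        }
        return sb.toString();
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.contains(c)) {
                sb.append(c);
                i++;
            } else {
                if (i + codeLength > encoded.length()) {
                    throw new IllegalArgumentException("truncated input");
                }
                String code = encoded.substring(i, i + codeLength);
                Character original = encodedToOriginal.get(code);
                if (original == null) {
                    throw new IllegalArgumentException("unknown code " + code);
                }
                sb.append(original);
                i += codeLength;
            }
        }
        return sb.toString();
    }
}
```

With originals a, b, c, d, encoding 0, 1, d and "do not encode" d (in that order), this sketch
assigns codes of length two and reproduces the values from the usage example, e.g. "abcd"
encodes to 00010dd and decodes back to abcd.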

In addition, to make it easier to preserve the encoding scheme, I've added a human-readable
toString implementation and a factory method that can recreate an AlphabetConverter from the
encoding map, such that:

{code}
AlphabetConverter ac; // any existing converter

ac.equals(AlphabetConverter.createConverterFromMap(ac.getOriginalToEncoded())); // should always be true
{code}
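
The map-based round trip above could be supported roughly as follows. This is only a sketch
under assumptions: the class MapBackedSketch, its fromMap factory, and the "original -> encoded"
line format of toString are hypothetical and are not taken from the attached implementation.
The idea is that equality is defined purely by the original-to-encoded map, so rebuilding from
that map always yields an equal object.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the AlphabetConverter attached to this issue.
final class MapBackedSketch {
    private final Map<Character, String> originalToEncoded;

    private MapBackedSketch(Map<Character, String> originalToEncoded) {
        // Defensive copy that keeps insertion order for a stable toString.
        this.originalToEncoded =
                Collections.unmodifiableMap(new LinkedHashMap<>(originalToEncoded));
    }

    static MapBackedSketch fromMap(Map<Character, String> originalToEncoded) {
        return new MapBackedSketch(originalToEncoded);
    }

    Map<Character, String> getOriginalToEncoded() {
        return originalToEncoded;
    }

    @Override
    public boolean equals(Object other) {
        // Two instances are equal exactly when their maps are equal.
        return other instanceof MapBackedSketch
                && originalToEncoded.equals(((MapBackedSketch) other).originalToEncoded);
    }

    @Override
    public int hashCode() {
        return originalToEncoded.hashCode();
    }

    @Override
    public String toString() {
        // One "original -> encoded" pair per line, easy to read and to store.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, String> e : originalToEncoded.entrySet()) {
            sb.append(e.getKey()).append(" -> ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Because the map is the only state, persisting either the map or the toString output would be
enough to recreate an equivalent instance later.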



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
