commons-issues mailing list archives

From "Eyal Allweil (JIRA)" <>
Subject [jira] [Commented] (TEXT-19) Add alphabet converter
Date Mon, 19 Sep 2016 07:21:21 GMT


Eyal Allweil commented on TEXT-19:

I opened [a new pull request|]. Some of Rob's
comments are addressed there:

- I removed the doNotEncodeMap data member (the only cost is a slightly more expensive check when decoding).
- I added a null check to the equals method.
- I added a usage example to the javadoc.
- I took care of the stylistic differences he mentioned.

I didn't address the following, which can be discussed:

- Do we want to accommodate non-invertible or non-decodable encodings (e.g. new AlphabetConverter(['a','b','c','d'],['a','e','f','e'],['a']))?
- Do we want to accommodate alphabets over concatenated chars (e.g. new AlphabetConverter(['ab','c','d','e'],['a','k','hi','z'],[]))?
- The name of the class.
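
To make the first question concrete, here is a minimal sketch of a check that could reject non-invertible mappings up front. It assumes the three arrays in the example are read as a direct positional mapping (a to a, b to e, c to f, d to e); isInvertible and the class name are hypothetical and are not part of the pull request.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class InvertibilityCheck {

    // Hypothetical helper (not in the pull request): returns true when
    // no two original characters share the same encoded string.
    static boolean isInvertible(Map<Character, String> originalToEncoded) {
        Set<String> seen = new HashSet<>();
        for (String encoded : originalToEncoded.values()) {
            if (!seen.add(encoded)) {
                return false; // two originals collide on this encoding
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed positional reading of the first example above.
        Map<Character, String> map = new LinkedHashMap<>();
        map.put('a', "a");
        map.put('b', "e");
        map.put('c', "f");
        map.put('d', "e"); // collides with 'b'
        System.out.println(isInvertible(map)); // false
    }
}
```

This only catches duplicate encodings. Ambiguity from concatenated chars (the second question) would need a stronger test, such as checking that no encoding is a prefix of another.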

> Add alphabet converter
> ----------------------
>                 Key: TEXT-19
>                 URL:
>             Project: Commons Text
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>             Fix For: 1.0
> (as described in [the mailing list|])
> This is a utility class I wrote for converting from one alphabet to another - for example,
> from Unicode to Latin, without using some of the chars in Latin. The usage looks like this:
> {code}
> Set<Character> originals; // a, b, c, d
> Set<Character> encoding; // 0, 1, d
> Set<Character> doNotEncode; // d
> AlphabetConverter ac = AlphabetConverter.createConverter(originals, encoding, doNotEncode);
> ac.encode("a"); // 00
> ac.encode("b"); // 01
> ac.encode("c"); // 0d
> ac.encode("d"); // d
> ac.encode("abcd"); // 00010dd
> {code}
> Of course, x.equals(ac.decode(ac.encode(x))) should always be true.
> The implementation provided uses fixed-length encodings, except for the "do not
> encode" chars, which remain as they are (length one).
> In addition, in order to make it easier to preserve the encoding scheme, I've added a
> human-readable toString implementation, and a constructor that can recreate an AlphabetConverter
> from the encoding map, such that:
> {code}
> AlphabetConverter ac;
> ac.equals(AlphabetConverter.createConverterFromMap(ac.getOriginalToEncoded())); // should always be true
> {code}
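
For readers following the thread, the fixed-length scheme in the quoted description can be sketched roughly as below. This is not the code from the pull request: the class and method names are hypothetical, and the rule that the first character of a code is never a "do not encode" char is an assumption inferred from the example output (00, 01, 0d).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FixedLengthEncodingSketch {

    // Builds original -> encoded. "Do not encode" chars map to themselves;
    // all other chars get codes of one shared, minimal length.
    static Map<Character, String> buildMap(List<Character> originals,
                                           List<Character> encoding,
                                           List<Character> doNotEncode) {
        // Assumption: a code never starts with a "do not encode" char, so a
        // decoder can tell a pass-through char from the start of a code.
        List<Character> firstChars = new ArrayList<>(encoding);
        firstChars.removeAll(doNotEncode);

        int toEncode = 0;
        for (char c : originals) {
            if (!doNotEncode.contains(c)) {
                toEncode++;
            }
        }
        if (toEncode > 0 && (firstChars.isEmpty() || (encoding.size() < 2 && toEncode > 1))) {
            throw new IllegalArgumentException("Encoding alphabet is too small");
        }

        // Smallest length with firstChars * encoding^(length - 1) >= toEncode.
        int length = 1;
        long capacity = firstChars.size();
        while (capacity < toEncode) {
            capacity *= encoding.size();
            length++;
        }

        Map<Character, String> map = new LinkedHashMap<>();
        int next = 0;
        for (char c : originals) {
            if (doNotEncode.contains(c)) {
                map.put(c, String.valueOf(c));
                continue;
            }
            // Write the counter as digits over the encoding alphabet.
            char[] code = new char[length];
            int n = next++;
            for (int i = length - 1; i >= 1; i--) {
                code[i] = encoding.get(n % encoding.size());
                n /= encoding.size();
            }
            code[0] = firstChars.get(n);
            map.put(c, new String(code));
        }
        return map;
    }

    static String encode(Map<Character, String> map, String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            String code = map.get(c);
            if (code == null) {
                throw new IllegalArgumentException("No encoding for '" + c + "'");
            }
            sb.append(code);
        }
        return sb.toString();
    }

    // Assumes well-formed input produced by encode() with the same map.
    static String decode(Map<Character, String> map, List<Character> doNotEncode, String encoded) {
        Map<String, Character> reverse = new HashMap<>();
        int length = 1;
        for (Map.Entry<Character, String> e : map.entrySet()) {
            reverse.put(e.getValue(), e.getKey());
            length = Math.max(length, e.getValue().length());
        }
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.contains(c)) {
                sb.append(c); // pass-through char, length one
                i++;
            } else {
                sb.append(reverse.get(encoded.substring(i, i + length)));
                i += length;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> doNotEncode = Arrays.asList('d');
        Map<Character, String> map = buildMap(
                Arrays.asList('a', 'b', 'c', 'd'), Arrays.asList('0', '1', 'd'), doNotEncode);
        String encoded = encode(map, "abcd");
        System.out.println(encoded);                           // 00010dd
        System.out.println(decode(map, doNotEncode, encoded)); // abcd
    }
}
```

Keeping "do not encode" chars out of the first position of every code is what lets the decoder walk the string left to right without lookahead. The slightly more expensive decoding check mentioned in the comment above plausibly corresponds to the doNotEncode lookup in decode.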

This message was sent by Atlassian JIRA
