commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adrian Crum (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COLLECTIONS-577) PatriciaTrie bugs when only a few bits change
Date Fri, 25 Sep 2015 22:46:04 GMT

    [ https://issues.apache.org/jira/browse/COLLECTIONS-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908800#comment-14908800
] 

Adrian Crum commented on COLLECTIONS-577:
-----------------------------------------

(Copied from original description)

Take the following code for example, which shows the bugs happening in a very simple context.
I create two strings. The first string is always a prefix of the second string. Here are the
characters I am using:

final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
final char u8000 = Character.toChars(32768)[0]; // U+8000 (1000000000000000)
final char uffff = Character.toChars(65535)[0]; // U+FFFF (1111111111111111)
final char char_b = 'b'; // 1100010
final char char_c = 'c'; // 1100011
final PatriciaTrie<String> trie = new PatriciaTrie<>();

And here is a quick example of the trie working correctly, showing how it should work:

final String prefixString = "" + char_b;
final String longerString = prefixString + char_c;
System.out.println(trie.prefixMap(prefixString)); //
{b=prefixString, bc=longerString}

// correct!

In the first example, I show that a character who's bits are all zeros is always mishandled:

final String prefixString = "" + char_b;
final String longerString = prefixString + u0000;

System.out.println(prefixString);
System.out.println(prefixString.length()); // 1

System.out.println(longerString);
System.out.println(longerString.length()); // 2

System.out.println(longerString.startsWith(prefixString)); // true

trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in

trie.put(longerString, "longerString");
System.out.println(trie); // only longerString shown... prefixString has disappeared

System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should be 2
System.out.println(trie.prefixMap(prefixString)); //
{b =longerString}

// prefixString should be here, but isn't

trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in again, but longerString has now disappeared

System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should be 2
System.out.println(trie.prefixMap(prefixString)); //
{b=prefixString}

// longerString should be here, but isn't

Next, I show that if the longer string is only 1 bit longer (ignoring zeros) than the prefix
string, then the PatriciaTree fails to include it in the prefix map.
Here the string would look like: 0000000001100011 1000000000000000

final String prefixString = "" + char_c; // you can use any character for the prefix, same
results
final String longerString = prefixString + u8000;

System.out.println(prefixString);
System.out.println(prefixString.length()); // 1

System.out.println(longerString);
System.out.println(longerString.length()); // 2

System.out.println(longerString.startsWith(prefixString)); // true

trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in

trie.put(longerString, "longerString");
System.out.println(trie); // both are in

System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should be 2
System.out.println(trie.prefixMap(prefixString)); //
{c=prefixString} // longerString should be here, but isn't


And again, except flipping it so that the prefix ends with a 1, and the longer string starts
with 1's:

final String prefixString = "" + u0003;
final String longerString = prefixString + u8000; // can also use uffff here, same result

System.out.println(prefixString);
System.out.println(prefixString.length()); // 1

System.out.println(longerString);
System.out.println(longerString.length()); // 2

System.out.println(longerString.startsWith(prefixString)); // true

trie.put(prefixString, "prefixString");
System.out.println(trie); // prefixString is in

trie.put(longerString, "longerString");
System.out.println(trie); // both are in

System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should be 2
System.out.println(trie.prefixMap(prefixString)); // {c=prefixString}

// longerString should be here, but isn't

The class is honestly pretty complex and I wasn't able to completely debug why this is behaving
badly, but I believe it is because of how you are handling the "bitIndex" and "bitLength".
For example, in the method AbstractPatriciaTrie .subtree(), my comments in bold (this definitely
isn't the only place with weird handling of bitIndex and lengthInBits):

TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int lengthInBits)
{
TrieEntry<K, V> current = root.left;
TrieEntry<K, V> path = root;
while(true) {
// If our bit index has only increased by 1, then our bitIndex will never get to be greater
than lengthInBits, we could still start with the prefix, yet we have no way to break out of
here with current being set to the longer string.
if (current.bitIndex <= path.bitIndex || lengthInBits < current.bitIndex)
{ break; }

path = current;
if (!isBitSet(prefix, offsetInBits + current.bitIndex, offsetInBits + lengthInBits))
{ current = current.left; }

else
{ current = current.right; }

}
...

Can you also make AbstractPatriciaTrie public, and your other package level methods into protected
level, that way I don't have to copy the entire class and subclasse's code out into another
class just to extend it?

thank you,
Chris Duncan (github user: VEQRYN)

> PatriciaTrie bugs when only a few bits change
> ---------------------------------------------
>
>                 Key: COLLECTIONS-577
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-577
>             Project: Commons Collections
>          Issue Type: Bug
>          Components: Map
>    Affects Versions: 4.0
>            Reporter: Chris Duncan
>            Priority: Critical
>
> I have a bug report for you, for the class AbstractPatriciaTrie.  It has to do with how
you handle bits when they are very close to each other.  For example, some of your methods
seem to think that if the only difference between a prefix and a longer string, is a single
additional bit, then they are actually the same data.  Or if the only difference is some number
of zero bits, then it also thinks they are the same data.  There are also MANY situations
where the prefixMap does not return all the strings that start with the prefix.
> Take the following code for example, which shows the bugs happening in a very simple
context.
> I create two strings.  The first string is always a prefix of the second string.  Here
are the characters I am using:
>     final char u0000 = Character.toChars(0)[0]; // U+0000 (0000000000000000)
>     final char u0001 = Character.toChars(1)[0]; // U+0001 (0000000000000001)
>     final char u8000 = Character.toChars(32768)[0]; // U+8000 (1000000000000000)
>     final char uffff = Character.toChars(65535)[0]; // U+FFFF (1111111111111111)
>     final char char_b = 'b'; // 1100010
>     final char char_c = 'c'; // 1100011
>     final PatriciaTrie<String> trie = new PatriciaTrie<>();
> And here is a quick example of the trie working correctly, showing how it should work:
>     final String prefixString = "" + char_b;
>     final String longerString = prefixString + char_c;
>     System.out.println(trie.prefixMap(prefixString)); // {b=prefixString, bc=longerString}
// correct!
> In the first example, I show that a character who's bits are all zeros is always mishandled:
>     final String prefixString = "" + char_b;
>     final String longerString = prefixString + u0000;
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // only longerString shown... prefixString has disappeared
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should
be 2
>     System.out.println(trie.prefixMap(prefixString)); // {b =longerString} // prefixString
should be here, but isn't
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in again, but longerString has now disappeared
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should
be 2
>     System.out.println(trie.prefixMap(prefixString)); // {b=prefixString} // longerString
should be here, but isn't
> Next, I show that if the longer string is only 1 bit longer (ignoring zeros) than the
prefix string, then the PatriciaTree fails to include it in the prefix map.
> Here the string would look like: 0000000001100011 1000000000000000
>     final String prefixString = "" + char_c;  // you can use any character for the prefix,
same results
>     final String longerString = prefixString + u8000;
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // both are in
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should
be 2
>     System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} // longerString
should be here, but isn't
> And again, except flipping it so that the prefix ends with a 1, and the longer string
starts with 1's:
>     final String prefixString = "" + u0003;
>     final String longerString = prefixString + u8000; // can also use uffff here, same
result
>     System.out.println(prefixString);
>     System.out.println(prefixString.length()); // 1
>     System.out.println(longerString);
>     System.out.println(longerString.length()); // 2
>     System.out.println(longerString.startsWith(prefixString)); // true
>     trie.put(prefixString, "prefixString");
>     System.out.println(trie); // prefixString is in
>     trie.put(longerString, "longerString");
>     System.out.println(trie); // both are in
>     System.out.println(trie.prefixMap(prefixString).size()); // prints 1, but should
be 2
>     System.out.println(trie.prefixMap(prefixString)); // {c=prefixString} // longerString
should be here, but isn't
> The class is honestly pretty complex and I wasn't able to completely debug why this is
behaving badly, but I believe it is because of how you are handling the "bitIndex" and "bitLength".
> For example, in the method AbstractPatriciaTrie .subtree(), my comments in bold (this
definitely isn't the only place with weird handling of bitIndex and lengthInBits):
>     TrieEntry<K, V> subtree(final K prefix, final int offsetInBits, final int lengthInBits)
{
>         TrieEntry<K, V> current = root.left;
>         TrieEntry<K, V> path = root;
>         while(true) {
> // If our bit index has only increased by 1, then our bitIndex will never get to be greater
than lengthInBits, we could still start with the prefix, yet we have no way to break out of
here with current being set to the longer string.
>             if (current.bitIndex <= path.bitIndex || lengthInBits < current.bitIndex)
{
>                 break;
>             }
>             path = current;
>             if (!isBitSet(prefix, offsetInBits + current.bitIndex, offsetInBits + lengthInBits))
{
>                 current = current.left;
>             } else {
>                 current = current.right;
>             }
>         }
>       ...
> Can you also make AbstractPatriciaTrie public, and your other package level methods into
protected level, that way I don't have to copy the entire class and subclasse's code out into
another class just to extend it?
> thank you,
> Chris Duncan (github user: VEQRYN)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message