commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oliver Zeigermann <oliver.zeigerm...@gmail.com>
Subject [Collections] DFAMap? (LONG)
Date Tue, 19 Oct 2004 05:30:44 GMT
Folks,

while thinking about cache invalidation strategies in the Slide
project a long abandoned idea come back to my mind: a DFA based map
for String keys only! When I implemented a scetch of this once it
turned out to be slower in all respects than hash maps. These are my
old measurements:

DFA: 100000 puts: 5,7 secs / 100000 gets: 1 sec
HashMap: 100000 puts: 3,2 secs / 100000 gets: .661 sec 

Even hand tuned versions could not catch up (no measurements - I guess
I was too frustrated at that time).

Anyway, what I need now is a map *plus* something that returns all
values of the keys that have a certain prefix. E.g. if I have three
entries like

olli -> v1
olli2 -> v2
ollo -> v3

the prefix of "oll" would get me v1,v2,v3; "olli" would get me v1,v2; etc.

Now when the number of entries is small you could just iterate over
the entries and do something like a startsWith and collect the values.
In a scenario with thousands or even millions of entries with frequent
checks this is prohibitive, though.

Coming to the point now, when I organise the look up as a DFA the
search space for prefixed values should be dramatically decreased! The
map DFA would be some sort of  tree and classes might look like this.

class DFAState {
  Object value;
  Edge[] mapping;
}

class Edge {
 char label;
 DFAState nextState;
}

Every state would have an associated value and a number of outgoing
edges to other states. Upon lookup they can be traversed when the next
character is the labeled one. There can at most be one character per
label, so this is deterministic.

If there is no such label then the look fails. If the whole key has
been matched the look up value is the one associated with the DFA.
Those states are called end states. All other states have no
associated value.

Applied to the example above the DFA might look like this. s{i} are
state names, |s{i}| are end state, - are edges. Above and below the
edges, I have put the labels.

     o       l       l       i       2
s1  -  s2  - s3  - s4  - |s5|  - |s6|
                          \ 
                         o |s7|  

(hope this makes sense to anyone)

Now, suppose the prefix is "oll". Traversing the DFA would bring me to
s4. Doing a simple depth first search from there for all reachable end
states (s5,s6,s7) will return me all entries having this prefix, i.e.
v1,v2,v3. "olli" would bring me to s5 and the end states reachable
from there (s5,s6) will bring me the v1,v2.

And there I am...

In case anyone has read this through, does it make sense? Or should I
start taking those pills?

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Mime
View raw message