commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oliver Zeigermann <oliver.zeigerm...@gmail.com>
Subject Re: [Collections] DFAMap? (LONG)
Date Wed, 27 Oct 2004 14:34:23 GMT
After discussion and more tests in the Slide group I have abandoned
the approach.

All this does not work for realistic scenarios and realistic sizes.
Main problem is that if the keys do not share much common prefixes the
size of states gets enormously and impracticably high.

Sources still can be found in the Slide CVS.

Cheers and thanks for the attention,

Oliver

On Tue, 19 Oct 2004 07:30:44 +0200, Oliver Zeigermann
<oliver.zeigermann@gmail.com> wrote:
> Folks,
> 
> while thinking about cache invalidation strategies in the Slide
> project a long abandoned idea come back to my mind: a DFA based map
> for String keys only! When I implemented a scetch of this once it
> turned out to be slower in all respects than hash maps. These are my
> old measurements:
> 
> DFA: 100000 puts: 5,7 secs / 100000 gets: 1 sec
> HashMap: 100000 puts: 3,2 secs / 100000 gets: .661 sec
> 
> Even hand tuned versions could not catch up (no measurements - I guess
> I was too frustrated at that time).
> 
> Anyway, what I need now is a map *plus* something that returns all
> values of the keys that have a certain prefix. E.g. if I have three
> entries like
> 
> olli -> v1
> olli2 -> v2
> ollo -> v3
> 
> the prefix of "oll" would get me v1,v2,v3; "olli" would get me v1,v2; etc.
> 
> Now when the number of entries is small you could just iterate over
> the entries and do something like a startsWith and collect the values.
> In a scenario with thousands or even millions of entries with frequent
> checks this is prohibitive, though.
> 
> Coming to the point now, when I organise the look up as a DFA the
> search space for prefixed values should be dramatically decreased! The
> map DFA would be some sort of  tree and classes might look like this.
> 
> class DFAState {
>   Object value;
>   Edge[] mapping;
> }
> 
> class Edge {
>  char label;
>  DFAState nextState;
> }
> 
> Every state would have an associated value and a number of outgoing
> edges to other states. Upon lookup they can be traversed when the next
> character is the labeled one. There can at most be one character per
> label, so this is deterministic.
> 
> If there is no such label then the look fails. If the whole key has
> been matched the look up value is the one associated with the DFA.
> Those states are called end states. All other states have no
> associated value.
> 
> Applied to the example above the DFA might look like this. s{i} are
> state names, |s{i}| are end state, - are edges. Above and below the
> edges, I have put the labels.
> 
>      o       l       l       i       2
> s1  -  s2  - s3  - s4  - |s5|  - |s6|
>                           \
>                          o |s7|
> 
> (hope this makes sense to anyone)
> 
> Now, suppose the prefix is "oll". Traversing the DFA would bring me to
> s4. Doing a simple depth first search from there for all reachable end
> states (s5,s6,s7) will return me all entries having this prefix, i.e.
> v1,v2,v3. "olli" would bring me to s5 and the end states reachable
> from there (s5,s6) will bring me the v1,v2.
> 
> And there I am...
> 
> In case anyone has read this through, does it make sense? Or should I
> start taking those pills?
> 
> Oliver
>

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Mime
View raw message