incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hien. To Trong" <hie...@vng.com.vn>
Subject Load balancing - Order preserving partition - New Keys prob.
Date Thu, 16 Sep 2010 07:24:23 GMT
Hi all,
Thanks you for your comments.
I already read some papers about load balancing in P2P which some of them allow range query.
During this time, I find another problem: "NEW KEYS"

Karger - Simple efficient load balancing algorithms for p2p systems. (support OPP)
John Byers - Simple load balancing for distributed hash table.
Third: Ananth Rao - Load balancing in structured P2P systems.
I think the first paper is the best to get the background.

Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables
and Range queries over DHTs
These paper use PHT to allow range query.

James Aspnes - Skip Graphs (most recent as I known)
and Distributed Balanced Tables, Not making a hash of it All.

All of these papers deal with load balancing and range query probs by creating schemes or
strategies 
based on only "load - number of key on each nodes", not care about new keys and highly-accessed
keys. 
HOWEVER, in fact, "the most recent keys are likely accessed more frequently".

I suppose, we have 400.000 "NEW KEYS" in 2 recent days (are likely accessed more frequently).
--> A scheme: these new keys are uniformly partitioned and divided into some successive
nodes. For example, there are 10 node (N1...N10), keys in day 1 will be put into node_1_2,
keys in day 2 will be put into node_3_4 and so on... But, the prob is that number of keys
and nodes increase/decrease day by day (not fixed) --> the number of node used to store
keys in each day may increase/decrease (1 or 3 node for example).

Does any one have any ideas or know papers to deal with this prob (new keys and highly-accessed
keys)?

Thanks a lot.
Mime
View raw message