hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Jarecki <mjare...@bigpond.net.au>
Subject Modelling threaded messages
Date Fri, 01 Apr 2011 00:56:23 GMT
Hi all,

I'm modelling a schema for storing and retrieving threaded messages, where, for planning purposes:

	- there are many millions of users.
	- a user might have up to 1000 threads.
	- each thread might have up to 50000 messages (with some threads being sparse with only a
few messages).
	- the Stargate REST interface is used. 

I want to be able to execute the following queries:

	- retrieve x latest active threads, with the latest message.
	- retrieve x latest active threads, with the latest message, offset by y.
	- retrieve x latest messages from a thread.
	- retrieve x latest messages from a thread, offset by y.

I've come up with a few possible methods for modelling this. But any insights would be greatly
appreciated.

Thanks in advance,

Mark


Possible solution 1:

TABLE: 		threads
KEY: 		userID : threadID
COLUMN: 	latest_message

TABLE: 		messages
KEY: 		userID : threadID : timestamp
COLUMN: 	message

Messages are first written to the messages table, and then the threads table's thread is updated
with the latest message.

To fetch the latest x active threads, with the latest message:
	
	- I retrieve all threads and then sort and reduce the results on the client.

A concern with this is the fetching of all threads to sort on each request. This could be
unwieldy!


Possible solution 2:

TABLE: 		threads
KEY: 		userID : timestamp : threadID
COLUMN: 	latest_message

TABLE: 		messages
KEY: 		userID : threadID : timestamp
COLUMN: 	message

Messages are first written to the messages table, and then the threads table's is updated
with the latest message. The previous latest message is then deleted from the threads table.

To fetch the latest x active threads, with the latest message:
	
	- I scan the threads table until I get x unique threads.

A concern with this could be the issue of keeping the threads table in sync with the messages
table - especially with the deletion of old latest messages.


Possible solution 3:

TABLE: 		messages
KEY: 		userID : timestamp : threadID
COLUMN: 	message

To fetch the latest x active threads, with the latest message:
	
	- I scan the messages table until I get x unique threads.

One of my concerns with this method is that some threads will be busier than others, forcing
a scan through nearly all of a user's messages. And there will be an ever increasing number
of messages. A periodic archiving process - moving older messages to another table - might
alleviate things here.


Possible solution 4:

Use SOLR/Elastic search or equivalent.





Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message