hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Modelling threaded messages
Date Sat, 02 Apr 2011 04:56:47 GMT
Not original with me, I have to admit.

Some of the ideas are best described in the OpenTSDB descriptions.

On Fri, Apr 1, 2011 at 8:01 PM, M. C. Srivas <mcsrivas@gmail.com> wrote:

> Ted, this is a pretty clever idea.
>
>
> On Thu, Mar 31, 2011 at 9:27 PM, Ted Dunning <tdunning@maprtech.com>wrote:
>
>> Solr/Elastic search is a fine solution, but probably won't be quite as
>> fast
>> as a well-tuned hbase solution.
>>
>> One key assumption you seem to be making is that you will store messages
>> only once.  If you are willing to make multiple updates to tables, then
>> you
>> can arrange the natural ordering of the table to get what you want.  For
>> instance, you could keep the most recent messages (say the last 10 from
>> each
>> of the 1000 most recently updated threads) in an in memory table.  Then
>> you
>> could store messages in a thread table indexed by thread:timestamp.
>>  Finally
>> you could store messages in a table indexed by user:thread or
>> user:timestamp.  This would allow you to display the most recent messages
>> or
>> thread in near zero time, to display all or the most recent messages from
>> a
>> particular thread with only one retrieval and all of the messages from a
>> particular user in time order in one retrieval.
>>
>> On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <mjarecki@bigpond.net.au
>> >wrote:
>>
>> > Hi all,
>> >
>> > I'm modelling a schema for storing and retrieving threaded messages,
>> where,
>> > for planning purposes:
>> >
>> >        - there are many millions of users.
>> >        - a user might have up to 1000 threads.
>> >        - each thread might have up to 50000 messages (with some threads
>> > being sparse with only a few messages).
>> >        - the Stargate REST interface is used.
>> >
>> > I want to be able to execute the following queries:
>> >
>> >        - retrieve x latest active threads, with the latest message.
>> >        - retrieve x latest active threads, with the latest message,
>> offset
>> > by y.
>> >        - retrieve x latest messages from a thread.
>> >        - retrieve x latest messages from a thread, offset by y.
>> >
>> > I've come up with a few possible methods for modelling this. But any
>> > insights would be greatly appreciated.
>> >
>> > Thanks in advance,
>> >
>> > Mark
>> >
>> >
>> > Possible solution 1:
>> >
>> > TABLE:          threads
>> > KEY:            userID : threadID
>> > COLUMN:         latest_message
>> >
>> > TABLE:          messages
>> > KEY:            userID : threadID : timestamp
>> > COLUMN:         message
>> >
>> > Messages are first written to the messages table, and then the threads
>> > table's thread is updated with the latest message.
>> >
>> > To fetch the latest x active threads, with the latest message:
>> >
>> >        - I retrieve all threads and then sort and reduce the results on
>> the
>> > client.
>> >
>> > A concern with this is the fetching of all threads to sort on each
>> request.
>> > This could be unwieldy!
>> >
>> >
>> > Possible solution 2:
>> >
>> > TABLE:          threads
>> > KEY:            userID : timestamp : threadID
>> > COLUMN:         latest_message
>> >
>> > TABLE:          messages
>> > KEY:            userID : threadID : timestamp
>> > COLUMN:         message
>> >
>> > Messages are first written to the messages table, and then the threads
>> > table's is updated with the latest message. The previous latest message
>> is
>> > then deleted from the threads table.
>> >
>> > To fetch the latest x active threads, with the latest message:
>> >
>> >        - I scan the threads table until I get x unique threads.
>> >
>> > A concern with this could be the issue of keeping the threads table in
>> sync
>> > with the messages table - especially with the deletion of old latest
>> > messages.
>> >
>> >
>> > Possible solution 3:
>> >
>> > TABLE:          messages
>> > KEY:            userID : timestamp : threadID
>> > COLUMN:         message
>> >
>> > To fetch the latest x active threads, with the latest message:
>> >
>> >        - I scan the messages table until I get x unique threads.
>> >
>> > One of my concerns with this method is that some threads will be busier
>> > than others, forcing a scan through nearly all of a user's messages. And
>> > there will be an ever increasing number of messages. A periodic
>> archiving
>> > process - moving older messages to another table - might alleviate
>> things
>> > here.
>> >
>> >
>> > Possible solution 4:
>> >
>> > Use SOLR/Elastic search or equivalent.
>> >
>> >
>> >
>> >
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message