incubator-esme-dev mailing list archives

From David Pollak <feeder.of.the.be...@gmail.com>
Subject Re: Statefulness and algorithms for social networks/graphs
Date Mon, 30 Nov 2009 22:24:44 GMT
On Mon, Nov 30, 2009 at 2:00 PM, Markus Kohler <markus.kohler@gmail.com> wrote:

>
> > So, that means that each year, there will be 36,000M (36B) mailbox entries.
> >
>
>
> I don't understand why we would need to store all entries in a cache,
> instead of keeping only the last n entries for each user based on some
> heuristic such as the last 3 days. I would expect the probability that
> a user wants to see a message to decrease exponentially with the
> message's age. For example, the probability that someone wants to see
> the 1000th newest message in his timeline is probably almost zero.
>

Some people mine their timelines for information.  I agree that some aging
policy is necessary as 36B entries will consume a lot of storage in RAM or
on disk, but the last 1,000 is likely too few based on what I have seen of
actual user behavior.

In terms of an aging policy in an RDBMS, the cost of aging out old entries
is likely to be an index scan or something on that order: either a global
DELETE FROM mailbox WHERE date < xxx, or a user-by-user DELETE WHERE id IN
(SELECT the entries beyond the newest 1,000 in that user's mailbox).
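
To make the two aging variants concrete, here is a minimal sketch using
Python's sqlite3 module. The table layout (id, user_id, message_id, date),
the index, and the function names are assumptions for illustration only;
they are not ESME's actual schema.

```python
import sqlite3

def age_out_by_date(conn, cutoff):
    """Global policy: drop every mailbox entry older than `cutoff`."""
    cur = conn.execute("DELETE FROM mailbox WHERE date < ?", (cutoff,))
    return cur.rowcount

def age_out_by_count(conn, user_id, keep):
    """Per-user policy: keep only the `keep` newest entries for one user."""
    cur = conn.execute(
        """
        DELETE FROM mailbox
        WHERE user_id = ?
          AND id NOT IN (
              SELECT id FROM mailbox
              WHERE user_id = ?
              ORDER BY date DESC, id DESC
              LIMIT ?
          )
        """,
        (user_id, user_id, keep),
    )
    return cur.rowcount

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (user, message) delivery.
    conn.execute(
        "CREATE TABLE mailbox ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, "
        "message_id INTEGER, date INTEGER)"
    )
    conn.execute("CREATE INDEX mailbox_user_date ON mailbox (user_id, date)")
    # Two users, ten entries each, with date equal to the message number.
    rows = [(user, msg, msg) for user in (1, 2) for msg in range(10)]
    conn.executemany(
        "INSERT INTO mailbox (user_id, message_id, date) VALUES (?, ?, ?)",
        rows,
    )
    removed_by_date = age_out_by_date(conn, 3)
    removed_by_count = age_out_by_count(conn, 1, 4)
    remaining = conn.execute("SELECT COUNT(*) FROM mailbox").fetchone()[0]
    return removed_by_date, removed_by_count, remaining
```

The date-based delete is a single range delete over the date column, while
the count-based one has to run once per user, which is the main cost
difference between the two.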


>
> > During peak load, we will need to prioritize which Users are processing
> > messages/actions such that the system retains responsiveness and can
> > drain the load.  Put another way, knowing which Users have associated
> > long-lived sessions allows us to prioritize the message processing for
> > those Users.  We allow more threads to drain the message queues for
> > those Users while providing fewer threads for session-less Users.
> > Yeah, we could prioritize on other heuristics, but long-lived session
> > is dead simple and will cost us 5K bytes per logged in user.  Not a
> > huge cost and lots of benefit.
> >
> >
> I have no issue with some session state, and 5K is really low, so this
> is not a problem.  I don't get why it has to be in the session's state,
> because you could just as well use the information that a user is online
> as guidance, even if the state were stored somewhere outside the session.
> It wouldn't make a difference, I guess, and storing it in the session
> looks natural.
>
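
The prioritization described in the quoted text can be sketched roughly as
follows. This is only an illustration under stated assumptions: it models
"more threads" as a larger share of drain turns, and the function name
drain_order and the 3:1 weighting are hypothetical, not anything ESME
defines.

```python
from collections import deque

def drain_order(pending, online, online_share=3):
    """Order users for queue draining, favoring users with a live session.

    pending      -- users with queued messages, in arrival order
    online       -- set of users that currently have a long-lived session
    online_share -- hypothetical weight: online users served per
                    session-less user
    """
    if online_share < 1:
        raise ValueError("online_share must be at least 1")
    fast = deque(user for user in pending if user in online)
    slow = deque(user for user in pending if user not in online)
    order = []
    while fast or slow:
        # Serve up to `online_share` online users ...
        for _ in range(online_share):
            if fast:
                order.append(fast.popleft())
        # ... then one session-less user, so that nobody starves.
        if slow:
            order.append(slow.popleft())
    return order
```

Session-less users still make progress in this sketch; they simply receive
a smaller share of the draining capacity while the system is under load.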

The state itself is not in the session.  The session is the guide that the
user is online.  The session contains a listener that is attached to the
User.  The only real state that resides in the session is the state
necessary to batch up any messages that the User has forwarded to the
listener in between the HTTP polling requests.  If there is an HTML front
end, state about that front end will reside in the session as well, but
that's a different issue.
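
As a rough sketch of that arrangement: the session holds a listener, the
listener is attached to the User, and the only state it keeps is the batch
of messages that arrived since the last HTTP poll. The class and method
names below are hypothetical and are not the actual ESME or Lift classes.

```python
import threading

class SessionListener:
    """Lives in the session; buffers messages between HTTP polls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def on_message(self, message):
        # Called by the User whenever a message is forwarded to us.
        with self._lock:
            self._pending.append(message)

    def poll(self):
        # Called by the HTTP polling request: hand over the batch, reset.
        with self._lock:
            batch, self._pending = self._pending, []
        return batch

class User:
    """Owns the user's state; a session only attaches a listener to it."""

    def __init__(self, name):
        self.name = name
        self._listeners = []

    def attach(self, listener):
        self._listeners.append(listener)

    def detach(self, listener):
        self._listeners.remove(listener)

    def is_online(self):
        # An attached listener is the signal that a session exists.
        return bool(self._listeners)

    def deliver(self, message):
        for listener in list(self._listeners):
            listener.on_message(message)
```

When the session expires, detaching the listener is all that is needed for
the User to count as offline again.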


>
>
> > So, the fact that long-lived session long polling is more efficient
> > than short-lived session repeated polling, together with the upcoming
> > need for message prioritization, indicates that long-lived sessions
> > are the right design choice.
> >
> > Also, I hope that the above discussion makes it clear why I am insistent
> > on message-oriented APIs rather than document/REST oriented APIs.  ESME's
> > design is not traditional and there are fewer tools helping us get the
> > implementation right.  On the other hand, implementing ESME on top of a
> > relational/REST model cannot be done.  Let's keep our design consistent
> > from the APIs back.
> >
> >
> I'm really not religious about REST, but I would assume that in an
> Enterprise context it could be a requirement to send a link to someone
> else pointing to a specific, potentially old message in a certain Pool.



Yes.  That's perfectly reasonable.  That message is like a static file on
disk.  Once it's written, it remains unchanged until it's deleted.  This is
an ideal application of a REST-style approach.  That's why I've advocated
for a "message based" approach first, but a REST/static approach when the
message based approach doesn't make sense.  What I am opposed to is a "try
to make everything fit the REST model" approach to API design.


> That sounds to me like a requirement for some kind of REST API.
> Would it be costly in your model to get message number X (+ n older
> messages) in a user's timeline?
>

A message will exist outside of a timeline.  There exists a cache of
recently accessed messages.  Sometimes a historic message will be
referenced; it will be materialized from the backing store and rendered.
It will likely fall out of the cache if it's historical and not accessed
again.
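
A minimal sketch of such a cache, assuming least-recently-used eviction
(the eviction policy is an assumption; only "recently accessed" is stated
above). MessageCache and load_from_store are hypothetical names.

```python
from collections import OrderedDict

class MessageCache:
    """Keeps recently accessed messages; loads the rest on demand."""

    def __init__(self, load_from_store, capacity):
        self._load = load_from_store      # callable: message id -> message
        self._capacity = capacity
        self._entries = OrderedDict()     # oldest access first

    def __contains__(self, message_id):
        return message_id in self._entries

    def get(self, message_id):
        if message_id in self._entries:
            # Cache hit: mark as most recently accessed.
            self._entries.move_to_end(message_id)
            return self._entries[message_id]
        # Cache miss: materialize the message from the backing store.
        message = self._load(message_id)
        self._entries[message_id] = message
        if len(self._entries) > self._capacity:
            # Evict the entry that has gone longest without an access.
            self._entries.popitem(last=False)
        return message
```

A historic message fetched once through get() sits in the cache for a
while and is evicted as newer messages are accessed, unless someone asks
for it again first.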

Thanks,

David


>
> Regards,
> Markus
>
>
>
> > Thanks,
> >
> > David
> >
> > --
> > Lift, the simply functional web framework http://liftweb.net
> > Beginning Scala http://www.apress.com/book/view/1430219890
> > Follow me: http://twitter.com/dpp
> > Surf the harmonics
> >
>



-- 
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
