Subject: Re: Model Question
From: Erez Efrati <erezef@gmail.com>
To: user@cassandra.apache.org
Date: Thu, 25 Mar 2010 11:09:07 +0200

You are correct, Chris.

I am a newbie in this field too.
I like the Cassandra/NoSQL way, and I am trying to see if it can fit my model.

Thanks,
Erez

On Thu, Mar 25, 2010 at 11:03 AM, Christopher Brind <christopher.brind@googlemail.com> wrote:
Hi,

I wondered if you were alluding to something more complex. You'd probably want to create an index along the lines that Peter suggested. :)

But I'm a Cassandra / column DB newbie, so my experience ends just about ... here. :)

Cheers,
Chris


On 25 March 2010 08:59, Erez Efrati <erezef@gmail.com> wrote:
Hi Chris,

So, if I understand correctly, you suggest that I pull all the columns in a single row and do the sorting client side?
The user-friends-messages case was just an example, and maybe not the best one I could come up with, because I agree that in general there are not that many friends who send you messages.

What I actually want to keep track of is companies and user visit counts. Each company can potentially have millions of users. For each company, I want to display its users in pages, from the top visitor down to the least frequent one.
Would you still load all the columns of the company row and sort them on the client?
How do I keep updating the visit counts?
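A minimal sketch of the paging half of this question, assuming the visit counts are kept in a separate index row per company whose column names encode the count (the modeling idea from the original question at the bottom of this thread). A `TreeMap` stands in for that row, since Cassandra keeps a row's columns sorted by name; the class, the methods and the `visits:userId` column-name format are hypothetical, and no Cassandra client API is used. Each page starts strictly after the last column name of the previous page, roughly what a reversed column slice with a start column and a count does. Updating a count would mean deleting the user's old column and inserting one with the new count, which `put` leaves to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one hypothetical per-company index row.
// Column names are zero-padded "visits:userId" strings, so string order
// matches numeric order. This is NOT the Cassandra client API.
public class VisitorPages {
    private final TreeMap<String, String> row = new TreeMap<>();

    public static String columnName(long visits, String userId) {
        return String.format("%019d:%s", visits, userId);
    }

    /** Adds a column; assumes the caller already removed the user's previous column. */
    public void put(long visits, String userId) {
        row.put(columnName(visits, userId), userId);
    }

    /**
     * Returns up to pageSize column names, most visits first, starting strictly
     * after afterColumn (pass null for the first page).
     */
    public List<String> page(String afterColumn, int pageSize) {
        NavigableMap<String, String> desc = row.descendingMap();
        // On a descending view, tailMap(k, false) yields the columns that come after k.
        NavigableMap<String, String> rest =
                (afterColumn == null) ? desc : desc.tailMap(afterColumn, false);
        List<String> names = new ArrayList<>();
        for (String name : rest.keySet()) {
            if (names.size() == pageSize) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        VisitorPages pages = new VisitorPages();
        pages.put(5, "u1");
        pages.put(42, "u2");
        pages.put(17, "u3");
        List<String> first = pages.page(null, 2);                          // u2, then u3
        List<String> second = pages.page(first.get(first.size() - 1), 2);  // u1
        System.out.println(first);
        System.out.println(second);
    }
}
```

Passing the last column name of a page as the cursor avoids counting through millions of columns to reach page N.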

Thanks,
Erez

On Thu, Mar 25, 2010 at 12:35 AM, Christopher Brind <christopher.brind@googlemail.com> wrote:
Hi Erez,

I don't know how many friends a user in your system is likely to have, but are they likely to have received messages from so many friends that you can't sort them in your client app?

See: http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#sort(java.util.List)

Assuming the user has 10,000 friends (I'm sure I don't even know 10,000 people :), Java's Collections.sort guarantees O(n log(n)) performance. If we say each step takes 1ms, you're looking at 10,000 * log10(10,000) = 40,000ms for the sort, plus a little overhead for copying the list into an array, which Collections.sort does to avoid the O(n^2 log(n)) cost of sorting a linked list in place. That's 40 seconds to sort 10,000 friends...
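A small sketch that reproduces the estimate above. The 1 ms per step is the thread's own assumption, and the quoted figures only come out if the logarithm is base 10; a real comparison sort does about n * log2(n) comparisons, each far cheaper than 1 ms, so this is a rough illustration, not a measurement. The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate from this thread:
// n * log10(n) steps, each assumed to take msPerStep milliseconds.
// The 1 ms per step and the base-10 log are assumptions, not properties of Collections.sort.
public class SortCostEstimate {

    /** Estimated time in ms to sort n items at msPerStep ms per step. */
    public static double estimateMillis(int n, double msPerStep) {
        return n * Math.log10(n) * msPerStep;
    }

    public static void main(String[] args) {
        int[] friendCounts = {10_000, 363, 130};
        for (int n : friendCounts) {
            System.out.printf("%d friends -> about %.0f ms%n", n, estimateMillis(n, 1.0));
        }
    }
}
```

Run as written, it prints about 40000 ms for 10,000 friends and about 929 ms for 363 friends, matching the figures in this message.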

On Facebook I have 363 friends; that's 363 * log10(363), about 929ms + overhead, i.e. around a second. Apparently the average Facebook user has 130 friends:
http://www.facebook.com/press/info.php?statistics
So I can't imagine the sort taking much more than a second or so, except for the most popular users; in practice I would hope for sub-second easily. Does that help? Or is there something special happening in your system?
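For reference, a minimal sketch of the client-side sort being suggested, assuming the user's row has already been fetched into a map of friend ID to message count. The fetching step is not shown, and the names `ClientSideSort` and `byCountDescending` are hypothetical. It sorts with `Collections.sort` and a comparator on the count, descending.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Client-side sort: the map stands in for one user's fetched row
// (friendId -> number of messages that friend sent).
public class ClientSideSort {

    public static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Collections.sort is a stable merge sort (O(n log n) comparisons).
        Collections.sort(entries, Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("alice", 12);
        counts.put("bob", 40);
        counts.put("carol", 3);
        for (Map.Entry<String, Integer> e : byCountDescending(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With the sample data it prints bob, alice and carol in that order.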

Cheers,
Chris



On 24 March 2010 20:36, Erez Efrati <erezef@gmail.com> wrote:
Hi,

I can't figure out how to model the following using a column family, given the way columns are sorted (by their name).

Let's say I have a list of users, and for each user I wish to display a list of all his friends, ordered by the number of messages they have sent him so far (descending, from most to least).

I can't see how this is going to work, since columns are always sorted by name and not by value. I thought of having a row for each user, where the columns are the friends who email him. But then the column name needs to be the number of messages, so that it gets sorted, and the value would be the friend's user ID. In that case, when a friend sends a message to a user, how do I increment the count of messages he has sent to that user so far?

How can I model this with Cassandra? Is it possible?
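One common way to model this, sketched under stated assumptions: keep the sort key in the column name by building names like `count:friendId` (zero-padded so that string order matches numeric order), and keep a second friend-to-count mapping so the old column can be found. An increment then becomes "delete the old column, insert a new one". A `TreeMap` stands in for a Cassandra row, since both keep columns sorted by name; the class, the row roles and the column-name format are hypothetical, and this is not the Cassandra client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the "sort key in the column name" pattern.
// NOT the Cassandra client API; TreeMap mimics a row's name-sorted columns.
public class MessageCountIndex {
    // Hypothetical index row for one user: "count:friendId" -> friendId.
    private final TreeMap<String, String> byCount = new TreeMap<>();
    // Hypothetical lookup row for the same user: friendId -> current count.
    private final Map<String, Long> counts = new HashMap<>();

    private static String columnName(long count, String friendId) {
        // Zero-padding keeps lexicographic order equal to numeric order.
        return String.format("%019d:%s", count, friendId);
    }

    /** Records one more message from friendId: delete the old column, insert the new one. */
    public void increment(String friendId) {
        long old = counts.getOrDefault(friendId, 0L);
        if (old > 0) {
            byCount.remove(columnName(old, friendId));
        }
        counts.put(friendId, old + 1);
        byCount.put(columnName(old + 1, friendId), friendId);
    }

    /** Friends ordered from most to fewest messages (read the index in reverse). */
    public List<String> friendsByCountDesc() {
        return new ArrayList<>(byCount.descendingMap().values());
    }

    public static void main(String[] args) {
        MessageCountIndex idx = new MessageCountIndex();
        idx.increment("bob");
        idx.increment("alice");
        idx.increment("bob");
        System.out.println(idx.friendsByCountDesc());
    }
}
```

Run as written, it prints `[bob, alice]`. The read, delete and insert are separate steps, so two concurrent increments for the same friend could race; this sketch does not address that.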

Thanks in advance,

Erez Efrati



