From: Ed Anuff
To: user@cassandra.apache.org
Date: Sat, 8 May 2010 16:12:49 -0700
Subject: Re: Data Modeling Conundrum

I was thinking it was going to be a lot more than that. You might want to consider just storing them all as a single serialized array of timestamps and uuids. By my math, you could fit up to 40 uuid/timestamp pairs in under 1K. Then you'd just store something like this:

// Row key is userId
12345 : {
  last_seen : 387587235233, // timestamp of last visit
  last_uuid: '256fb890-5a4b-11df-a08a-0800200c9a66',
  history : 0x000....., // serialized array of N timestamp/uuid pairs (24 bytes per pair)
}

On Sat, May 8, 2010 at 3:54 PM, William Ashley wrote:

> That is a good question, because realistically I see N being under 10, and
> there are no current plans to make use of a large historical record. I could
> have the update process pull all columns and issue deletes as necessary such
> that only M (M >= N) are kept.
>
> Thanks for the inspiration.
>
> On May 8, 2010, at 3:42 PM, Ed Anuff wrote:
>
> Sorry, missed that. I'm not sure if there's a cleaner way than using the
> approaches you've looked at, hopefully someone else has an answer. How big
> is N and do you need to keep more than N around?
>
> On Sat, May 8, 2010 at 10:26 AM, William Ashley wrote:
>
>> This would be a solution if I wanted to get the N most recently CREATED
>> guids, but I'm interested in the most recently SEEN guids.

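For what it's worth, here is a rough sketch of how the serialized history column might be packed and updated. It is not code from this thread. It assumes each pair is an 8-byte big-endian timestamp followed by the 16 raw UUID bytes, which is one layout that gives the 24 bytes per pair mentioned above (so 40 pairs come to 960 bytes). The names pack_history, unpack_history, and record_visit are hypothetical, and actually reading or writing the column in Cassandra is left out.

```python
import struct
import uuid

# Assumed layout (not specified in the thread): 8-byte unsigned big-endian
# timestamp followed by the 16 raw bytes of the UUID = 24 bytes per pair.
PAIR = struct.Struct(">Q16s")


def pack_history(pairs):
    """Serialize a list of (timestamp, uuid.UUID) pairs into one byte string."""
    return b"".join(PAIR.pack(ts, u.bytes) for ts, u in pairs)


def unpack_history(blob):
    """Deserialize a byte string back into a list of (timestamp, uuid.UUID) pairs."""
    return [(ts, uuid.UUID(bytes=raw)) for ts, raw in PAIR.iter_unpack(blob)]


def record_visit(blob, ts, seen, max_pairs=10):
    """Return a new history blob with `seen` recorded at time `ts`.

    Any older entry for the same UUID is dropped, the list is ordered by
    most recently SEEN first, and only `max_pairs` entries are kept.
    """
    pairs = [(t, u) for t, u in unpack_history(blob) if u != seen]
    pairs.append((ts, seen))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pack_history(pairs[:max_pairs])


if __name__ == "__main__":
    a = uuid.UUID("256fb890-5a4b-11df-a08a-0800200c9a66")
    b = uuid.uuid4()
    history = b""
    for when, who in [(100, a), (200, b), (300, a)]:
        history = record_visit(history, when, who)
    for ts, u in unpack_history(history):
        print(ts, u)
```

With N under 10 the whole column stays well under 1K, and the update is a read-modify-write of a single column instead of pulling all columns and issuing deletes. Note that two concurrent updates for the same user could overwrite each other, so this only fits if that is acceptable for your workload.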