From: graham sanderson
Subject: Re: best practices for time-series data with massive amounts of records
Date: Fri, 6 Mar 2015 17:06:20 -0600
To: user@cassandra.apache.org

Note that using static column(s) for the “head” value, with trailing TTLed values behind it, is something we’re considering. This is especially nice if your head state includes, say, a map which is updated by small deltas (individual keys).

We have not yet studied the effect of static columns on, say, DTCS (DateTieredCompactionStrategy).
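To make the static-head idea concrete, here is a minimal pure-Python model of it. It assumes a per-user partition whose head state is a static map column and whose history rows are each written with a TTL; the class and method names are hypothetical, and the code only simulates those semantics rather than calling a Cassandra driver.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserPartition:
    """In-memory stand-in for one CQL partition (hypothetical model).

    Loosely mirrors a table with PRIMARY KEY (user_id, ts), clustering
    order ts DESC, a static map column `head`, and history rows that are
    each written USING TTL. This simulates the semantics only.
    """

    # Static column: a single value shared by every row in the partition.
    head: Dict[str, str] = field(default_factory=dict)
    # Clustering rows as (timestamp, event, expires_at); simplified.
    rows: List[Tuple[float, str, float]] = field(default_factory=list)

    def update_head(self, key: str, value: str) -> None:
        # Small delta: only one map entry is written, not the whole head.
        self.head[key] = value

    def append(self, ts: float, event: str, ttl_s: float) -> None:
        # Each trailing row carries its own expiry time.
        self.rows.append((ts, event, ts + ttl_s))

    def recent(self, now: float, limit: int) -> List[str]:
        # Expired rows are no longer returned by reads; newest first.
        live = [r for r in self.rows if r[2] > now]
        live.sort(key=lambda r: r[0], reverse=True)
        return [event for _, event, _ in live[:limit]]


p = UserPartition()
p.update_head("last_page", "/home")
p.append(ts=0.0, event="click-a", ttl_s=10.0)
p.append(ts=5.0, event="click-b", ttl_s=10.0)
print(p.head, p.recent(now=12.0, limit=10))
# {'last_page': '/home'} ['click-b']
```

The head is updated one key at a time while old history simply ages out, so no separate cleanup job is needed. In a real cluster, expired cells still occupy disk until compaction removes them, which is why the interaction with the compaction strategy matters.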
On Mar 6, 2015, at 4:42 PM, Clint Kelly <clint.kelly@gmail.com> wrote:

Hi all,

Thanks for the responses, this was very helpful.

I don't know yet what the distribution of clicks and users will be, but I expect to see a few users with an enormous number of interactions and most users having very few. The idea of doing some additional manual partitioning, and then maintaining another table that contains the "head" partition for each user, makes sense, although it would add latency when we want to get, say, the most recent 1000 interactions for a given user (which is something we sometimes have to do for applications with tight SLAs).
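One way to picture the manual-partitioning idea, and where its extra latency comes from, is the sketch below. It assumes interactions live in partitions keyed by (user, bucket number) and that a separate head table maps each user to their newest bucket; `MAX_ROWS_PER_BUCKET`, `BucketedHistory` and its methods are hypothetical names, and dictionaries stand in for tables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_ROWS_PER_BUCKET = 3  # hypothetical cap; a real cap would be far larger


class BucketedHistory:
    """Hypothetical model: interactions keyed by (user, bucket), plus a
    'head' table recording each user's newest bucket."""

    def __init__(self) -> None:
        self.head: Dict[str, int] = {}  # head table: user -> newest bucket
        self.partitions: Dict[Tuple[str, int], List[str]] = defaultdict(list)

    def write(self, user: str, event: str) -> None:
        bucket = self.head.get(user, 0)
        if len(self.partitions[(user, bucket)]) >= MAX_ROWS_PER_BUCKET:
            bucket += 1  # current partition is full: roll over
        self.head[user] = bucket  # keep the head pointer current
        self.partitions[(user, bucket)].append(event)

    def most_recent(self, user: str, k: int) -> Tuple[List[str], int]:
        """Return up to k newest events and the number of reads issued."""
        reads = 1  # the extra lookup in the head table
        bucket = self.head.get(user)
        out: List[str] = []
        while bucket is not None and bucket >= 0 and len(out) < k:
            reads += 1  # one partition read per bucket visited
            rows = self.partitions.get((user, bucket), [])
            out.extend(reversed(rows[-(k - len(out)):]))
            bucket -= 1
        return out, reads


h = BucketedHistory()
for i in range(7):
    h.write("u1", f"e{i}")
print(h.most_recent("u1", 4))
# (['e6', 'e5', 'e4', 'e3'], 3)
```

Every lookup pays for the head-table read plus one read per bucket it touches, so a heavy user whose newest bucket is nearly empty needs an extra round trip, while a light user never leaves bucket 0.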

FWIW I doubt that any users will have so many interactions that they exceed what we could reasonably put in a row, but I wanted to have a strategy to deal with this.

Having a nice design pattern in Cassandra for maintaining a row with the N most recent interactions would also solve this reasonably well, but I don't know of any way to implement that without running batch jobs that periodically clean out data (which might be okay).
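The batch-cleanup approach amounts to a pass like the one below: for each user, everything older than their N newest rows gets deleted. This is a minimal sketch assuming rows are simplified to (timestamp, event) pairs; `rows_to_trim` is a hypothetical name, and it only computes which rows a job would delete rather than issuing the deletes.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (timestamp, event): a simplified clustering row


def rows_to_trim(partitions: Dict[str, List[Row]], keep: int) -> Dict[str, List[Row]]:
    """Hypothetical cleanup pass: for each user, return every row older
    than that user's `keep` most recent rows (the rows to delete)."""
    doomed: Dict[str, List[Row]] = {}
    for user, rows in partitions.items():
        newest_first = sorted(rows, key=lambda r: r[0], reverse=True)
        if len(newest_first) > keep:
            doomed[user] = newest_first[keep:]
    return doomed


history = {
    "u1": [(1, "a"), (3, "b"), (2, "c"), (4, "d")],
    "u2": [(7, "x")],
}
print(rows_to_trim(history, keep=2))
# {'u1': [(2, 'c'), (1, 'a')]}
```

In Cassandra each of those deletes writes a tombstone that reads must skip until compaction purges it after `gc_grace_seconds`, which is one reason TTLs are often considered for this pattern instead.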

Best regards,
Clint




On Tue, Mar 3, 2015 at 8:10 AM, mck <mck@apache.org> wrote:

> Here "partition" is a random digit from 0 to (N*M)
> where N=nodes in cluster, and M=arbitrary number.


Hopefully it was obvious, but here (unless you've got hot partitions),
you don't need N.
~mck
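mck's point can be sketched as follows: the random bucket only spreads one logical key over several partitions, and the partitioner hashes each (key, bucket) pair onto the ring on its own, so the bucket count does not need to track the node count. This sketch assumes M buckets numbered 0 to M-1 (whether the original "0 to (N*M)" range was inclusive is not stated); `write`, `read_latest` and `M = 4` are hypothetical.

```python
import heapq
import random
from collections import defaultdict
from typing import Dict, List, Tuple

M = 4  # hypothetical bucket count, independent of cluster size N

Store = Dict[Tuple[str, int], List[Tuple[int, str]]]


def write(store: Store, key: str, ts: int, event: str, rng: random.Random) -> None:
    # Spread one logical key over M partitions named (key, bucket).
    bucket = rng.randrange(M)  # 0 .. M-1
    store[(key, bucket)].append((ts, event))


def read_latest(store: Store, key: str, limit: int) -> List[Tuple[int, str]]:
    # A read must visit all M partitions and merge them newest-first.
    per_bucket = (sorted(store.get((key, b), []), reverse=True) for b in range(M))
    merged = heapq.merge(*per_bucket, reverse=True)
    return [row for _, row in zip(range(limit), merged)]


store: Store = defaultdict(list)
rng = random.Random(42)
for ts in range(10):
    write(store, "user-1", ts, f"e{ts}", rng)
print(read_latest(store, "user-1", 3))
# [(9, 'e9'), (8, 'e8'), (7, 'e7')]
```

The cost shows up on the read side: every query for the newest rows has to fan out to all M partitions and merge the results, so M is probably best kept as small as the per-partition size limits allow.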

