From: Günter Ladwig <guenter.ladwig@kit.edu>
To: user@cassandra.apache.org
Subject: Re: Storing single rows on multiple nodes
Date: Sun, 10 Jul 2011 00:16:11 +0300

Hi,

On 09.07.2011, at 23:37, Dan Kuebrich wrote:

> Perhaps I misunderstand your proposal, but it seems that even with your manual key placement schemes, the row would still be huge, no matter what node it gets placed on. A better solution might be figuring out how to make each row into a few smaller ones to get better balancing of load and also faster reads.

I probably could have been clearer. The idea is to randomly choose one node among several for a single key each time some data is added to a row. Say a particular key is normally assigned to node n. Then, for each write to that key, we randomly choose one of the nodes n, n+1, n+2, ..., n+k to write the data to (or we could choose the node with the least load). Of course, if one wants to read all data for that key, all of these nodes have to be queried, because each node will store only a chunk of the data for the key.

> - Can you segment the column(s) of the row into different, predictably-named rows?

Yes and no. We do not know in advance which of the rows will be the ones that grow that large. However, and this is the second solution I described, it would be possible to randomly choose a suffix that is appended whenever data is added to a key. For example, we might have a key "abc" and a predefined list of suffixes (1, 2, 3). When adding data, instead of writing to key "abc", we randomly choose one of the suffixes and write to, for example, "abc1". Of course, the number of suffixes determines how likely it is that the modified keys will actually be stored on different nodes.
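To make that concrete, here is a minimal client-side sketch (plain Python, with a dict standing in for the column family; write_key/read_keys are hypothetical helper names, not an existing API). Writes append a random suffix to the key, and reads have to fan out over all derived keys (e.g. via a multiget) and merge the chunks:

    import random

    # Predefined suffixes; more suffixes means a higher chance that the
    # derived keys hash to different nodes under the random partitioner.
    SUFFIXES = ["1", "2", "3"]

    def write_key(key):
        """Derived key to use for a single write."""
        return key + random.choice(SUFFIXES)

    def read_keys(key):
        """All derived keys that must be queried to reassemble the row."""
        return [key + s for s in SUFFIXES]

    # Toy example with a dict standing in for the column family:
    store = {}
    for i in range(6):
        store.setdefault(write_key("abc"), []).append("value-%d" % i)

    row = []
    for k in read_keys("abc"):
        row.extend(store.get(k, []))
    print(row)  # everything written under "abc1", "abc2", "abc3"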
> - Or segment into different rows and use a secondary index to find the rows that are part of a particular RDF?

This is actually something I hadn't looked at yet, thanks for the pointer!

> - And/or compress the RDF data (maybe you're already doing that) to reduce the impact of large rows?

While compression would certainly help, it does not really change the underlying problem, it just delays its effect. Actually, I hope that Cassandra itself will at some point take care of compression ;) Another problem is that you can't just increase the cluster size to scale to larger datasets, because the constraint is the disk space on single nodes.

Cheers,
Günter

> On Sat, Jul 9, 2011 at 4:27 PM, Günter Ladwig wrote:
> Hi all,
>
> we are currently looking at using Cassandra to store highly skewed RDF data. With the indexes we use, it may happen that a single row contains up to 20% of the whole dataset, meaning that it can grow larger than the available disk space on single nodes. In [1], it says that this limitation is not likely to change in the future, but I was wondering if anybody has looked at this problem?
>
> One thing that comes to mind is a simple approach to DHT load balancing [2], where keys are assigned to one of several random alternative nodes (which means that for reading, all these nodes have to be queried). This is a bit similar to replication, except, of course, that only one copy of the data is stored. As this would require changes to the Cassandra code base, we could "simulate" it by randomly choosing one of several predefined suffixes and appending it to a key before storing it. By modifying a key this way, we could be somewhat sure that it will be stored at a different node. The first solution would certainly be preferable.
>
> Any thoughts or experiences? Failing that, maybe someone can give me a pointer into the Cassandra code base, where something like [2] should be implemented.
>
> Cheers,
> Günter
>
> [1] http://wiki.apache.org/cassandra/CassandraLimitations
> [2] Byers et al.: Simple Load Balancing for Distributed Hash Tables, http://www.springerlink.com/content/r9r4qcqxc2bmfqmr/
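For reference, a rough toy model of what the multiple-choice placement in [2] looks like conceptually (plain Python, not Cassandra's actual replica-placement code; the Ring class and its method names are made up for illustration). Each key maps to its usual position on the ring plus the next k-1 nodes, writes go to the least-loaded candidate, and reads query all candidates:

    import bisect
    import hashlib

    def token(x):
        # stand-in for the partitioner's hash (MD5, as in RandomPartitioner)
        return int(hashlib.md5(x.encode("utf-8")).hexdigest(), 16)

    class Ring(object):
        def __init__(self, nodes, k=2):
            self.nodes = sorted(nodes, key=token)   # nodes ordered by token
            self.tokens = [token(n) for n in self.nodes]
            self.k = k                              # candidate nodes per key
            self.load = dict((n, 0) for n in nodes)

        def candidates(self, key):
            """The node the key normally maps to plus its k-1 successors."""
            i = bisect.bisect(self.tokens, token(key)) % len(self.nodes)
            return [self.nodes[(i + j) % len(self.nodes)] for j in range(self.k)]

        def place_write(self, key):
            """Send the write to the least-loaded candidate."""
            node = min(self.candidates(key), key=self.load.get)
            self.load[node] += 1
            return node

        def read_targets(self, key):
            """Reads must query every candidate, since each may hold a chunk."""
            return self.candidates(key)

    ring = Ring(["node-a", "node-b", "node-c", "node-d"], k=2)
    for _ in range(10):
        ring.place_write("huge-rdf-key")
    print(ring.load)                         # writes spread over two nodes
    print(ring.read_targets("huge-rdf-key"))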
--
Dipl.-Inform. Günter Ladwig

Karlsruhe Institute of Technology (KIT)
Institute AIFB

Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: guenter.ladwig@kit.edu
Web: www.aifb.kit.edu

KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association