From: Günter Ladwig
Subject: Storing single rows on multiple nodes
Date: Sat, 9 Jul 2011 23:27:37 +0300
To: user@cassandra.apache.org

Hi all,

we are currently looking at using Cassandra to store highly
skewed RDF data. With the indexes we use, it may happen that a single row contains up to 20% of the whole dataset, meaning that it can grow larger than the available disk space on a single node. In [1], it says that this limitation is not likely to change in the future, but I was wondering whether anybody has looked at this problem.

One thing that comes to mind is a simple approach to DHT load balancing [2], where keys are assigned to one node of several random alternatives (which means that for reading, all these nodes have to be queried). This is a bit similar to replication, except, of course, that only one copy of the data is stored. Since this would require changes to the Cassandra code base, we could instead "simulate" it by randomly choosing one of several predefined suffixes and appending it to a key before storing it. By modifying a key this way, we could be somewhat sure that it will be stored at a different node. The first solution would certainly be preferable.

Any thoughts or experiences? Failing that, maybe someone can give me a pointer into the Cassandra code base, where something like [2] would have to be implemented.

Cheers,
Günter

[1] http://wiki.apache.org/cassandra/CassandraLimitations
[2] Byers et al.: Simple Load Balancing for Distributed Hash Tables, http://www.springerlink.com/content/r9r4qcqxc2bmfqmr/

-- 
Dipl.-Inform.
Günter Ladwig
Karlsruhe Institute of Technology (KIT)
Institute AIFB
Englerstraße 11 (Building 11.40, Room 250)
76131 Karlsruhe, Germany
Phone: +49 721 608-47946
Email: guenter.ladwig@kit.edu
Web: www.aifb.kit.edu

KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association
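To make the suffix idea from the message concrete, here is a minimal sketch of how the client-side "simulation" might look. It is only an illustration under stated assumptions: `ShardedStore`, `SUFFIXES`, and the in-memory dict standing in for a column family are all hypothetical and not part of Cassandra or any client library. Writes pick one of the predefined suffixes at random; reads have to query every suffixed key and merge the results.

```python
import random

# Hypothetical, predefined suffixes. Their number bounds how many
# physical rows one logical row can be spread over.
SUFFIXES = ("#0", "#1", "#2", "#3")


class ShardedStore:
    """In-memory stand-in for a column family: row key -> {column: value}."""

    def __init__(self, suffixes=SUFFIXES, rng=None):
        self.suffixes = tuple(suffixes)
        self.rng = rng or random.Random()
        self.rows = {}  # physical row key -> columns

    def physical_keys(self, key):
        """All physical row keys that may hold columns of the logical row."""
        return [key + suffix for suffix in self.suffixes]

    def insert(self, key, column, value):
        """Store one column under a randomly chosen suffixed key."""
        physical = key + self.rng.choice(self.suffixes)
        self.rows.setdefault(physical, {})[column] = value
        return physical

    def read(self, key):
        """Query every suffixed key and merge the columns found."""
        merged = {}
        for physical in self.physical_keys(key):
            merged.update(self.rows.get(physical, {}))
        return merged


store = ShardedStore(rng=random.Random(42))
for i in range(1000):
    store.insert("rdf:type", "subject-%d" % i, "")
columns = store.read("rdf:type")
```

One caveat of the random choice: writing the same column twice may place it under two different suffixes, so updates and deletes would have to touch all suffixed keys. Deriving the suffix from a hash of the column name instead would avoid that, at the cost of no longer being random.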