From: Denis Gabaydulin
Date: Wed, 9 Nov 2011 22:54:22 +0300
Subject: Re: Physical data layout of columns in super column family
To: user@cassandra.apache.org

Thanks for the explanation, Konstantin.

I'm a novice with Cassandra and not so familiar with the terminology.
You understood the topology well.

I had a quick look at the Cassandra source code and found that my
query from Hector is translated into a list of read commands (inside
CassandraServer), one command per row key.

To be clear, my question was about retrieving all the super column data
of a single row key (a single read command).

On Wed, Nov 9, 2011 at 21:04, Konstantin Naryshkin wrote:
> I assume that Reports is the super column family, that the first 1: is
> the report id and, in the Cassandra topology, the row key, that the
> second 1: is the report line and, in the Cassandra topology, the super
> column, and that "value 1" is the column name. If this is not the case,
> maybe explain the topology better.
>
>> Can I get guarantees that all report lines of one report will be
>> located on the same node in such a configuration?
>
> Yes. If I understood the topology right, each replica of a report will
> be stored together on a single node (and even be stored in only a few
> locations on disk if you do not update the reports much).
>
> On Wed, Nov 9, 2011 at 04:47, Denis Gabaydulin wrote:
>> Hi, first of all, let me say thank you for the amazing product :-)
>> So, I have a couple of questions about the internal physical data layout.
>>
>> Suppose I have the following data schema:
>>
>> Reports:{
>>     1:{
>>         1:{"value1":"some val", "value2":"some val"},
>>         2:{"value1":"some val", "value2":"some val"}
>>         ...
>>     },
>>     2:{
>>         1:{"value1":"some val", "value2":"some val"},
>>         2:{"value1":"some val", "value2":"some val"}
>>         ...
>>     }
>>     ...
>> }
>>
>> Each report is represented by a set of report records.
>>
>> Most of the data queries select a report by id together with all its
>> report lines. I'm going to use the multiget super slice query with
>> ranges (in terms of the Hector client) for it. Will it be efficient?
>>
>> Another question relates to the physical layout of the data. I'm going
>> to apply SimpleStrategy with the random partitioner.
>> The replication factor is 1 or 2 (it depends on the number of nodes in
>> the production environment).
>> Can I get guarantees that all report lines of one report will be
>> located on the same node in such a configuration?
>>
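
To illustrate the point discussed in this thread, here is a minimal Python sketch of why all report lines of one report end up on the same node. It is a toy model, not Cassandra code: the three-node ring, the node names, the 128-bit token space and the helper names (token_for, replicas, plan_multiget) are assumptions made up for the example, and the hashing is a simplification of what the random partitioner does. The idea it shows is that placement is computed from the row key (the report id) only, so the super columns (report lines) never influence which node holds the row, and that a multiget breaks down into one lookup per row key.

```python
import hashlib
from bisect import bisect_left

# Hypothetical three-node ring with evenly spaced tokens in a 128-bit space.
# Node names and token values are invented for this illustration.
TOKEN_SPACE = 2 ** 128
RING = [(i * TOKEN_SPACE // 3, f"node{i + 1}") for i in range(3)]


def token_for(row_key: bytes) -> int:
    """Hash a row key to a token (simplified MD5 hashing, not Cassandra's exact arithmetic)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


def replicas(row_key: bytes, ring, rf: int):
    """Return the nodes holding a row: the first node whose token is >= the
    key's token, then the next nodes clockwise, up to the replication factor."""
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(row_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def plan_multiget(row_keys, ring, rf: int):
    """Model a multiget as one independent lookup per row key."""
    return {key: replicas(key, ring, rf) for key in row_keys}


# Only the report id is hashed; report lines 1, 2, ... play no part in
# placement, so every line of report "1" lives wherever row "1" lives.
for report_id, nodes in plan_multiget([b"1", b"2"], RING, rf=2).items():
    print(report_id, "->", nodes)
```

Two different report ids may well land on different nodes, which is why a multiget over many reports can touch several nodes even though each single report is read from one place.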