Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of gabaden@gmail.com designates
 209.85.213.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAD8jd5hmQu28MnVrR3+SX1E1WFeJowaF8d_NnFAqAHs6Ku-nfQ@mail.gmail.com>
References: 
 <CAB1b6fGpz6cTo+aLACLWwVQHAiL387gk5krjD0_-ju4KDKJCtQ@mail.gmail.com>
 <CAD8jd5hmQu28MnVrR3+SX1E1WFeJowaF8d_NnFAqAHs6Ku-nfQ@mail.gmail.com>
From: Denis Gabaydulin <gabaden@gmail.com>
Date: Wed, 9 Nov 2011 22:54:22 +0300
Message-ID: 
 <CAB1b6fHCzUC-wNzAbj20ROsuq3RgMuAJ8NyaWZWiE2GjY9yNWA@mail.gmail.com>
Subject: Re: Physical data layout of columns in super column family
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks for the explanation, Konstantin. I'm new to Cassandra and not
very familiar with the terminology yet.
You understood the topology correctly.

I had a quick look at the Cassandra source code and found that my
query from Hector is translated into a list of read commands (inside
CassandraServer), one command per row key. To be precise, my question
was about retrieving all of the super column data for a single row key
(a single read command).
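
A rough Python sketch of that translation, as I understand it. The
ReadCommand class and the split_multiget function here are hypothetical
names for illustration only, not Cassandra's or Hector's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReadCommand:
    """Hypothetical stand-in for one per-row read (not Cassandra's real class)."""
    column_family: str
    row_key: str
    start: Optional[str] = None   # None means an unbounded start of the super column range
    finish: Optional[str] = None  # None means an unbounded end of the range
    count: int = 100              # assumed cap on the number of super columns returned


def split_multiget(column_family: str,
                   row_keys: List[str],
                   start: Optional[str] = None,
                   finish: Optional[str] = None,
                   count: int = 100) -> List[ReadCommand]:
    """Turn one multiget request into one read command per row key."""
    return [ReadCommand(column_family, key, start, finish, count)
            for key in row_keys]


# Two report ids produce two independent read commands.
commands = split_multiget("Reports", ["1", "2"])

# My case: one report id, so a single read command for the whole row.
single = split_multiget("Reports", ["1"])
```

Under this reading, fetching one report with all of its lines is just
the single-key case of the multiget.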

On Wed, Nov 9, 2011 at 21:04, Konstantin Naryshkin <konstantinn@a-bb.net> wrote:
> I assume that Reports is the super column family, that the first 1: is
> the report id (the row key in Cassandra terms), that the second 1: is
> the report line (the super column), and that "value1" is the column
> name. If this is not the case, please explain the topology in more
> detail.
>
>> Can I get a guarantee that all report lines of one report will be
>> located on the same node in such a configuration?
>
> Yes. If I understood the topology correctly, each replica of a report
> will be stored together on a single node (and will even be stored in
> only a few locations on disk if you do not update the reports much).
>
> On Wed, Nov 9, 2011 at 04:47, Denis Gabaydulin <gabaden@gmail.com> wrote:
>> Hi, first of all, let me say thank you for the amazing product :-)
>> So, I have a couple of questions about the internal physical data layout.
>>
>> Suppose, I have the following data schema:
>>
>> Reports:{
>>     1:{
>>         1:{"value1":"some val", "value2":"some val"},
>>         2:{"value1":"some val", "value2":"some val"}
>>         ...
>>     },
>>     2:{
>>         1:{"value1":"some val", "value2":"some val"},
>>         2:{"value1":"some val", "value2":"some val"}
>>         ...
>>     }
>>     ...
>> }
>>
>> Each report is represented by a set of report records.
>>
>> Most of the data queries select a report by id along with all of its
>> report lines. I'm going to use the multiget super slice query with
>> ranges (in Hector client terms) for it. Will it be efficient?
>>
>> Another question relates to the physical layout of the data. I'm going
>> to use SimpleStrategy with the random partitioner.
>> The replication factor will be 1 or 2 (depending on the number of nodes
>> in the production environment).
>> Can I get a guarantee that all report lines of one report will be
>> located on the same node in such a configuration?
>>
>
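
For reference, a simplified Python sketch of why the placement question
above depends only on the row key. It assumes the random partitioner
derives a token from the MD5 hash of the row key and that SimpleStrategy
walks the ring clockwise from the first node whose token is at or past
the key's token. The node names and tokens are made up, and the helper
names (token_for, replicas_for) are hypothetical:

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List

RING_SIZE = 2 ** 127  # assumed token space of the random partitioner


def token_for(row_key: str) -> int:
    """Token of a row key: absolute value of its MD5 digest read as a signed integer."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas_for(row_key: str,
                 node_tokens: Dict[str, int],
                 replication_factor: int) -> List[str]:
    """Pick replica nodes for a row: the first owner plus the next nodes on the ring."""
    ring = sorted((token, name) for name, token in node_tokens.items())
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    first = bisect_left(tokens, token_for(row_key)) % len(ring)
    rf = min(replication_factor, len(ring))
    return [ring[(first + i) % len(ring)][1] for i in range(rf)]


# Hypothetical three-node ring with evenly spaced tokens.
nodes = {"node-a": 0, "node-b": RING_SIZE // 3, "node-c": 2 * RING_SIZE // 3}

# Report id "1" is the row key; its report lines are super columns in that row,
# so they never enter the placement calculation.
owners_rf1 = replicas_for("1", nodes, 1)
owners_rf2 = replicas_for("1", nodes, 2)
```

Since only the row key is hashed, every report line of a given report
lands on the same replica set in this model, whether the replication
factor is 1 or 2.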