From: Sylvain Lebresne <sylvain@yakaz.com>
Date: Fri, 1 Oct 2010 16:42:40 +0200
Message-ID: <AANLkTimusxVMSViHvQ7FGhB_qA318WaPjCsLmDWj9xz1@mail.gmail.com>
Subject: Re: [DISCUSSION] High-volume counters in Cassandra
To: dev@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Sep 30, 2010 at 6:29 PM, Ryan King <ryan@twitter.com> wrote:
> On Tue, Sep 28, 2010 at 10:14 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> On Tue, Sep 28, 2010 at 4:00 PM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
>>> I agree that it is worth adding a support for counter as supercolumns
>>> in 1546 and that's fairly trivial, so I will add that as soon as possible
>>> (but please understand that I'm working on this for a good part during
>>> my free time).
>>>
>>> As for supercolumns of counters, there is what Jonathan proposes, but
>>> I'll add that encoding a supercolumns CF to a standard column CF is
>>> almost always a fairly trivial encoding. Worst case scenario it requires
>>> you to roll up your own comparator and it's slightly less convenient
>>> for client code.
>>
>> Supporting supercolumns to allow multiple counters per row, but
>> requiring encoding with a custom comparator for deeper nesting, seems
>> like a reasonable compromise to me.
>
> I don't understand how this would work.

This is a general technique, not specific to counters. Let me be a bit more
precise.

The idea is that you will encode the following super row:
  key: {
    scol1 : {
      col1 : v1,
      col2 : v2,
      col3 : v3
    },
    scol2 : {
      col4 : v4,
      col5 : v5
    }
  }
as the standard row:
  key : {
    scol1|col1 : v1,
    scol1|col2 : v2,
    scol1|col3 : v3,
    scol2|col4 : v4,
    scol2|col5 : v5
  }
To get slightly more technical, scol1|col1 could be:
  [length of scol1][scol1 bytes][0][col1 bytes]

By that I mean that the bytes of the column name in the encoding start with
4 bytes for the length of scol1 (a byte[]), then scol1, then a 0 byte, then
col1.
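
A minimal sketch of this layout in Python, just to make it concrete. The
big-endian length prefix is an assumption (only "4 bytes" is stated above),
and the names encode_name and decode_name are made up for the illustration:

```python
import struct
from typing import Tuple


def encode_name(scol: bytes, col: bytes) -> bytes:
    """Build [length of scol][scol bytes][0][col bytes]."""
    # Assumption: the 4-byte length is an unsigned big-endian integer.
    return struct.pack(">I", len(scol)) + scol + b"\x00" + col


def decode_name(name: bytes) -> Tuple[bytes, bytes]:
    """Split an encoded column name back into (scol, col)."""
    (n,) = struct.unpack_from(">I", name, 0)
    scol = name[4:4 + n]
    col = name[4 + n + 1:]  # skip the separator byte after scol
    return scol, col
```

The length prefix is what lets the decoder find where scol ends without
reserving any byte value inside the names themselves.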
The 0 byte after the super column name is there for slice queries: it makes it
possible to express the end of a super column (in the encoding). More
precisely, for slice queries, the start of scol1 is
  [length of scol1][scol1 bytes][0]
and the end of scol1 is
  [length of scol1][scol1 bytes][1]

The custom comparator is fairly easy to write. It takes the super column
comparator (comp1) and the column comparator (comp2). To compare two (encoded)
column names, it first reads the super column name of each (using the length
at the start of each name) and compares them with comp1. If they are not
equal, it returns that comparison value. Otherwise, it reads the next byte of
each name. If those bytes differ, the bigger name is the one with the 1. If
they are equal, it reads the two column names and compares them using comp2.
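
The comparator described above could be sketched as follows in Python. This
is only an illustration: it assumes a big-endian 4-byte length prefix, uses a
plain byte-wise comparison for both comp1 and comp2, and the helper names
(encode, split, make_comparator) are hypothetical:

```python
import struct
from functools import cmp_to_key


def bytes_cmp(a: bytes, b: bytes) -> int:
    """Plain byte-wise comparison, standing in for comp1 and comp2."""
    return (a > b) - (a < b)


def encode(scol: bytes, col: bytes, marker: int = 0) -> bytes:
    # Assumption: big-endian length; marker is 0 for columns and slice
    # starts, 1 for the end bound of a super column.
    return struct.pack(">I", len(scol)) + scol + bytes([marker]) + col


def split(name: bytes):
    """Return (scol, marker byte, col) from an encoded name."""
    (n,) = struct.unpack_from(">I", name, 0)
    return name[4:4 + n], name[4 + n], name[4 + n + 1:]


def make_comparator(comp1, comp2):
    def compare(a: bytes, b: bytes) -> int:
        scol_a, mark_a, col_a = split(a)
        scol_b, mark_b, col_b = split(b)
        c = comp1(scol_a, scol_b)
        if c != 0:
            return c  # super column names differ
        if mark_a != mark_b:
            return 1 if mark_a > mark_b else -1  # the one with the 1 wins
        return comp2(col_a, col_b)
    return compare


compare = make_comparator(bytes_cmp, bytes_cmp)
names = [encode(b"b", b"x"), encode(b"a", b"z"), encode(b"a", b"y")]
ordered = sorted(names, key=cmp_to_key(compare))
```

Note that sorting the raw encoded bytes would not give the same order: with
the length prefix, "b" would sort before "aa", which is why the comparator has
to decode the super column name before applying comp1.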

Translating slice predicates is fairly trivial. You'll just have to iterate
over the result to regroup the columns into super columns, but no biggy.
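
The regrouping step might look like this on the client side. It is a sketch
only, assuming the slice result arrives as (encoded name, value) pairs in
comparator order and that the length prefix is big-endian; regroup is a
made-up name:

```python
import struct


def regroup(columns):
    """Turn a flat list of (encoded_name, value) into {scol: {col: value}}."""
    result = {}
    for name, value in columns:
        # Assumption: 4-byte big-endian length, then scol, then one
        # separator byte, then the column name.
        (n,) = struct.unpack_from(">I", name, 0)
        scol = name[4:4 + n]
        col = name[4 + n + 1:]
        result.setdefault(scol, {})[col] = value
    return result
```

Since dicts keep insertion order, the super columns and their columns come
out in the same order as the slice returned them.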

The only thing that is less efficient is querying super columns by names
(querying sub columns by name is fine however). For that, you'll have to
issue one slice query for each requested name.
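
To illustrate the "one slice per name" cost, here is a toy in-memory version
in Python. Everything in it is an assumption made for the example: the row is
a list of (encoded name, value) pairs kept in raw byte order (enough to keep
each super column contiguous; a real column family would use the custom
comparator), the length prefix is big-endian, and get_super_columns is a
hypothetical helper, not a Cassandra API:

```python
import struct
from bisect import bisect_left


def encode(scol: bytes, col: bytes) -> bytes:
    return struct.pack(">I", len(scol)) + scol + b"\x00" + col


def bounds(scol: bytes):
    """Slice start and end for one super column: prefix+[0], prefix+[1]."""
    prefix = struct.pack(">I", len(scol)) + scol
    return prefix + b"\x00", prefix + b"\x01"


def get_super_columns(row, names):
    """row: list of (encoded_name, value) sorted by encoded name."""
    keys = [k for k, _ in row]
    out = {}
    for scol in names:  # one slice per requested super column name
        start, end = bounds(scol)
        lo, hi = bisect_left(keys, start), bisect_left(keys, end)
        skip = 4 + len(scol) + 1  # length prefix + scol + separator
        out[scol] = {k[skip:]: v for k, v in row[lo:hi]}
    return out


row = sorted([
    (encode(b"scol1", b"col1"), b"v1"),
    (encode(b"scol1", b"col2"), b"v2"),
    (encode(b"scol1", b"col3"), b"v3"),
    (encode(b"scol2", b"col4"), b"v4"),
    (encode(b"scol2", b"col5"), b"v5"),
])
```

Each requested name costs its own range lookup here, whereas sub columns
inside a known super column can still be fetched directly by their full
encoded names.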

Some remove operations could probably also be slightly less efficient, but in
the end removes are broken with counters (both in 1072 and 1546; I'll refer
you to the comments on the latter ticket), so it's not a big deal.

To sum up, I can see the following drawbacks to such an encoding:
  - querying SCs by name is less efficient.
  - it takes more disk space (but that's the cheapest resource we have,
    isn't it?).
It does, however, have at least one advantage:
  - your super columns are indexed, so you don't have to deserialize them
    entirely each time.

I'd say these are fair compromises.

--
Sylvain