From: Drew Kutcharian
To: user@cassandra.apache.org
Subject: Re: Data Modeling: How to keep track of arbitrarily inserted column names?
Date: Thu, 4 Apr 2013 14:45:06 -0700

Hi Edward,

I anticipate that the column names will be reused a lot. For example, key1 will appear in many rows, so I expect the number of distinct column names to be much smaller than the number of rows. Is there a way to have a separate CF that keeps track of the column names?

What I was thinking was to have a separate CF where I write only the column name, with a null value, every time I write a key/value to the main CF. If that column name already exists, it will just be overwritten. Then, when I want all the column names, I can just query that CF. I'm not sure that's the best approach at high load (100k inserts a second), though.
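To make the idea concrete, here is a minimal in-memory sketch in Python rather than real Cassandra client code. The class name `ColumnNameTracker`, its attributes, and the per-writer `seen` cache are all hypothetical, introduced only for illustration. The cache is an assumption on top of the idea above: since the index write is idempotent, a writer can skip names it has already recorded, which might reduce the extra write volume at high load.

```python
class ColumnNameTracker:
    """In-memory stand-in for a main CF plus a column-name index CF."""

    def __init__(self):
        self.main_cf = {}      # row key -> {column name -> value}
        self.names_cf = {}     # column name -> empty value (the "null" marker)
        self.seen = set()      # names this writer has already indexed (assumption)
        self.index_writes = 0  # counts writes actually sent to the index CF

    def insert(self, row_key, column_name, value):
        # Write to the main CF; an existing column is simply overwritten.
        self.main_cf.setdefault(row_key, {})[column_name] = value
        # Blind, idempotent write to the index CF: no read first, no lock.
        if column_name not in self.seen:
            self.names_cf[column_name] = b""
            self.seen.add(column_name)
            self.index_writes += 1

    def all_column_names(self):
        # "Query that CF": the index holds one entry per distinct name.
        return sorted(self.names_cf)


tracker = ColumnNameTracker()
tracker.insert("Row1", "key1", "XXX")
tracker.insert("Row1", "key2", "XXX")
tracker.insert("Row2", "key1", "XXX")
tracker.insert("Row2", "key3", "XXX")
print(tracker.all_column_names())  # ['key1', 'key2', 'key3']
```

Four inserts produce only three index writes here, because key1 is already known to this writer when Row2 is written.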

-- Drew


On Apr 4, 2013, at 12:02 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

You cannot get only the column names (which you are calling keys), but you can use get_range_slices, which returns all the columns. When you specify an empty byte array (new byte[0]) as the start and finish, you get back all the columns. From there you can return only the column names to the user in a format that you like.
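The scan-and-collect approach described above can be sketched as follows. This is an in-memory Python illustration, not the Thrift API: `range_slice` and `distinct_column_names` are hypothetical helpers, and the column family is modeled as a plain dict using the example rows from the original question. Empty start and finish bounds are treated as "no bounds", so every column comes back.

```python
def range_slice(cf, start=b"", finish=b""):
    """Hypothetical stand-in for a range slice: yields (row_key, columns).

    Empty start and finish mean "no bounds", so all columns are returned.
    """
    for row_key, columns in cf.items():
        selected = {
            name: value
            for name, value in columns.items()
            if (not start or name.encode() >= start)
            and (not finish or name.encode() <= finish)
        }
        yield row_key, selected


def distinct_column_names(cf):
    # Read every row, keep only the column names, and de-duplicate them.
    names = set()
    for _row_key, columns in range_slice(cf):
        names.update(columns)
    return sorted(names)


cf = {
    "Row1": {"key1": "XXX", "key2": "XXX"},
    "Row2": {"key1": "XXX", "key3": "XXX"},
    "Row3": {"key4": "XXX", "key5": "XXX"},
    "Row4": {"key2": "XXX", "key5": "XXX"},
}
print(distinct_column_names(cf))  # ['key1', 'key2', 'key3', 'key4', 'key5']
```

Note that this reads every row and every column value just to recover the names, so the cost grows with the size of the column family rather than with the number of distinct names.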


On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian <drew@venarc.com> wrote:
Hey Guys,

I'm working on a project and one of the requirements is to have a schema-free CF where end users can insert arbitrary key/value pairs per row. What would be the best way to find out all the "keys" that were inserted (preferably without any locking)? For example:

Row1 => key1 -> XXX, key2 -> XXX
Row2 => key1 -> XXX, key3 -> XXX
Row3 => key4 -> XXX, key5 -> XXX
Row4 => key2 -> XXX, key5 -> XXX
…

The query would be "give me all the inserted keys" and the response would be {key1, key2, key3, key4, key5}.

Thanks,

Drew


