incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Data Modeling: How to keep track of arbitrarily inserted column names?
Date Thu, 04 Apr 2013 23:20:04 GMT
Your reverse index of "which rows contain a column named X" will have very
wide rows. You could look at Cassandra's secondary indexing, or possibly
at a Solandra/Solr approach. Another option is to shift the problem
slightly: "which rows have column X that was added between time y and
time z". Remember that with few distinct column names, the reverse index of
column to row is going to be a very big list.
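
To illustrate the "added between time y and time z" idea, here is a minimal
in-memory sketch in Python. It is not Cassandra client code: a dict stands in
for an index CF, and the class name, method names, and the one-hour bucket
width are all hypothetical. The point is only that keying the index by
(column name, time bucket) splits one very wide row into many smaller ones.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical bucket width (one hour); tune to the write rate


class BucketedReverseIndex:
    """In-memory stand-in for an index CF keyed by (column name, time bucket)."""

    def __init__(self):
        # (column_name, bucket) -> set of row keys; each entry models one index row
        self._rows = defaultdict(set)

    def record(self, row_key, column_name, ts):
        """Note that column_name was added to row_key at time ts (in seconds)."""
        bucket = int(ts) // BUCKET_SECONDS
        self._rows[(column_name, bucket)].add(row_key)

    def rows_with(self, column_name, start_ts, end_ts):
        """Row keys recorded for column_name in the buckets covering [start_ts, end_ts].

        The answer is bucket-granular: it can include rows written slightly
        outside the range but inside the first or last bucket.
        """
        first = int(start_ts) // BUCKET_SECONDS
        last = int(end_ts) // BUCKET_SECONDS
        found = set()
        for bucket in range(first, last + 1):
            found |= self._rows.get((column_name, bucket), set())
        return found
```

A query then touches only the buckets that overlap the requested range, so no
single index row has to hold every row key for a popular column name.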


On Thu, Apr 4, 2013 at 5:45 PM, Drew Kutcharian <drew@venarc.com> wrote:

> Hi Edward,
>
> I anticipate that the column names will be reused a lot. For example, key1
> will be in many rows. So I think the number of distinct column names will
> be much much smaller than the number of rows. Is there a way to have a
> separate CF that keeps track of the column names?
>
> What I was thinking was to have a separate CF to which I write only the
> column name, with a null value, every time I write a key/value to the main
> CF. In this case, if that column name already exists, it will just be
> overwritten. Now if I wanted to get all the column names, I could just
> query that CF. Not sure if that's the best approach at high load (100k
> inserts a second).
>
> -- Drew
>
>
> On Apr 4, 2013, at 12:02 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
> You cannot get only the column name (which you are calling a key). You can
> use get_range_slice, which returns all the columns. When you specify an
> empty byte array (new byte[0]) as the start and finish you get back all
> the columns. From there you can return only the column names to the user
> in a format that you like.
>
>
> On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian <drew@venarc.com> wrote:
>
>> Hey Guys,
>>
>> I'm working on a project and one of the requirements is to have a
>> schema-free CF where end users can insert arbitrary key/value pairs per
>> row. What would be the best way to know all the "keys" that were inserted
>> (preferably w/o any locking)? For example,
>>
>> Row1 => key1 -> XXX, key2 -> XXX
>> Row2 => key1 -> XXX, key3 -> XXX
>> Row3 => key4 -> XXX, key5 -> XXX
>> Row4 => key2 -> XXX, key5 -> XXX
>> …
>>
>> The query would be "give me all the inserted keys" and the response would
>> be {key1, key2, key3, key4, key5}
>>
>> Thanks,
>>
>> Drew
>>
>>
>
>
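
The two approaches quoted above (scanning every row and collecting the column
names, versus writing each name into a separate CF on every insert) can be
compared with a small in-memory model. This is a sketch in Python, not
Cassandra client code: plain dicts stand in for the two CFs, and every name
in it is hypothetical.

```python
class SchemaFreeStore:
    """Dict-based model of a main CF plus a separate CF holding only column names."""

    def __init__(self):
        self.main = {}          # row key -> {column name: value}
        self.column_names = {}  # column name -> None (models the name-only CF)

    def insert(self, row_key, column_name, value):
        """Write the key/value to the main CF and the name to the registry CF."""
        self.main.setdefault(row_key, {})[column_name] = value
        # Blind overwrite: no read-before-write and no lock. Writing an
        # existing name again leaves the registry unchanged.
        self.column_names[column_name] = None

    def known_columns(self):
        """Read the registry only; cost grows with the number of distinct names."""
        return set(self.column_names)

    def scan_columns(self):
        """Full scan of the main CF; cost grows with the number of rows."""
        names = set()
        for columns in self.main.values():
            names.update(columns)
        return names
```

Loading the Row1 to Row4 example from the original question makes both
methods return the same five names. One caveat the model does not capture,
stated here as an assumption about a real cluster: if every name lands in a
single row of the registry CF, that row receives a write for every insert,
which may matter at 100k inserts a second.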
