incubator-cassandra-user mailing list archives

From Jian Fang <>
Subject Re: Cassandra 0.8 questions
Date Tue, 24 May 2011 17:41:01 GMT
Thanks a lot. This is really helpful.


On Tue, May 24, 2011 at 1:34 PM, Victor Kabdebon wrote:

> It's not really possible to give a general answer to your second question; it
> depends on your implementation. Personally I do two things. The first is
> to map an array to a row: the row key identifies the array, each column name
> is a key of the array, and each column value stores the data. However, for some
> applications, as I am using Java, I just serialize my ArrayList (or List) and
> push all the content into one column. It all depends on what you want to achieve.
> Third question: try to design your CFs according to what you want to achieve. I am
> designing an internal messaging system, and I use only two column families to hold
> the message lists, message and message box. I would have used one, but I
> need one that is sorted by TimeUUID and the other one by UTF8Type. I think
> there is a general consensus here: try to avoid super columns. Two sets of
> columns can do the same job as one SuperColumn, and that is
> the preferred scheme.
> Again, just experiment and be ready to change your organization if you are
> beginning with Cassandra; this is the best way to figure out what to do for
> your data organization.
> Victor Kabdebon
> 2011/5/24 Jian Fang <>
>> Does anyone have a good suggestion on my second question? I believe that
>> question is a pretty common one.
>> My third question is a design question. For the same data, we can store
>> it in multiple column families or in a single column family with multiple
>> super columns.
>> From a Cassandra read/write performance point of view, what are the general
>> rules for when to use multiple column families and when to use a single column
>> family?
>> Thanks again,
>> John
>> On Mon, May 23, 2011 at 5:47 PM, Jian Fang wrote:
>>> Hi,
>>> I am pretty new to Cassandra and am going to use Cassandra 0.8.0. I have
>>> two questions (sorry if they are very basic ones):
>>> 1) I have a column family to hold many super columns, say 30. When I
>>> first insert the data into the column family, do I need to insert each column
>>> one at a time, or can I insert the whole column family in one transaction (or
>>> call)? The latter seems more efficient to me. Does Cassandra
>>> support that?
>>> For example, I saw the following code to do insertion (with Hector),
>>> Mutator<String> m = HFactory.createMutator(keyspace, stringSerializer);
>>> // first column: name "type", string value
>>> m.insert(p.getCassandraKey(), colFamily,
>>>         HFactory.createStringColumn("type", p.getStringValue()));
>>> // second column: name "data", byte[] value
>>> m.insert(p.getCassandraKey(), colFamily,
>>>         HFactory.createColumn("data", p.getCompressedXML(),
>>>                 StringSerializer.get(), BytesArraySerializer.get()));
>>> Will the insertions be two separate calls to Cassandra, or are they just
>>> one transaction? If it is the former, is there any way to make them
>>> one call to Cassandra?
>>> 2) How do I store a list/array of data in Cassandra? For example, I have a
>>> data field called categories, which includes zero or more categories, and each
>>> category includes a category id and a category description. Usually, how do
>>> people handle this scenario when they use Cassandra?
>>> Thanks in advance,
>>> John
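
The two list-storage options described in the reply above (one column per
element versus the whole list packed into a single column) can be sketched in
plain Java, without any Cassandra client. This is only an illustration under
stated assumptions: the class name, the method names and the "id=description"
text encoding are hypothetical and do not come from the thread, and a
SortedMap stands in for the sorted columns of one row.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: models the columns of a single row with a SortedMap.
public class CategoryStorageSketch {

    // Option 1: one column per list element.
    // Column name = category id, column value = category description.
    public static SortedMap<String, String> toColumns(Map<String, String> categories) {
        // TreeMap keeps entries ordered by name, like columns under a string comparator.
        return new TreeMap<>(categories);
    }

    // Option 2: the whole list packed into the value of one column.
    // Assumed encoding: one "id=description" pair per line, UTF-8.
    // Assumes ids contain no '=' and neither field contains a newline.
    public static byte[] toSingleColumnValue(Map<String, String> categories) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(categories).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Inverse of toSingleColumnValue: rebuilds the categories from the packed value.
    public static SortedMap<String, String> fromSingleColumnValue(byte[] value) {
        SortedMap<String, String> out = new TreeMap<>();
        String text = new String(value, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue; // an empty list encodes to an empty value
            }
            int eq = line.indexOf('=');
            out.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> categories = new HashMap<>();
        categories.put("c2", "Music");
        categories.put("c1", "Books");

        // Option 1: each entry would be written as its own column.
        System.out.println(toColumns(categories));

        // Option 2: a single byte[] value that would go into one column.
        byte[] packed = toSingleColumnValue(categories);
        System.out.println(fromSingleColumnValue(packed));
    }
}
```

With one column per category, a single category can be read or rewritten on
its own; with the packed value, the whole list is read and written as a unit.
Which trade-off is better depends, as the reply says, on what you want to
achieve.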
