2010/4/28 Даниел Симеонов <firstname.lastname@example.org>:
> Hi,
> I have a question: if a row in a Column Family has only columns,
> are all of the columns deserialized in memory if you need any of
> them? As I understood, that is the case.

No it's not. Only the columns you request are deserialized in memory. The
only thing is that, as of now, during compaction the entire row will be
deserialized at once, so it still has to fit in memory. But depending on the
typical size of your columns, you can easily have millions of columns in a
row without it being a problem.
> And if the Column Family is a super Column Family, is only the (entire)
> Super Column brought up in memory?

Yes, that part is true. That is the problem with the current implementation
of super columns. While you can have lots of columns in one row, you
probably don't want to have lots of columns in one super column (but it is
no problem to have lots of super columns in one row).
> What about row cache, is it different than memtable?

Be careful with the row cache. If the row cache is enabled, then yes, any
read in a row will read the entire row. So you typically don't want to use
the row cache on a column family whose rows have lots of columns (unless you
always read all the columns in the row each time).
> I have another question: let's say there is only data to be inserted, and
> the solution is to have columns added to rows in a Column Family. Is it
> possible in Cassandra to split a row if a certain threshold is reached,
> say 100 columns per row? And what about concurrent inserts?

No, Cassandra can't do that for you. But you should be okay with what you
describe below: if a given row corresponds to an hour of data, that in
itself will limit its size. And again, the number of columns in a row is not
really limited, as long as the overall size of the row fits easily in
memory.
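The "one row per hour" idea above can be sketched in a few lines. This is a hypothetical illustration (the `row_key` helper and the `sensor-1` id are mine, not from the thread): because the row key encodes the hour, every insert within an hour lands in the same row, and a row can never accumulate more than one hour's worth of columns.

```python
from datetime import datetime, timezone

def row_key(entity_id: str, ts: datetime) -> str:
    """Hypothetical row key: one row per entity per hour, so row size is
    bounded by however many events one hour can produce."""
    return f"{entity_id}.{ts.strftime('%Y/%m/%d/%H')}"

# Two events in the same hour share a row; the next hour opens a new row.
t1 = datetime(2010, 4, 28, 9, 15, tzinfo=timezone.utc)
t2 = datetime(2010, 4, 28, 9, 45, tzinfo=timezone.utc)
t3 = datetime(2010, 4, 28, 10, 5, tzinfo=timezone.utc)

print(row_key("sensor-1", t1))  # sensor-1.2010/04/28/09
print(row_key("sensor-1", t1) == row_key("sensor-1", t2))  # True
print(row_key("sensor-1", t1) == row_key("sensor-1", t3))  # False
```

No explicit split-on-threshold is needed: the time bucket in the key does the splitting for you, and concurrent inserts are unaffected because they simply target whichever row their timestamp falls into.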
> The original data model and use case is to insert timestamped data and to
> make range queries. The original keys of CF rows were in the form
> <id>.<timestamp>, with a single column of data, and OPP was used. This is
> not an optimal solution, since some nodes are hotter than others. I am
> thinking of changing the model to have keys like <id>.<year/month/day> and
> then a list of columns with timestamps within this range, using
> RandomPartitioner, or using OPP but preprocessing part of the key with
> MD5, i.e. the key is MD5(<id>.<year/month/day>) + "hour of the day". The
> only problem is how to deal with the large number of columns being
> inserted in a particular row.
> Thank you very much!
> Best regards, Daniel.
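The MD5-prefixed key scheme quoted above can be sketched as follows. This is a minimal illustration under my own assumptions (the `bucketed_key` helper and the example ids are hypothetical): hashing the `<id>.<year/month/day>` part spreads days and ids evenly across the OPP token range, while the plain-text hour suffix keeps the 24 hourly rows of one day contiguous and in order, so a range scan over a day's hours still works.

```python
import hashlib

def bucketed_key(entity_id: str, day: str, hour: int) -> str:
    """Hypothetical key = MD5(<id>.<year/month/day>) + hour of the day.
    The hash evens out the key distribution under OPP; the zero-padded
    hour suffix keeps one day's rows adjacent and sorted."""
    prefix = hashlib.md5(f"{entity_id}.{day}".encode("utf-8")).hexdigest()
    return f"{prefix}.{hour:02d}"

keys = [bucketed_key("42", "2010/04/28", h) for h in range(24)]

# All 24 keys of one day share the same MD5 prefix, and the zero-padded
# hour makes their lexicographic order match chronological order.
print(len({k.split(".")[0] for k in keys}))  # 1
print(sorted(keys) == keys)  # True
```

Note the zero-padding (`{hour:02d}`) matters: without it, `"10"` would sort before `"2"` and the hourly range scan would come back out of order.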