cassandra-user mailing list archives

From Даниел Симеонов <dsimeo...@gmail.com>
Subject Re: question about how columns are deserialized in memory
Date Wed, 28 Apr 2010 13:11:21 GMT
Hi Sylvain,
  Thank you very much! I still have some further questions. I couldn't find
how the row cache is configured. Regarding the splitting of rows, I
understand that it is not really necessary, but I am still curious whether
it can be implemented in client code.
Best regards, Daniel.

2010/4/28 Sylvain Lebresne <sylvain@yakaz.com>

> 2010/4/28 Даниел Симеонов <dsimeonov@gmail.com>:
> > Hi,
> >    I have a question: if a row in a Column Family has only columns, are
> > all of the columns deserialized in memory when you need any one of
> > them? As I understand it, that is the case,
>
> No, it's not. Only the columns you request are deserialized in memory. The
> only caveat is that, as of now, the entire row is deserialized at once
> during compaction, so it still has to fit in memory. But depending on the
> typical size of your columns, you can easily have millions of columns in a
> row without it being a problem at all.
>
> >  and if the Column Family is a super
> > Column Family, then only the (entire) Super Column is brought into
> > memory?
>
> Yes, that part is true. That is the problem with the current
> implementation of super columns. While you can have lots of columns in one
> row, you probably don't want to have lots of columns in one super column
> (but it's no problem to have lots of super columns in one row).
>
> > What about row cache, is it different than memtable?
>
> Be careful with the row cache. If the row cache is enabled, then yes, any
> read in a row will read the entire row. So you typically don't want to use
> the row cache in a column family where rows have lots of columns (unless
> you always read all the columns in the row each time, of course).
>
> > I have one more question. Let's say data is only ever inserted, and the
> > solution is to add columns to rows in a Column Family. Is it possible in
> > Cassandra to split the row once a certain threshold is reached, say 100
> > columns per row? What if there are concurrent inserts?
>
> No, Cassandra can't do that for you. But you should be okay with what you
> describe below. That is, if a given row corresponds to an hour of data,
> that will limit its size. And again, the number of columns in a row is not
> really limited as long as the overall size of the row fits easily in
> memory.
>
> > The original data model and use case is to insert timestamped data and to
> > make range queries. The original keys of CF rows were in the form of
> > <id>.<timestamp>, each with a single column of data, and OPP was used. This
> > is not an optimal solution, since some nodes are hotter than others. I am
> > thinking of changing the model to have keys like <id>.<year/month/day>
> > and then a list of columns with timestamps within this range, using
> > RandomPartitioner, or using OPP but preprocessing part of the key with MD5,
> > i.e. the key is MD5(<id>.<year/month/day>) + "hour of the day". The only
> > problem is how to deal with a large number of columns being inserted into
> > a particular row.
> > Thank you very much!
> > Best regards, Daniel.
>
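
For illustration, here is a minimal client-side sketch of the key scheme
proposed in the quoted message, MD5(<id>.<year/month/day>) + "hour of the
day", so that each row holds at most one hour of data. The function names
(bucket_row_key, column_name, row_keys_for_range) are hypothetical, and the
code assumes timezone-aware timestamps and millisecond column names. It only
computes keys and makes no calls to any Cassandra client.

```python
import calendar
import hashlib
from datetime import datetime, timedelta, timezone


def bucket_row_key(source_id, ts):
    """Row key: MD5(<id>.<year/month/day>) followed by the two-digit hour.

    Assumption: ts is timezone-aware; it is normalized to UTC so that every
    writer agrees on which hourly bucket a timestamp falls into.
    """
    ts = ts.astimezone(timezone.utc)
    day = ts.strftime("%Y/%m/%d")
    digest = hashlib.md5(("%s.%s" % (source_id, day)).encode("utf-8")).hexdigest()
    return "%s%02d" % (digest, ts.hour)


def column_name(ts):
    """Column name: milliseconds since the Unix epoch (an assumed convention)."""
    return calendar.timegm(ts.utctimetuple()) * 1000 + ts.microsecond // 1000


def row_keys_for_range(source_id, start, end):
    """List the hourly row keys a range query from start to end must read."""
    cur = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    keys = []
    while cur <= end:
        keys.append(bucket_row_key(source_id, cur))
        cur += timedelta(hours=1)
    return keys
```

Because the MD5 prefix scatters the days across the key space, a range query
cannot rely on key order between rows; the client enumerates the hourly keys
it needs and then slices columns by timestamp within each row. Concurrent
inserts need no coordination here, since every writer derives the same row
key from the same timestamp.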
