incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: deletion
Date Thu, 14 Oct 2010 19:45:16 GMT
I would recommend using epoch time for your timestamp and comparing as LongType. The version
1 UUID includes the MAC address of the machine that generated it, so two different machines will
create different UUIDs for the same time. They are meant to be unique, after all: http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_1_.28MAC_address.29
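
To illustrate the difference, here is a minimal Python sketch. It assumes LongType column names are sent as 8-byte big-endian longs and that epoch milliseconds are precise enough for your ticks; the helper name long_column_name and the two node ids are made up for the example.

```python
import struct
import uuid
from datetime import datetime, timezone


def long_column_name(dt):
    """Encode a datetime as an 8-byte big-endian long of epoch milliseconds.

    Hypothetical helper: the millisecond resolution is an assumption.
    """
    millis = int(round(dt.timestamp() * 1000))
    return struct.pack(">q", millis)


tick = datetime(2010, 10, 14, 14, 0, tzinfo=timezone.utc)

# The same instant always encodes to the same column name, on any machine.
name_a = long_column_name(tick)
name_b = long_column_name(tick)

# Version 1 UUIDs embed a node id (normally the MAC address), so two
# machines never produce the same value. The node ids here are invented.
uuid_a = uuid.uuid1(node=0x0000AAAAAAAA, clock_seq=0)
uuid_b = uuid.uuid1(node=0x0000BBBBBBBB, clock_seq=0)
```

With the long encoding, a column for a given time can be addressed directly by anyone who knows the time; with a version 1 UUID you would need the exact UUID that the writer generated.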

You may also want to adjust your model; see the discussion on supercolumn limitations here: http://wiki.apache.org/cassandra/CassandraLimitations .
Your current model is going to create very big super columns, and performance will degrade
over time. Perhaps use a standard CF with "ticker:measure" as the row key; then you can
add 2 billion (I think) columns to each row, one per timestamp. You may still want to break the
rows up further depending on your use case, e.g. ticker:measure:day, so that you can pull back the
entire row to get every value for the day, or delete the entire day easily.
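
A rough sketch of the ticker:measure:day row key layout, in Python. The function names and the YYYYMMDD day format are assumptions for illustration only; any unambiguous day format would do.

```python
from datetime import datetime, timezone


def row_key(ticker, measure, when):
    """Build a 'ticker:measure:day' row key (hypothetical format)."""
    return "%s:%s:%s" % (ticker, measure, when.strftime("%Y%m%d"))


def day_row_keys(tickers, measures, day):
    """List every row key that holds data for the given day."""
    return [row_key(t, m, day) for t in tickers for m in measures]


day = datetime(2010, 10, 14, 14, 0, tzinfo=timezone.utc)
keys = day_row_keys(["IBM", "MSFT"], ["price", "volume"], day)
```

Wiping a day then means deleting the rows named in keys, rather than slicing columns out of every super column.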

For your deletion issue, batch_mutate is your friend. The Deletion struct lets you delete:
- a row, by excluding the predicate and super_column
- a super_column, by including super_column and not predicate
- a column, by including the predicate

Some of the things that were not implemented were fixed in 0.6.4, I think. Anyway, they all
work AFAIK.
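
The three Deletion forms can be sketched as follows. These dataclasses are stand-ins that only mirror the field names mentioned above (timestamp, super_column, predicate); they are not the generated Thrift or Avro classes. The describe and day_deletions helpers are hypothetical, and the real batch_mutate call wraps each Deletion in a Mutation, if I remember the interface correctly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlicePredicate:
    # Stand-in: only the column_names form is modelled here.
    column_names: List[bytes]


@dataclass
class Deletion:
    # Stand-in for the Deletion struct, not the generated client class.
    timestamp: int
    super_column: Optional[bytes] = None
    predicate: Optional[SlicePredicate] = None


def describe(deletion):
    """Say what a Deletion removes, following the three cases above."""
    if deletion.predicate is None and deletion.super_column is None:
        return "row"
    if deletion.predicate is None:
        return "super_column"
    return "column"


def day_deletions(keys, column_family, timestamp):
    """Build a {key: {column_family: [Deletion]}} map that removes whole rows."""
    return {key: {column_family: [Deletion(timestamp=timestamp)]} for key in keys}


whole_row = Deletion(timestamp=1)
one_super = Deletion(timestamp=1, super_column=b"price")
one_column = Deletion(timestamp=1, predicate=SlicePredicate([b"2010-10-14 14:00"]))
```

The map returned by day_deletions has the nested key / column family shape that batch_mutate expects, so one call can cover every row for the day.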

Hope that helps. 
Aaron


On 15 Oct 2010, at 07:55 AM, Koert Kuipers <Koert.Kuipers@diamondnotch.com> wrote:

Hello All,
 
I am testing Cassandra 0.7 with the Avro api on a single machine as a financial time series
server, so my setup looks something like this:
keyspace = timeseries, column family = tickdata, key = ticker, super column = field (price,
volume, high, low), column = timestamp.
 
So a single value, say a price of 140.72 for IBM today at 14:00 would be stored as
tickdata[“IBM”][“price”][“2010-10-14 14:00”] = 140.72 (well of course everything
needs to be encoded properly but you get the point).
 
My subcomparator type is TimeUUIDType so that I can do queries over time ranges. Inserting
and querying all work reasonably well so far.
 
But sometimes I need to wipe out all the data for a whole day. To be more precise: I need
to delete the stored values for all keys (tickers) and all super-columns (fields) for a given
time period (condition on column). How would I go about doing that? First a multiget_slice
and then a remove command for each value? Or am I missing an easier way?
 
Is slice deletion within batch_mutate still scheduled to be implemented?
 
Thanks for your help,
Koert
 