cassandra-user mailing list archives

From William R Speirs <bill.spe...@gmail.com>
Subject Re: Schema Design
Date Thu, 27 Jan 2011 01:14:38 GMT
It makes sense that the single row for a system (with a growing number of 
columns) will reside on a single machine.

With that in mind, here is my updated schema:

- A single column family for all the messages. The row keys will be the TimeUUID 
of the message, with the following columns: date/time (in UTC POSIX), system 
name/id (with an index for fast/easy gets), and the actual message payload.

- A column family for each system. The row keys will be UTC POSIX time with 1 
second (maybe 1 minute) bucketing, and the column names will be the TimeUUID of 
any messages that were logged during that time bucket.
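
To make the two-column-family layout concrete, here is a minimal in-memory sketch in Python. It is not Cassandra client code: plain dicts stand in for the column families, and the names BUCKET_SECONDS, bucket_of, log_message and messages_between are made up for illustration. It assumes 1 minute bucketing and shows how a date-range query would walk the per-system buckets and then fetch each message by its TimeUUID.

```python
import uuid
from collections import defaultdict

# Hypothetical bucket width; the schema above considers 1 second or 1 minute.
BUCKET_SECONDS = 60

# Stand-in for the single "messages" column family:
# row key = TimeUUID, columns = time, system, payload.
messages = {}

# Stand-in for the per-system column families:
# system -> {bucket start (UTC POSIX seconds) -> list of TimeUUID column names}.
system_index = defaultdict(lambda: defaultdict(list))


def bucket_of(posix_seconds):
    """Round a POSIX timestamp down to the start of its time bucket."""
    seconds = int(posix_seconds)
    return seconds - seconds % BUCKET_SECONDS


def log_message(system, posix_seconds, payload):
    """Write one message row plus its entry in the system's bucket row."""
    # uuid1() is a version-1 (time-based) UUID. In this sketch it only serves
    # as a unique row key; its embedded time is "now", not posix_seconds.
    msg_id = uuid.uuid1()
    messages[msg_id] = {"time": posix_seconds, "system": system, "payload": payload}
    system_index[system][bucket_of(posix_seconds)].append(msg_id)
    return msg_id


def messages_between(system, start, end):
    """Return payloads logged by `system` with start <= time <= end, oldest first."""
    found = []
    # Visit every bucket row that can overlap the requested range.
    for bucket in range(bucket_of(start), bucket_of(end) + 1, BUCKET_SECONDS):
        for msg_id in system_index[system].get(bucket, []):
            row = messages[msg_id]
            # The first and last buckets may hold messages outside the range.
            if start <= row["time"] <= end:
                found.append(row)
    found.sort(key=lambda row: row["time"])
    return [row["payload"] for row in found]
```

For example, after logging messages at times 100, 130 and 190 for one system, messages_between(system, 110, 200) visits the buckets starting at 60, 120 and 180 and returns the payloads from 130 and 190. The cost of a range query in this sketch grows with the number of buckets in the range, which is one thing to weigh when choosing between 1 second and 1 minute bucketing.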

My only hesitation with this design is that buddhasystem warned that each column 
family "is allocated a piece of memory on the server." I'm not sure what the 
implications of this are and/or if this would be a problem if I had a number 
of systems on the order of hundreds.

Thanks...

Bill-

On 01/26/2011 06:51 PM, Shu Zhang wrote:
> Each row can have a maximum of 2 billion columns, which a logging system will
> probably hit eventually.
>
> More importantly, you'll only have 1 row per set of system logs. Every row is
> stored on the same machine(s), which means you'll definitely not be able to
> distribute your load very well.
> ________________________________________
> From: Bill Speirs [bill.speirs@gmail.com]
> Sent: Wednesday, January 26, 2011 1:23 PM
> To: user@cassandra.apache.org
> Subject: Re: Schema Design
>
> I like this approach, but I have 2 questions:
>
> 1) what are the implications of continually adding columns to a single
> row? I'm unsure how Cassandra is able to grow. I realize you can have
> a virtually infinite number of columns, but what are the implications
> of growing the number of columns over time?
>
> 2) maybe it's just a restriction of the CLI, but how do I issue a
> slice request? Also, what if start (or end) columns don't exist? I'm
> guessing it's smart enough to get the columns in that range.
>
> Thanks!
>
> Bill-
>
> On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
> <dmcnelis@agentisenergy.com>  wrote:
>> I would say in that case you might want to try a single column family
>> where the key to the column is the system name.
>> Then, you could name your columns as the timestamp. Then when retrieving
>> information from the data store you can, in your slice request, specify
>> your start column as X and end column as Y.
>> Then you can use the stored column name to know when an event occurred.
>>
>> On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs<bill.speirs@gmail.com>  wrote:
>>>
>>> I'm looking to use Cassandra to store log messages from various
>>> systems. A log message only has a message (UTF8Type) and a date/time.
>>> My thought is to create a column family for each system. The row key
>>> will be a TimeUUIDType. Each row will have 7 columns: year, month,
>>> day, hour, minute, second, and message. I then have indexes set up for
>>> each of the date/time columns.
>>>
>>> I was hoping this would allow me to answer queries like: "What are all
>>> the log messages that were generated between X & Y?" The problem is
>>> that I can ONLY use the equals operator on these column values. For
>>> example, issuing: get system_x where month > 1; gives me this
>>> error: "No indexed columns present in index clause with operator EQ."
>>> The equals operator works as expected though: get system_x where month
>>> = 1;
>>>
>>> What schema would allow me to get date ranges?
>>>
>>> Thanks in advance...
>>>
>>> Bill-
>>>
>>> * ColumnFamily description *
>>>     ColumnFamily: system_x_msg
>>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>>       Row cache size / save period: 0.0/0
>>>       Key cache size / save period: 200000.0/3600
>>>       Memtable thresholds: 1.1671875/249/60
>>>       GC grace seconds: 864000
>>>       Compaction min/max thresholds: 4/32
>>>       Read repair chance: 1.0
>>>       Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
>>> proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
>>> proj_1_msg.7365636f6e64, proj_1_msg.79656172]
>>>       Column Metadata:
>>>         Column Name: year (year)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>>         Column Name: month (month)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>>         Column Name: second (second)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>>         Column Name: minute (minute)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>>         Column Name: hour (hour)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>>         Column Name: day (day)
>>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>>           Index Type: KEYS
>>
>>
>>
>> --
>> David McNelis
>> Lead Software Engineer
>> Agentis Energy
>> www.agentisenergy.com
>> o: 630.359.6395
>> c: 219.384.5143
>> A Smart Grid technology company focused on helping consumers of energy
>> control an often under-managed resource.
>>
>>
