cassandra-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Columns limit
Date Sat, 07 Aug 2010 18:43:50 GMT
On 8/7/10 11:30 AM, Mark wrote:
> On 8/7/10 4:22 AM, Thomas Heller wrote:
>>> Ok, I think the part I was missing was the concatenation of the key and
>>> partition to do the lookups. Is this the preferred way of accomplishing
>>> needs such as this? Are there alternative ways?
>> Depending on your needs you can concat the row key or use super columns.
>>
>>> How would one then "query" over multiple days? Same question for all
>>> days. Should I use range_slice or multiget_slice? And if it's
>>> range_slice, does that mean I need OrderPreservingPartitioner?
>> The last 3 days are pretty simple: ['2010-08-07', '2010-08-06',
>> '2010-08-05'], as are 7, 31, etc. Just generate the keys in your app
>> and use multiget_slice.
>>
>> If you want to get all days where a specific ip address had some
>> requests you'll just need another CF where the row key is the addr and
>> column names are the days (values optional again). Pretty much the
>> same all over again, just add another CF and insert the data you need.
>>
>> get_range_slice in my experience is better used for "offline" tasks
>> where you really want to process every row there is.
>>
>> /thomas
> Ok... as an example, would looking up logs by ip for a certain
> timeframe/range work like this?
>
> <ColumnFamily Name="SearchLog"/>
>
> <ColumnFamily Name="IPSearchLog"
>                            ColumnType="Super"
>                            CompareWith="UTF8Type"
>                            CompareSubcolumnsWith="TimeUUIDType"/>
>
> Resulting in a structure like:
>
> {
>   "127.0.0.1" : {
>        "2010080711" : {
>             uuid1 : "",
>             uuid2 : "",
>             uuid3 : ""
>        },
>        "2010080712" : {
>             uuid1 : "",
>             uuid2 : "",
>             uuid3 : ""
>        }
>    },
>   "some.other.ip" : {
>        "2010080711" : {
>             uuid1 : ""
>        }
>    }
> }
>
> where each uuid is the key used for SearchLog. Is there anything
> wrong with this? I know there is a 2 billion column limit, but in this
> case it would never be exceeded because each column represents an
> hour. However, does the above "schema" imply that for any given IP
> there can only be a maximum of 2GB of data stored?
Or should I invert the ip with the time slices? The limitation then seems 
to be that there can only be 2 billion unique ips per hour, which is more 
than enough for our application :)

{
   "2010080711" : {
        "127.0.0.1" : {
             uuid1 : "",
             uuid2 : "",
             uuid3 : ""
        },
        "some.other.ip" : {
             uuid1 : "",
             uuid2 : "",
             uuid3 : ""
        }
    },
   "2010080712" : {
        "127.0.0.1" : {
             uuid1 : ""
        }
    }
}
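
Either way, the row keys for a time range can be generated client-side, as 
Thomas suggested, and then handed to multiget_slice. A minimal sketch in 
Python (the helper names are mine, not from any Cassandra client library) 
of building the "YYYYMMDDHH" hour-bucket keys used in the structures above:

```python
from datetime import datetime, timedelta

def hour_key(dt):
    """Format a datetime as an hour-bucket row key, e.g. '2010080711'."""
    return dt.strftime("%Y%m%d%H")

def hour_keys(start, end):
    """All hour-bucket row keys from start up to and including end."""
    keys = []
    dt = start.replace(minute=0, second=0, microsecond=0)
    while dt <= end:
        keys.append(hour_key(dt))
        dt += timedelta(hours=1)
    return keys

# Keys for an 11:00-13:00 window on 2010-08-07; pass these to
# multiget_slice in your client of choice.
keys = hour_keys(datetime(2010, 8, 7, 11), datetime(2010, 8, 7, 13))
# keys == ['2010080711', '2010080712', '2010080713']
```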

