cassandra-user mailing list archives

From Benjamin Black <...@b3k.us>
Subject Re: Columns limit
Date Sat, 07 Aug 2010 21:33:29 GMT
Right, this is an index row per time interval (your previous email was not).

On Sat, Aug 7, 2010 at 11:43 AM, Mark <static.void.dev@gmail.com> wrote:
> On 8/7/10 11:30 AM, Mark wrote:
>>
>> On 8/7/10 4:22 AM, Thomas Heller wrote:
>>>>
>>>> Ok, I think the part I was missing was the concatenation of the key and
>>>> partition to do the lookups. Is this the preferred way of accomplishing
>>>> needs such as this? Are there alternative ways?
>>>
>>> Depending on your needs you can concat the row key or use super columns.
>>>
>>>> How would one then "query" over multiple days? Same question for all
>>>> days. Should I use range_slice or multiget_slice? And if it's
>>>> range_slice, does that mean I need OrderPreservingPartitioner?
>>>
>>> The last 3 days is pretty simple: ['2010-08-07', '2010-08-06',
>>> '2010-08-05'], as is 7, 31, etc. Just generate the keys in your app
>>> and use multiget_slice.
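The key-generation step Thomas describes can be sketched in plain Python. Only the key list is computed here; the actual multiget_slice call would go through a Thrift client and is not shown. The helper name `day_keys` is hypothetical:

```python
from datetime import date, timedelta

def day_keys(n, today=None):
    """Row keys for the last n days, newest first, in the
    'YYYY-MM-DD' format used for the daily index rows."""
    today = today or date.today()
    return [(today - timedelta(days=i)).isoformat() for i in range(n)]

# day_keys(3, date(2010, 8, 7))
# -> ['2010-08-07', '2010-08-06', '2010-08-05']
```

The generated list is then passed as the `keys` argument of multiget_slice; the client returns a map from each key to its columns, with missing days simply absent.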
>>>
>>> If you want to get all days where a specific ip address had some
>>> requests you'll just need another CF where the row key is the addr and
>>> column names are the days (values optional again). Pretty much the
>>> same all over again, just add another CF and insert the data you need.
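The extra reverse-index CF (row key = address, column names = days) can be modeled with an in-memory stand-in; this is not a Cassandra client call, and `note_day` / `days_by_ip` are hypothetical names:

```python
# Reverse index: one row per IP address, one column per day that
# address appeared. Column values are unused, as Thomas notes.
days_by_ip = {}

def note_day(ip, day):
    """Record that ip was seen on the given day ('YYYY-MM-DD')."""
    days_by_ip.setdefault(ip, set()).add(day)

note_day("127.0.0.1", "2010-08-07")
note_day("127.0.0.1", "2010-08-06")
```

With real Cassandra the same write is just an insert of a column named after the day into the IP's row, done alongside the main insert.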
>>>
>>> get_range_slice in my experience is better used for "offline" tasks
>>> where you really want to process every row there is.
>>>
>>> /thomas
>>
>> Ok... as an example, for looking up logs by IP for a certain
>> timeframe/range, would this work?
>>
>> <ColumnFamily Name="SearchLog"/>
>>
>> <ColumnFamily Name="IPSearchLog"
>>                           ColumnType="Super"
>>                           CompareWith="UTF8Type"
>>                           CompareSubcolumnsWith="TimeUUIDType"/>
>>
>> Resulting in a structure like:
>>
>> {
>>  "127.0.0.1" : {
>>       "2010080711" : {
>>            uuid1 : ""
>>            uuid2: ""
>>            uuid3: ""
>>       }
>>      "2010080712" : {
>>            uuid1 : ""
>>            uuid2: ""
>>            uuid3: ""
>>       }
>>   }
>>  "some.other.ip" : {
>>       "2010080711" : {
>>            uuid1 : ""
>>       }
>>   }
>> }
>>
>> Where each uuid is the key used for SearchLog.  Is there anything wrong
>> with this? I know there is a 2 billion column limit per row, but in this
>> case that would never be exceeded because each column represents an hour.
>> However, does the above "schema" imply that for any given IP there can
>> only be a maximum of 2GB of data stored?
>
> Or should I invert the ip with the time slices? The limitation of this
> seems to be that there can only be 2 billion unique IPs per hour, which is
> more than enough for our application :)
>
> {
>  "2010080711" : {
>       "127.0.0.1" : {
>            uuid1 : ""
>            uuid2: ""
>            uuid3: ""
>       }
>      "some.other.ip" : {
>            uuid1 : ""
>            uuid2: ""
>            uuid3: ""
>       }
>   }
>  "2010080712" : {
>       "127.0.0.1" : {
>            uuid1 : ""
>       }
>   }
> }
>
>
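The read path for the inverted layout can be sketched the same way; here the hour bucket is the row key and each IP is a super column, so "all searches from an IP in a given hour" is a single super-column read. This is an in-memory model with hypothetical names, not a client call:

```python
# Inverted layout: row key = hour bucket, super column = IP address,
# subcolumns = the TimeUUID keys into SearchLog (values unused).
hour_index = {
    "2010080711": {
        "127.0.0.1": {"uuid1": "", "uuid2": "", "uuid3": ""},
        "some.other.ip": {"uuid1": ""},
    },
    "2010080712": {
        "127.0.0.1": {"uuid1": ""},
    },
}

def uuids_for(ip, hour):
    """SearchLog keys recorded for ip in the given hour bucket."""
    return sorted(hour_index.get(hour, {}).get(ip, {}))
```

One trade-off worth noting: with the hour as the row key, all writes for a given hour land on the row's replica set, whereas the IP-keyed layout spreads concurrent writes across many rows.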
