incubator-cassandra-user mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Columns limit
Date Sat, 07 Aug 2010 18:30:26 GMT
On 8/7/10 4:22 AM, Thomas Heller wrote:
>> Ok, I think the part I was missing was the concatenation of the key and
>> partition to do the look ups. Is this the preferred way of accomplishing
>> needs such as this? Are there alternative ways?
>>      
> Depending on your needs you can concat the row key or use super columns.
>
>    
>> How would one then "query" over multiple days? Same question for all days.
>> Should I use range_slice or multiget_slice? And if it's range_slice does that
>> mean I need OrderPreservingPartitioner?
>>      
> The last 3 days is pretty simple: ['2010-08-07', '2010-08-06',
> '2010-08-05'], as is 7, 31, etc. Just generate the keys in your app
> and use multiget_slice.
>
> If you want to get all days where a specific ip address had some
> requests you'll just need another CF where the row key is the addr and
> column names are the days (values optional again). Pretty much the
> same all over again, just add another CF and insert the data you need.
>
> get_range_slice in my experience is better used for "offline" tasks
> where you really want to process every row there is.
>
> /thomas
>    
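A minimal sketch of the key generation Thomas describes, assuming row keys are plain YYYY-MM-DD day strings as in his example. The helper name last_n_day_keys is hypothetical, and the actual multiget_slice call is left out because it depends on the client library in use.

```python
from datetime import date, timedelta


def last_n_day_keys(n, today=None):
    """Return row keys for the last n days, newest first (e.g. '2010-08-07')."""
    # Assumption: one row per day, keyed by the ISO date string.
    today = today or date.today()
    return [(today - timedelta(days=i)).isoformat() for i in range(n)]


keys = last_n_day_keys(3, today=date(2010, 8, 7))
# keys == ['2010-08-07', '2010-08-06', '2010-08-05']
```

The resulting list is what would be handed to multiget_slice as the set of row keys; 7 or 31 days only changes n.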
Ok... as an example, for looking up logs by IP for a certain
timeframe/range, would this work?

<ColumnFamily Name="SearchLog"/>

<ColumnFamily Name="IPSearchLog"
              ColumnType="Super"
              CompareWith="UTF8Type"
              CompareSubcolumnsWith="TimeUUIDType"/>

Resulting in a structure like:

{
    "127.0.0.1" : {
        "2010080711" : {
            uuid1 : "",
            uuid2 : "",
            uuid3 : ""
        },
        "2010080712" : {
            uuid1 : "",
            uuid2 : "",
            uuid3 : ""
        }
    },
    "some.other.ip" : {
        "2010080711" : {
            uuid1 : ""
        }
    }
}

Here each uuid is the key used for SearchLog. Is there anything
wrong with this? I know there is a 2 billion column limit, but in this
case that would never be exceeded because each column represents an
hour. However, does the above "schema" imply that for any given IP
there can only be a maximum of 2GB of data stored?
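
To make the timeframe lookup concrete, here is a rough sketch of how the hourly super column names for one IP could be generated, assuming the YYYYMMDDHH naming shown in the structure above. The helper name hour_buckets is hypothetical, and no Cassandra client call is shown.

```python
from datetime import datetime, timedelta


def hour_buckets(start, end):
    """Return super column names (YYYYMMDDHH) for every hour touched by [start, end]."""
    # Truncate the start to the top of its hour so the first bucket is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while current <= end:
        buckets.append(current.strftime("%Y%m%d%H"))
        current += timedelta(hours=1)
    return buckets


names = hour_buckets(datetime(2010, 8, 7, 11, 30), datetime(2010, 8, 7, 13, 5))
# names == ['2010080711', '2010080712', '2010080713']
```

Because the names are fixed-width digit strings, they sort chronologically under UTF8Type, so a slice over the "127.0.0.1" row from the first name to the last should cover the same range without listing every hour.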
