Subject: Re: data model for unique users in a time period
From: Ed Anuff
To: user@cassandra.apache.org
Date: Mon, 31 Oct 2011 11:08:03 -0700

Thanks, good point, splitting wide rows via sharding is a good optimization for the get_count approach.
On Mon, Oct 31, 2011 at 10:58 AM, Zach Richardson wrote:
> Ed,
>
> I could be completely wrong about this working--I haven't specifically
> looked at how the counts are executed, but I think this makes sense.
>
> You could potentially shard across several rows, using a hash of
> the username combined with the time period as the row key. Run a
> count across each row and then add them up. If your cluster is large
> enough, this could spread the computation enough to make each query for
> the count a bit faster.
>
> Depending on how often this query would be hit, I would still
> recommend caching, but you could recalculate the real count a little
> more often.
>
> Zach
>
>
> On Mon, Oct 31, 2011 at 12:22 PM, Ed Anuff wrote:
>> I'm looking at how to keep track of the number of
>> unique visitors within a given time period. Inserting user ids into a
>> wide row would give me a list of every user within the time
>> period that the row represented. My experience in the past was that
>> using get_count on a row to get the column count got slow pretty quickly,
>> but it might still be the easiest way to get the count of unique
>> users, with some sort of caching of the count so that it's not
>> expensive on subsequent queries. Using Hadoop is overkill for this scenario.
>> Any other approaches?
>>
>> Ed
>>
>
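A minimal sketch of the sharding scheme described above, in Python. A plain dict stands in for the Cassandra column family, and the shard count, row-key format, and function names are all illustrative assumptions, not anything from the thread; a real deployment would issue the equivalent insert and get_count calls through a Cassandra client.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count; tune to cluster size


def shard_row_key(user_id: str, period: str) -> str:
    """Derive a shard row key from a hash of the user id plus the time period."""
    shard = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return "unique_users:%s:%d" % (period, shard)


# Stand-in for a Cassandra column family: row key -> {column name: value}.
rows = {}


def record_visit(user_id: str, period: str) -> None:
    # Writing the user id as a column name deduplicates repeat visits,
    # since rewriting the same column is idempotent.
    rows.setdefault(shard_row_key(user_id, period), {})[user_id] = ""


def unique_user_count(period: str) -> int:
    # Equivalent of running get_count on each shard row and summing;
    # the per-row counts are independent, so they can run in parallel
    # and land on different nodes in the cluster.
    return sum(
        len(rows.get("unique_users:%s:%d" % (period, s), {}))
        for s in range(NUM_SHARDS)
    )


record_visit("alice", "2011-10-31")
record_visit("bob", "2011-10-31")
record_visit("alice", "2011-10-31")  # repeat visit, still one unique user
print(unique_user_count("2011-10-31"))  # 2
```

As in Zach's suggestion, the sum itself can be cached and recomputed periodically rather than on every read.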