hbase-user mailing list archives

From Software Dev <static.void....@gmail.com>
Subject Re: Help with row and column design
Date Wed, 30 Apr 2014 17:28:56 GMT
Yes, I'll be storing data at multiple levels of aggregation.



On Wed, Apr 30, 2014 at 9:21 AM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
>> OK, I didn't know if the sheer number of gets would be a limiting factor. Thanks.
>
> Yes, retrieving and summing thousands of rows is much slower and requires more network,
> memory, and CPU than doing the same for a hundred rows, or for fewer than 10.
> Perhaps day-level, week-level, or month-level granularity would be a better fit for a
> 6-month aggregation?
> You did say you were going to store data at multiple levels of time aggregation, right?
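To put numbers on the granularity trade-off, here is a small JDK-only sketch (no HBase classes) that counts how many rows, and therefore Gets, a 6-month window needs at each level of time aggregation. It assumes one pre-aggregated row per time bucket, with buckets aligned to the start of the window; the window dates and the helper name `bucketsCovering` are illustrative assumptions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class GranularityCost {
    // Number of buckets (one pre-aggregated row, so one Get, per bucket) needed to
    // cover [start, end). Assumes buckets are aligned to 'start'; calendar-aligned
    // weeks or months may straddle the window edges and need one extra row.
    static long bucketsCovering(LocalDateTime start, LocalDateTime end, ChronoUnit unit) {
        long whole = unit.between(start, end);                          // complete buckets only
        return start.plus(whole, unit).isBefore(end) ? whole + 1 : whole; // plus a trailing partial one
    }

    public static void main(String[] args) {
        // Example window: 2013-11-01 up to (not including) 2014-05-01, i.e. 181 days.
        LocalDateTime start = LocalDate.of(2013, 11, 1).atStartOfDay();
        LocalDateTime end = LocalDate.of(2014, 5, 1).atStartOfDay();
        ChronoUnit[] levels = {ChronoUnit.HOURS, ChronoUnit.DAYS, ChronoUnit.WEEKS, ChronoUnit.MONTHS};
        for (ChronoUnit unit : levels) {
            System.out.println(unit + ": " + bucketsCovering(start, end, unit) + " gets");
        }
        // prints Hours: 4344 gets, Days: 181 gets, Weeks: 26 gets, Months: 6 gets (one per line)
    }
}
```

Hourly rows put a 6-month query squarely in the 4K+ range discussed in this thread, while daily or coarser rows keep it to a couple of hundred Gets or fewer.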
>
>
> -----Original Message-----
> From: Software Dev [mailto:static.void.dev@gmail.com]
> Sent: Tuesday, April 29, 2014 8:05 PM
> To: user@hbase.apache.org
> Subject: Re: Help with row and column design
>
> OK, I didn't know if the sheer number of gets would be a limiting factor. Thanks.
>
> On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> As I said this afternoon:
>> See the following API in HTable for batching Gets:
>>
>>   public Result[] get(List<Get> gets) throws IOException
>>
>> Cheers
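The batched `get(List<Get>)` call still returns every requested row in a single `Result[]`, so for thousands of rows the whole result set sits in client memory at once. One common way to bound that (an assumption here, not something this thread prescribes) is to split the row keys into fixed-size chunks and issue one batched call per chunk. The JDK-only sketch below shows just the chunking; the chunk size of 1000, the `hour-N` keys, and the name `partition` are hypothetical, and the HBase call appears only in a comment.

```java
import java.util.ArrayList;
import java.util.List;

class GetBatches {
    // Split items into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end))); // copy so chunks are independent
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for about 6 months of hourly row keys.
        List<String> rowKeys = new ArrayList<>();
        for (int h = 0; h < 4344; h++) {
            rowKeys.add("hour-" + h);
        }
        for (List<String> batch : partition(rowKeys, 1000)) {
            // With the HBase client, each chunk would become a List<Get>
            // (one new Get(Bytes.toBytes(key)) per key) passed to the batched get,
            // and the returned Result[] summed before fetching the next chunk.
            System.out.println("batch of " + batch.size());
        }
    }
}
```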
>>
>>
>> On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <static.void.dev@gmail.com> wrote:
>>
>>> Nothing against your code. I just meant that if we are doing a scan,
>>> say for hourly metrics across a 6-month period, we are talking about
>>> 4K+ gets. Is that something that can easily be handled?
>>>
>>> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
>>> >> Gets a bit hairy when doing, say, a shitload of gets, though, no?
>>> >
>>> > If by "hairy" you mean the code is ugly, it was written for maximal
>>> > clarity. I think you'll find a few sensible loops make it fairly clean.
>>> > Otherwise, I'm not sure what you mean.
>>> >
>>> > -----Original Message-----
>>> > From: Software Dev [mailto:static.void.dev@gmail.com]
>>> > Sent: Tuesday, April 29, 2014 5:02 PM
>>> > To: user@hbase.apache.org
>>> > Subject: Re: Help with row and column design
>>> >
>>> >> Yes. See total_usa vs. total_female_usa above. Basically, you have to
>>> >> pre-store every level of aggregation you care about.
>>> >
>>> > OK, I think this makes sense. Gets a bit hairy when doing, say, a
>>> > shitload of gets, though, no?
>>> >
>>> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
>>> >> You don't do a scan; you do a series of gets, which I believe you can
>>> >> batch into one call.
>>> >>
>>> >> Last-5-days query in pseudocode:
>>> >> res1 = Get( hash("2014-04-29") + "2014-04-29")
>>> >> res2 = Get( hash("2014-04-28") + "2014-04-28")
>>> >> res3 = Get( hash("2014-04-27") + "2014-04-27")
>>> >> res4 = Get( hash("2014-04-26") + "2014-04-26")
>>> >> res5 = Get( hash("2014-04-25") + "2014-04-25")
>>> >>
>>> >> For each result, you look for the particular column or columns you
>>> >> are interested in:
>>> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
>>> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
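The 5-day pseudocode above can be sketched in plain Java. This is a model rather than HBase client code: each fetched row is a `Map` from qualifier to count standing in for a `Result`, and the hash prefix is assumed to be the first byte of an MD5 digest of the date, written as two hex characters (the thread only says `hash(...)`). The names `rowKey`, `keysForLastDays`, and `sumColumn`, and the counter values, are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LastNDays {
    // Hypothetical salted key: first MD5 byte of the ISO date as 2 hex chars + the date.
    static String rowKey(LocalDate day) {
        String date = day.toString(); // e.g. "2014-04-29"
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(date.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x", md5[0]) + date;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Keys for the n days ending at 'last' (inclusive), newest first, like res1..res5.
    static List<String> keysForLastDays(LocalDate last, int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(rowKey(last.minusDays(i)));
        }
        return keys;
    }

    // Sum one column across the fetched rows; a row without that column contributes 0.
    static long sumColumn(List<Map<String, Long>> rows, String column) {
        long total = 0;
        for (Map<String, Long> row : rows) {
            total += row.getOrDefault(column, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = keysForLastDays(LocalDate.of(2014, 4, 29), 5);

        // Made-up stored counters: row key -> (column -> count).
        long[] usa = {10, 20, 30, 40, 50};
        long[] usaFemale = {4, 9, 13, 22, 25};
        Map<String, Map<String, Long>> table = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            Map<String, Long> row = new HashMap<>();
            row.put("c:usa", usa[i]);
            row.put("c:usa:sex:f", usaFemale[i]);
            table.put(keys.get(i), row);
        }

        // Stand-in for one batched multi-get: look up every key, tolerating missing rows.
        List<Map<String, Long>> results = new ArrayList<>();
        for (String key : keys) {
            results.add(table.getOrDefault(key, Collections.emptyMap()));
        }

        System.out.println("total_usa = " + sumColumn(results, "c:usa"));               // 150
        System.out.println("total_female_usa = " + sumColumn(results, "c:usa:sex:f")); // 73
    }
}
```

Because every key is computed from a known date, no range scan over the hashed keys is needed; the query simply enumerates the dates it wants.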
>>> >>
>>> >> "What happens when we add more fields? Do we just keep adding in more
>>> >> column qualifiers? If so, how would we filter across columns to get
>>> >> an aggregate total?"
>>> >>
>>> >> Yes. See total_usa vs. total_female_usa above. Basically, you have to
>>> >> pre-store every level of aggregation you care about.
>>> >>
>>> >> -----Original Message-----
>>> >> From: Software Dev [mailto:static.void.dev@gmail.com]
>>> >> Sent: Tuesday, April 29, 2014 4:36 PM
>>> >> To: user@hbase.apache.org
>>> >> Subject: Re: Help with row and column design
>>> >>
>>> >>> The downside is it still has a hotspot when inserting, but when
>>> >>> reading a range of time, it does not.
>>> >>
>>> >> How can you do a scan query between dates when you hash the date?
>>> >>
>>> >>> Column qualifiers are just the collection of items you are
>>> >>> aggregating on. Values are increments. In your case, qualifiers
>>> >>> might look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
>>> >>> c:italy:sex:f, c:italy,
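To make "values are increments" concrete, here is a JDK-only model of one row's counters. Each incoming event bumps one counter per pre-stored aggregation level, using the `c:` qualifier names from this thread. The event fields and the helper names `qualifiersFor` and `record` are hypothetical, and a `TreeMap` stands in for the row; against HBase, each bump would instead be a server-side atomic increment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class QualifierFanOut {
    // Qualifiers touched by one event: one per pre-stored aggregation level.
    static List<String> qualifiersFor(String country, String sex) {
        List<String> qs = new ArrayList<>();
        qs.add("c:" + country);                 // country total
        qs.add("c:" + country + ":sex:" + sex); // country + sex breakdown
        return qs;
    }

    // Add 1 to every counter the event contributes to (in-memory stand-in for increments).
    static void record(Map<String, Long> row, String country, String sex) {
        for (String q : qualifiersFor(country, sex)) {
            row.merge(q, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> row = new TreeMap<>(); // sorted, like qualifiers within an HBase row
        record(row, "usa", "f");
        record(row, "usa", "m");
        record(row, "italy", "f");
        System.out.println(row);
        // prints {c:italy=1, c:italy:sex:f=1, c:usa=2, c:usa:sex:f=1, c:usa:sex:m=1}
    }
}
```

Supporting a new field would mean adding more qualifiers in `qualifiersFor`, one for each combination you want pre-stored; each total is then read straight from its own column rather than computed by filtering across columns.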
>>> >>
>>> >> What happens when we add more fields? Do we just keep adding in more
>>> >> column qualifiers? If so, how would we filter across columns to get
>>> >> an aggregate total?
>>>
