From: Sreepathi
To: user@hbase.apache.org
Date: Tue, 29 Apr 2014 20:32:59 -0700
Subject: Re: Help with row and column design

I guess you can pre-split tables manually, which avoids hotspotting.

On Tue, Apr 29, 2014 at 8:08 PM, Software Dev wrote:
> Any improvements in the row key design?
>
> If I always know we will be querying by country, could/should I prefix
> the row key with the country to help with hotspotting?
>
> FR/2014042901
> FR/2014042902
> ....
> US/2014042901
> US/2014042902
> ...
>
> Is this preferred over adding it in a column, i.e. 2014042901:Country:US?
>
> On Tue, Apr 29, 2014 at 8:05 PM, Software Dev wrote:
> > Ok, didn't know if the sheer number of gets would be a limiting factor.
> > Thanks
> >
> > On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu wrote:
> >> As I said this afternoon:
> >> See the following API in HTable for batching Gets:
> >>
> >>   public Result[] get(List<Get> gets) throws IOException
> >>
> >> Cheers
> >>
> >> On Tue, Apr 29, 2014 at 7:45 PM, Software Dev <static.void.dev@gmail.com> wrote:
> >>
> >>> Nothing against your code. I just meant that if we are doing a scan,
> >>> say for hourly metrics across a 6-month period, we are talking about
> >>> 4K+ gets. Is that something that can easily be handled?
> >>>
> >>> On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
> >>> >> Gets a bit hairy when doing say a shitload of gets though.. no?
> >>> >
> >>> > If by "hairy" you mean the code is ugly, it was written for maximal
> >>> > clarity.
> >>> > I think you'll find a few sensible loops make it fairly clean.
> >>> > Otherwise I'm not sure what you mean.
> >>> >
> >>> > -----Original Message-----
> >>> > From: Software Dev [mailto:static.void.dev@gmail.com]
> >>> > Sent: Tuesday, April 29, 2014 5:02 PM
> >>> > To: user@hbase.apache.org
> >>> > Subject: Re: Help with row and column design
> >>> >
> >>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >>> >> pre-store every level of aggregation you care about.
> >>> >
> >>> > Ok, I think this makes sense. Gets a bit hairy when doing say a
> >>> > shitload of gets though.. no?
> >>> >
> >>> > On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) <CRendon@kbb.com> wrote:
> >>> >> You don't do a scan, you do a series of gets, which I believe you
> >>> >> can batch into one call.
> >>> >>
> >>> >> Last-5-days query in pseudocode:
> >>> >> res1 = Get( hash("2014-04-29") + "2014-04-29" )
> >>> >> res2 = Get( hash("2014-04-28") + "2014-04-28" )
> >>> >> res3 = Get( hash("2014-04-27") + "2014-04-27" )
> >>> >> res4 = Get( hash("2014-04-26") + "2014-04-26" )
> >>> >> res5 = Get( hash("2014-04-25") + "2014-04-25" )
> >>> >>
> >>> >> For each result you look for the particular column or columns you
> >>> >> are interested in:
> >>> >> Total_usa = res1.get("c:usa") + res2.get("c:usa") + res3.get("c:usa") + ...
> >>> >> Total_female_usa = res1.get("c:usa:sex:f") + ...
> >>> >>
> >>> >> "What happens when we add more fields? Do we just keep adding in more
> >>> >> column qualifiers? If so, how would we filter across columns to get an
> >>> >> aggregate total?"
> >>> >>
> >>> >> Yes. See total_usa vs. total_female_usa above. Basically you have to
> >>> >> pre-store every level of aggregation you care about.
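The last-5-days pseudocode above can be sketched as plain Java. The salt function and key layout here are illustrative assumptions, not something the thread prescribes; the HBase client calls that would consume the keys are shown only in comments, since they need a live cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class DailyCounterKeys {
    // Illustrative salt: one byte of the date string's hashCode, rendered as
    // two hex digits. Any stable hash works; it only has to spread writes so
    // consecutive dates do not all land on one region.
    static String salt(String date) {
        return String.format("%02x", date.hashCode() & 0xFF);
    }

    // Build the row keys for a list of dates: hash(date) + date,
    // exactly as in the pseudocode above.
    static List<String> rowKeys(List<String> dates) {
        List<String> keys = new ArrayList<>();
        for (String d : dates) {
            keys.add(salt(d) + d);
        }
        return keys;
    }

    // Against a cluster these keys become one batched call, e.g.:
    //   List<Get> gets = new ArrayList<>();
    //   for (String k : keys) gets.add(new Get(Bytes.toBytes(k)));
    //   Result[] results = table.get(gets);  // batched into per-regionserver RPCs
}
```

Note the trade-off the thread already points out: because each date hashes to a different prefix, a time range cannot be a single Scan; it has to be materialized as one Get per day.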
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Software Dev [mailto:static.void.dev@gmail.com]
> >>> >> Sent: Tuesday, April 29, 2014 4:36 PM
> >>> >> To: user@hbase.apache.org
> >>> >> Subject: Re: Help with row and column design
> >>> >>
> >>> >>> The downside is it still has a hotspot when inserting, but when
> >>> >>> reading a range of time it does not.
> >>> >>
> >>> >> How can you do a scan query between dates when you hash the date?
> >>> >>
> >>> >>> Column qualifiers are just the collection of items you are
> >>> >>> aggregating on. Values are increments. In your case qualifiers might
> >>> >>> look like c:usa, c:usa:sex:m, c:usa:sex:f, c:italy:sex:m,
> >>> >>> c:italy:sex:f, c:italy.
> >>> >>
> >>> >> What happens when we add more fields? Do we just keep adding in more
> >>> >> column qualifiers? If so, how would we filter across columns to get an
> >>> >> aggregate total?

--
Regards,
Sreepathi
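Tying the two suggestions together, country-prefixed row keys plus Sreepathi's manual pre-splitting, a minimal sketch follows. The split-point scheme and the fixed country list are assumptions for illustration; the `createTable` call is commented because it needs an HBase `Admin` handle.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CountryPrefixedKeys {
    // Row key layout from the thread: COUNTRY/yyyymmddHH, e.g. "US/2014042901".
    static String rowKey(String country, String dateHour) {
        return country + "/" + dateHour;
    }

    // One split point per country prefix, sorted, so each country starts its
    // own region at table creation. Writes still hotspot on "today" within a
    // country, but at least each country's hotspot lands on its own region.
    static List<String> splitPoints(List<String> countries) {
        return countries.stream()
                .sorted()
                .map(c -> c + "/")
                .collect(Collectors.toList());
    }

    // With a cluster, the pre-split happens at creation time, e.g.:
    //   byte[][] splits = ...;  // splitPoints(...) converted to byte arrays
    //   admin.createTable(tableDescriptor, splits);
}
```

Because the country prefix is a plain literal (unlike the hashed-date scheme above), a per-country time range remains a single contiguous Scan from `US/2014042900` to `US/2014043000`.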