From: Andrew Purtell
Reply-To: Andrew Purtell
Date: Tue, 10 Apr 2012 16:53:28 -0700 (PDT)
Subject: Re: Add client complexity or use a coprocessor?
To: user@hbase.apache.org

> Even my implementation of an atomic increment (using a coprocessor) is
> two orders of magnitude slower than the provided implementation. Are
> there properties inherent to coprocessors or Incrementors that would
> force this kind of performance difference?

No.

You may be seeing a performance difference if you are packing multiple
Increments into one round trip but not doing a similar kind of batching
when calling a custom endpoint. Each Endpoint invocation is a round trip
unless you do something like:

    List<Row> actions = new ArrayList<Row>();
    actions.add(new Exec(conf, row, protocol, method, ...));
    actions.add(new Exec(conf, row, protocol, method, ...));
    actions.add(new Exec(conf, row, protocol, method, ...));
    Object[] results = table.batch(actions);
    ...

I've not personally tried that particular API combination but don't see
why it would not be possible.

Beyond that, I'd suggest running a regionserver with your coprocessor
installed under a profiler to see if you have monitor contention or a
hotspot or similar. It could be something unexpected.

> Can you think of an efficient way to implement an atomic bitfield
> (other than adding it as a separate feature like atomic increments)?

I think the idea of an atomic bitfield operation as part of the core API
is intriguing. It has applicability to your estimator use case and I can
think of a couple of things I could use it for. If there is more support
for this idea, this may be something to consider.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

----- Original Message -----
> From: Tom Brown
> To: user@hbase.apache.org; Andrew Purtell
> Cc:
> Sent: Tuesday, April 10, 2012 3:53 PM
> Subject: Re: Add client complexity or use a coprocessor?
>
> Andy,
>
> I have attempted to use coprocessors to achieve passable performance
> but have failed so far. Even my implementation of an atomic increment
> (using a coprocessor) is two orders of magnitude slower than the
> provided implementation. Are there properties inherent to
> coprocessors or Incrementors that would force this kind of performance
> difference?
>
> Can you think of an efficient way to implement an atomic bitfield
> (other than adding it as a separate feature like atomic increments)?
>
> Thanks!
>
> --Tom
>
> On Tue, Apr 10, 2012 at 12:01 PM, Andrew Purtell wrote:
>> Tom,
>>
>>> I am a big fan of the Increment class. Unfortunately, I'm not doing
>>> simple increments for the viewer count. I will be receiving duplicate
>>> messages from a particular client for a specific cube cell, and don't
>>> want them to be counted twice
>>
>> Gotcha.
>>
>>> I created an RPC endpoint coprocessor to perform this function but
>>> performance suffered heavily under load (it appears that the endpoint
>>> performs all functions in serial).
>>
>> Did you serialize access to your data structure(s)?
>>
>>> When I tried implementing it as a region observer, I was unsure of how
>>> to correctly replace the provided "put" with my own. When I issued a
>>> put from within "prePut", the server blocked the new put (waiting for
>>> the "prePut" to finish). Should I be attempting to modify the WALEdit
>>> object?
>>
>> You can add KVs to the WALEdit.
>> Or, you can get a reference to the Put's familyMap:
>>
>>     Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
>>
>> and if you modify the map, you'll change what gets committed.
>>
>>> Is there a way to extend the functionality of "Increment" to provide
>>> arbitrary bitwise operations on the contents of a field?
>>
>> As a matter of design, this should be a new operation. It does sound
>> interesting and useful, some sort of atomic bitfield.
>>
>> Best regards,
>>
>>     - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>> ----- Original Message -----
>>> From: Tom Brown
>>> To: user@hbase.apache.org
>>> Cc:
>>> Sent: Monday, April 9, 2012 10:14 PM
>>> Subject: Re: Add client complexity or use a coprocessor?
>>>
>>> Andy,
>>>
>>> I am a big fan of the Increment class. Unfortunately, I'm not doing
>>> simple increments for the viewer count. I will be receiving duplicate
>>> messages from a particular client for a specific cube cell, and don't
>>> want them to be counted twice (my stats don't have to be 100%
>>> accurate, but the expected rate of duplicates will be higher than the
>>> allowable error rate).
>>>
>>> I created an RPC endpoint coprocessor to perform this function but
>>> performance suffered heavily under load (it appears that the endpoint
>>> performs all functions in serial).
>>>
>>> When I tried implementing it as a region observer, I was unsure of how
>>> to correctly replace the provided "put" with my own. When I issued a
>>> put from within "prePut", the server blocked the new put (waiting for
>>> the "prePut" to finish). Should I be attempting to modify the WALEdit
>>> object?
>>>
>>> Is there a way to extend the functionality of "Increment" to provide
>>> arbitrary bitwise operations on the contents of a field?
>>>
>>> Thanks again!
>>>
>>> --Tom
>>>
>>>> If it helps, yes this is possible:
>>>>
>>>>> Can I observe updates to a
>>>>> particular table and replace the provided data with my own? (The
>>>>> client calls "put" with the actual user ID, my co-processor replaces
>>>>> it with a computed value, so the actual user ID never gets stored in
>>>>> HBase).
>>>>
>>>> Since your option #2 requires atomic updates to the data structure,
>>>> have you considered native atomic increments? See
>>>>
>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long,%20boolean%29
>>>>
>>>> or
>>>>
>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html
>>>>
>>>> The former is a round trip for each value update. The latter allows
>>>> you to pack multiple updates into a single round trip. This would
>>>> give you accurate counts even with concurrent writers.
>>>>
>>>> It should be possible for you to do partial aggregation on the client
>>>> side too whenever parallel requests colocate multiple updates to the
>>>> same cube within some small window of time.
>>>>
>>>> Best regards,
>>>>
>>>>     - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
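The atomic bitfield discussed in the thread would be, like Increment, a server-side read-modify-write performed under the row lock. Below is a minimal single-JVM sketch of the intended semantics, using a CAS loop in place of the row lock; the class and method names are hypothetical, and no such operation exists in the HBase core API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of atomic bitfield semantics: atomically OR a mask into a
// 64-bit field and report whether any bit actually changed. Server-side
// in HBase this would be a read-modify-write under the row lock (the
// same pattern Increment uses); here a CAS loop stands in for that.
class AtomicBitfield {
    private final AtomicLong bits = new AtomicLong();

    // OR `mask` into the field; returns true iff at least one bit
    // flipped from 0 to 1 (i.e. this update was not a duplicate).
    boolean setBits(long mask) {
        while (true) {
            long old = bits.get();
            long updated = old | mask;
            if (updated == old) {
                return false;            // every bit already set: duplicate
            }
            if (bits.compareAndSet(old, updated)) {
                return true;             // we won the race and changed state
            }
            // lost a race with a concurrent writer; retry with fresh value
        }
    }

    long get() { return bits.get(); }
}
```

Reporting whether any bit changed is what makes duplicate messages harmless here: a repeated update ORs in bits that are already set and observes no change.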
>>>> ----- Original Message -----
>>>>> From: Tom Brown
>>>>> To: user@hbase.apache.org
>>>>> Cc:
>>>>> Sent: Monday, April 9, 2012 9:48 AM
>>>>> Subject: Add client complexity or use a coprocessor?
>>>>>
>>>>> To whom it may concern,
>>>>>
>>>>> Ignoring the complexities of gathering the data, assume that I will
>>>>> be tracking millions of unique viewers. Updates from each of our
>>>>> millions of clients are gathered in a centralized platform and
>>>>> spread among a group of machines for processing and inserting into
>>>>> HBase (assume that this group can be scaled horizontally). The data
>>>>> is stored in an OLAP cube format and one of the metrics I'm tracking
>>>>> across various attributes is viewership (how many people from Y are
>>>>> watching X).
>>>>>
>>>>> I'm writing this to ask for your thoughts as to the most appropriate
>>>>> way to structure my data so I can count unique TV viewers (assume a
>>>>> service like Netflix or Hulu).
>>>>>
>>>>> Here are the solutions I'm considering:
>>>>>
>>>>> 1. Store each unique user ID as the cell name within the cube(s) in
>>>>> which it occurs. This has the advantage of 100% accuracy, but the
>>>>> downside is the enormous space required to store each unique cell.
>>>>> Consuming this data is also problematic, as the only way to provide
>>>>> a viewership count is by counting each cell. To save the overhead of
>>>>> sending each cell over the network, the counting could be done by a
>>>>> coprocessor on the region server, but that still doesn't avoid the
>>>>> overhead of reading each cell from disk. I'm also not sure what
>>>>> happens if a single row is larger than an entire region (48 bytes
>>>>> per user ID * 10,000,000 users = 480MB).
>>>>>
>>>>> 2. Store a byte array that allows estimating unique viewers (with a
>>>>> small margin of error*). Add a coprocessor for updating this column
>>>>> so I can guarantee that updates to a specific OLAP cell will be
>>>>> atomic. The main benefit of this path is that the nodes that update
>>>>> HBase can be less complex. Another benefit I see is that I can just
>>>>> add more HBase regions as scale requires. However, I'm not sure if I
>>>>> can use a coprocessor the way I want; can I observe updates to a
>>>>> particular table and replace the provided data with my own? (The
>>>>> client calls "put" with the actual user ID, my co-processor replaces
>>>>> it with a computed value, so the actual user ID never gets stored in
>>>>> HBase.)
>>>>>
>>>>> 3. Store a byte array that allows estimating unique viewers (with a
>>>>> small margin of error*). Re-arrange my architecture so that each
>>>>> OLAP cell is only updated by a single node. The main benefit of this
>>>>> would be that I don't need to worry about atomic operations in
>>>>> HBase, since all updates for a single cell will happen atomically
>>>>> and in serial. The biggest downside is that I believe it would add
>>>>> significant complexity to my overall architecture.
>>>>>
>>>>> Thanks for your time, and I look forward to hearing your thoughts.
>>>>>
>>>>> Sincerely,
>>>>> Tom Brown
>>>>>
>>>>> *(For information about the byte array mentioned in #2 and #3, see:
>>>>> http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html)
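The "estimating byte array" behind options #2 and #3 can be as simple as linear counting, one of the techniques covered by the linked article. A minimal sketch under that assumption follows; all names are illustrative and this is pure Java, not an HBase API (a production system would more likely use HyperLogLog, as the article describes):

```java
import java.util.BitSet;

// Minimal linear-counting sketch of the "estimating byte array" idea:
// each viewer ID hashes to one bit of a fixed-size bitmap, so duplicate
// messages for the same viewer set the same bit and are never counted
// twice. The unique count is estimated from the fraction of zero bits.
class LinearCounter {
    private final BitSet bitmap;
    private final int m;                 // bitmap size in bits

    LinearCounter(int m) {
        this.m = m;
        this.bitmap = new BitSet(m);
    }

    void add(String viewerId) {
        // Mix the hash (murmur3 finalizer) so bucket choice is well spread.
        int h = viewerId.hashCode();
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        bitmap.set(Math.floorMod(h, m));
    }

    // Linear counting estimator: n ~= -m * ln(V/m), V = zero bits left.
    long estimate() {
        int zeros = m - bitmap.cardinality();
        if (zeros == 0) {
            return m;                    // saturated; m is only a lower bound
        }
        return Math.round(-m * Math.log((double) zeros / m));
    }
}
```

A 65,536-bit bitmap costs 8KB per OLAP cell and stays within a few percent of the true count for a few thousand uniques. Merging or updating such bitmaps is a bitwise OR, which is exactly the kind of update the atomic bitfield operation discussed in this thread would make safe under concurrent writers.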