hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: question about composite rowKey and performance difference between getScanner() and get(Get[])
Date Thu, 04 Dec 2014 18:12:27 GMT
I assume you have read http://hbase.apache.org/book.html#schema.casestudies
(See 6.11.3)

How much data is there besides A's and B's uniqueIds? The answer depends on
how much data redundancy you are comfortable with in your design.
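
To make the composite-rowkey option from the quoted message concrete, here is a minimal sketch in plain Java (no HBase dependency) of how such a key and the bounds of a prefix scan could be built. The class name, the method names and the 0x00 separator are hypothetical choices for illustration, not anything from the original design; the separator assumes ids never contain a 0x00 byte, and fixed-width ids would be another way to avoid one A id being a prefix of another.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: builds "A id + separator + B id" row keys and the
// [start, stop) byte range that covers every key for one A id.
final class CompositeRowKey {
    // Assumed separator; ids must never contain this byte.
    static final byte SEP = 0x00;

    // Row key layout: <A id bytes> 0x00 <B id bytes>.
    static byte[] rowKey(String aId, String bId) {
        byte[] a = aId.getBytes(StandardCharsets.UTF_8);
        byte[] b = bId.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[a.length + 1 + b.length];
        System.arraycopy(a, 0, key, 0, a.length);
        key[a.length] = SEP;
        System.arraycopy(b, 0, key, a.length + 1, b.length);
        return key;
    }

    // Inclusive lower bound: <A id bytes> 0x00.
    static byte[] scanStart(String aId) {
        byte[] a = aId.getBytes(StandardCharsets.UTF_8);
        byte[] start = Arrays.copyOf(a, a.length + 1);
        start[a.length] = SEP;
        return start;
    }

    // Exclusive upper bound: <A id bytes> 0x01, the first key past the prefix.
    static byte[] scanStop(String aId) {
        byte[] stop = scanStart(aId);
        stop[stop.length - 1] = (byte) (SEP + 1);
        return stop;
    }

    // Unsigned lexicographic comparison, the order in which row keys sort.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(start, key) <= 0
            && Arrays.compareUnsigned(key, stop) < 0;
    }

    public static void main(String[] args) {
        byte[] start = scanStart("A1");
        byte[] stop = scanStop("A1");
        System.out.println(inRange(rowKey("A1", "B7"), start, stop));
        System.out.println(inRange(rowKey("A10", "B7"), start, stop));
    }
}
```

In a real client the two bounds would be used as the start and stop rows of a Scan. The separator is what keeps a scan for "A1" from also returning rows that belong to "A10".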


On Wed, Dec 3, 2014 at 12:31 PM, Marc Sturm <mas9161@nyp.org> wrote:

> Hi,
> I have a many-to-many relationship that I am trying to model in HBase, and
> I want to be sure I am not missing anything, so please let me know or point
> me to the right documentation.
> Let's say I have an A-to-B many-to-many relationship. The query takes A's
> unique id as its parameter and returns all the B unique ids related to A,
> with their properties and values.
> The first solution I found is to have two tables. In the first, the rowKey
> is A's unique id and the column qualifiers are the unique ids of the B's
> related to A. In the second, the rowKeys are B's unique ids and the columns
> contain the property values. So the query has two steps: it first does a
> get on A to collect all the B uniqueIds, and then does a second get on B,
> passing an array of B rowkeys as the parameter. When I run the second
> query, I can get a much longer latency on the first run and then good, low
> latency on subsequent runs with the same parameter. I believe that's a
> caching issue...
> The second solution is to have one table with a composite rowkey equal to
> A uniqueid + B uniqueid; I will then have duplicate rows for each B
> uniqueid. But when I do a scan on just the first part of the rowKey (A
> uniqueid), the response time is lower and more consistent.
> So, my questions are threefold: 1) which approach is best, 2) what is the
> performance difference between a scan and a get with multiple rowkeys (I
> think the scan is faster because the data is not, or is less,
> "distributed"), and 3) how can we make the get with multiple rowkeys more
> consistent?
> Thank you for your help,
> Marc
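
The two access patterns described above can be sketched with in-memory maps. This is only an analogy in plain Java: it does not call HBase and says nothing about real latency. A sorted TreeMap stands in for a table whose rows are ordered by key, and all names and sample values are hypothetical. It shows that the two-step lookup fetches B rows by individual keys that may lie anywhere in the key space, while the composite-key design reads one contiguous key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two designs; not HBase client code.
final class AccessPatterns {
    // Design 1: read A's row to get the B ids, then look up each B row.
    static List<String> twoStepLookup(Map<String, List<String>> aTable,
                                      Map<String, String> bTable,
                                      String aId) {
        List<String> out = new ArrayList<>();
        for (String bId : aTable.getOrDefault(aId, List.of())) {
            out.add(bId + "=" + bTable.get(bId)); // one point lookup per B id
        }
        return out;
    }

    // Design 2: one sorted table keyed by "A id + \u0000 + B id";
    // a single range read returns every row for the given A id.
    static List<String> prefixScan(TreeMap<String, String> table, String aId) {
        String start = aId + '\u0000'; // inclusive lower bound
        String stop = aId + '\u0001';  // exclusive upper bound
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.subMap(start, stop).entrySet()) {
            out.add(e.getKey().substring(start.length()) + "=" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aTable = Map.of("A1", List.of("B1", "B3"));
        Map<String, String> bTable = Map.of("B1", "red", "B2", "green", "B3", "blue");

        TreeMap<String, String> composite = new TreeMap<>();
        composite.put("A1\u0000B1", "red");
        composite.put("A1\u0000B3", "blue");
        composite.put("A10\u0000B2", "green");

        System.out.println(twoStepLookup(aTable, bTable, "A1"));
        System.out.println(prefixScan(composite, "A1"));
    }
}
```

Both methods return the same B entries for a given A id. The difference is in how the rows are reached, which is the trade-off between redundancy and locality raised in the reply.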
