hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Prefix salting pattern
Date Mon, 19 May 2014 18:39:24 GMT
You run n scans in parallel. 
You want a single result set in sort order. 

How do you do that? 

That’s the extra work that you don’t have when you have a single result set. 
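
To make that extra work concrete, here is a minimal sketch of the client-side merge, assuming each salted scan has already returned its rows in sorted order and that the salt is a fixed-length prefix on the row key. The names strip_salt and merge_salted_scans, the one-character salt, and the sample rows are all hypothetical; this is not HBase client code, just the k-way merge step in isolation.

```python
import heapq


def strip_salt(scan, prefix_len):
    """Yield (unsalted_key, value) pairs from one already-sorted salted scan."""
    for salted_key, value in scan:
        yield salted_key[prefix_len:], value


def merge_salted_scans(scans, prefix_len=1):
    """K-way merge of n sorted scans into one stream ordered by unsalted key.

    Assumes every scan is sorted on its own and uses a salt prefix of
    prefix_len characters (a hypothetical layout).
    """
    streams = [strip_salt(scan, prefix_len) for scan in scans]
    return heapq.merge(*streams, key=lambda kv: kv[0])


# Two hypothetical scans, one per salt bucket, each sorted within its bucket.
scan_a = [("0apple", 1), ("0cherry", 3)]
scan_b = [("1banana", 2), ("1date", 4)]

merged = list(merge_salted_scans([scan_a, scan_b]))
```

The merge itself is cheap per row, but the client has to hold a cursor on all n scans at once and cannot emit a row until every scan has produced its next candidate.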

This goes into why the work done to associate secondary indexes with the base table
won’t scale or work once you have to consider joins. 

Remember to think beyond your specific use case and think about a general use case.  You said
that you had considered not caring about the result set (RS) order, but in the general case you
have to consider it. 

Think of it this way… 

In many RDBMSs you have two ways to handle parallelism. 
You can partition your data in a round robin fashion, or you can partition your data against
a range. 
In one use case, the client used a date range partition. That is, they created a partition
based on the month of the data versus just storing it in a round robin fashion.

In the round robin case, you get a high degree of parallelism because you’re going against data
that’s spread across the nodes in the database. 
In the range case, your data is segmented so you’re only going after a subset of your data that’s
local to a single system. 
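
A small sketch of the difference, assuming four partitions and a query for one month of data. The function names, the partition count, and the sample dates are hypothetical and not taken from any particular RDBMS.

```python
from datetime import date

NUM_PARTITIONS = 4  # hypothetical partition count


def round_robin_partition(row_index):
    """Round robin: rows are dealt out to partitions in arrival order."""
    return row_index % NUM_PARTITIONS


def month_partition(d):
    """Range: every row for a given month lands in the same partition."""
    return (d.year, d.month)


# Eight rows from May 2014 plus one row from April 2014.
rows = [date(2014, 5, day) for day in range(1, 9)] + [date(2014, 4, 30)]

# Which partitions does a query for May 2014 have to read?
rr_touched = {round_robin_partition(i) for i, d in enumerate(rows) if d.month == 5}
range_touched = {month_partition(d) for d in rows if d.month == 5}
```

With round robin the May query fans out to every partition, so it runs in parallel but touches all of them. With the month range it reads a single partition and nothing else.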

Which is better? 
Which is more efficient? 


On May 19, 2014, at 2:00 PM, Mike Axiak <mike@axiak.net> wrote:

> On Mon, May 19, 2014 at 8:53 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> While in each scan, the RS is in sort order, the overall set of RS needs to be merged
>> in to one RS and that’s where you start to have issues.
> What issues? As I said, in multiple tests we saw performance
> improvements across the board with this strategy.
