hbase-user mailing list archives

From yonghu <yongyong...@gmail.com>
Subject Re: multiple data versions vs. multiple rows?
Date Mon, 19 Jan 2015 20:17:34 GMT
Hi,

Thanks for your suggestion. I had already considered the first issue, that
a single row cannot be split between 2 regions.
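
As a rough illustration of that constraint, the sketch below estimates how
large a single row gets when every event is kept as a version. The column
count, cell size, version counts and the 10 GiB region size are assumptions
picked for the example, not values from this thread.

```python
# Back-of-the-envelope estimate: a row cannot be split across regions,
# so all versions of all its cells have to fit into one region.
# Every number below is an illustrative assumption.

def estimated_row_bytes(columns, versions_per_column, avg_cell_bytes):
    """Approximate size of one row, ignoring compression and encoding."""
    return columns * versions_per_column * avg_cell_bytes

REGION_SIZE_BYTES = 10 * 1024**3  # assumed preferred region size (10 GiB)

few = estimated_row_bytes(columns=5, versions_per_column=10, avg_cell_bytes=200)
many = estimated_row_bytes(columns=5, versions_per_column=1_000_000, avg_cell_bytes=200)

# Print each row size and the fraction of the assumed region size it uses.
print(few, few / REGION_SIZE_BYTES)
print(many, many / REGION_SIZE_BYTES)
```

With these assumed numbers the row with one million versions per column is
about 1 GB, which already takes a noticeable share of a single region.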

However, I ran a small scan test with MapReduce. I first created a table
t1 with 1 million rows and allowed each column to store 10 data versions.
Then I converted t1 into t2, turning the multiple data versions in t1 into
multiple rows in t2. I wrote two MapReduce programs to scan t1 and t2
separately. The result was that scanning t1 took less time than scanning
t2. So, for performance reasons, I think multiple data versions may be a
better option than multiple rows.
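
To make the t1 to t2 conversion concrete, here is a minimal sketch using
plain Python dictionaries. It is a toy in-memory model, not HBase client
code, and the composite row key format "<row>#<timestamp>", the column name
and the sample events are assumptions made up for the example.

```python
# Toy in-memory model of the two layouts; this is not HBase client code.
# t1: one row per user, each column keeps several timestamped versions.
# t2: one row per (user, timestamp), each column keeps a single value.

def versions_to_rows(t1):
    """Turn every version in t1 into its own row in t2.

    The composite row key "<row>#<timestamp>" is an assumed convention.
    """
    t2 = {}
    for row_key, columns in t1.items():
        for column, versions in columns.items():
            for timestamp, value in versions.items():
                # Zero-pad the timestamp so keys sort in time order.
                new_key = f"{row_key}#{timestamp:013d}"
                t2.setdefault(new_key, {})[column] = value
    return t2

t1 = {
    "user1": {"cf:event": {1421698000000: "login", 1421698060000: "click"}},
    "user2": {"cf:event": {1421698120000: "login"}},
}

t2 = versions_to_rows(t1)
for key in sorted(t2):
    print(key, t2[key])
```

In this model both layouts hold the same three values; t2 simply has more
rows and longer row keys than t1.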

But just as you said, which approach to use depends on how many historical
events you want to keep.

regards!

Yong


On Mon, Jan 19, 2015 at 8:37 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Yong,
>
> A row will not be split between 2 regions. If you plan on having thousands
> of versions, then depending on the size of your data, you might end up
> with a row bigger than your preferred region size.
>
> If you plan to keep just a few versions of the history to have a look at
> it, I would say go with it. If you plan to have one million versions
> because you want to keep the whole event history, go with the row approach.
>
> You can also consider going with the Column Qualifier approach. This has
> the same constraint as the versions regarding the split in 2 regions, but
> it might be easier to manage and still gives you the consistency of being
> within a row.
>
> JM
>
> 2015-01-19 14:28 GMT-05:00 yonghu <yongyong313@gmail.com>:
>
> > Dear all,
> >
> > I want to record user history data. I know there are two options:
> > one is to store user events in a single row with multiple data versions,
> > and the other is to use multiple rows. I wonder which one is better for
> > performance?
> >
> > Thanks!
> >
> > Yong
> >
>
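
The Column Qualifier approach mentioned in the quoted reply could be
sketched as follows. This is again a toy in-memory model rather than HBase
client code; the qualifier format "<family>:<timestamp>", the helper names
add_event and events_between, and the sample events are all hypothetical.

```python
# Toy model of the column-qualifier layout; not HBase client code.
# Each user has one row, and every event gets its own qualifier that
# embeds the event timestamp.

def add_event(table, row_key, family, timestamp, value):
    """Store one event under a qualifier built from its timestamp."""
    qualifier = f"{family}:{timestamp:013d}"  # assumed qualifier format
    table.setdefault(row_key, {})[qualifier] = value

def events_between(table, row_key, start_ts, end_ts):
    """Return (timestamp, value) pairs with start_ts <= timestamp < end_ts,
    oldest first."""
    result = []
    # Zero-padded timestamps make the sorted qualifiers time-ordered.
    for qualifier, value in sorted(table.get(row_key, {}).items()):
        ts = int(qualifier.split(":", 1)[1])
        if start_ts <= ts < end_ts:
            result.append((ts, value))
    return result

table = {}
add_event(table, "user1", "e", 1421698000000, "login")
add_event(table, "user1", "e", 1421698060000, "click")
add_event(table, "user1", "e", 1421698120000, "logout")

recent = events_between(table, "user1", 1421698060000, 1421698200000)
print(recent)
```

All events of a user stay within one row here, so the same row-size
constraint applies as with versions.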
