hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: multiple data versions vs. multiple rows?
Date Mon, 19 Jan 2015 21:33:37 GMT
Hi Yong,

If you want to compare performance, you need to run much bigger and
longer tests. Don't run them in parallel. Run each one at least 10 times to
make sure you see a consistent trend. Is the difference between the two
significant? It should not be.
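
A minimal sketch of such a comparison, in Python. It assumes each scan can be
wrapped in a callable that blocks until the job finishes; the function names
below are hypothetical, and the significance check is only a crude rule of
thumb, not a proper statistical test.

```python
import statistics
import time


def time_runs(job, runs=10):
    """Run `job` sequentially `runs` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        job()  # blocks until the scan has finished
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(timings), statistics.stdev(timings)


def looks_significant(timings_a, timings_b):
    """Crude check: do the means differ by more than the combined spread?"""
    mean_a, sd_a = summarize(timings_a)
    mean_b, sd_b = summarize(timings_b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)
```

To use it, call `time_runs` once with a callable that scans t1 and once with a
callable that scans t2, one after the other rather than at the same time, and
pass both result lists to `looks_significant`.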

JM

2015-01-19 15:17 GMT-05:00 yonghu <yongyong313@gmail.com>:

> Hi,
>
> Thanks for your suggestion. I have already considered the first issue, that
> one row is not allowed to be split between 2 regions.
>
> However, I have made a small scan-test with MapReduce. I first created a
> table t1 with 1 million rows and allowed each column to store 10 data
> versions. Then, I translated t1 into t2 in which multiple data versions in
> t1 were transformed into multiple rows in t2. I wrote two MapReduce
> programs to scan t1 and t2 individually. The result was that scanning t1
> took less time than scanning t2. So I think that, for performance reasons,
> multiple data versions may be a better option than multiple rows.
>
> But just as you said, which approach to use depends on how many historical
> events you want to keep.
>
> regards!
>
> Yong
>
>
> On Mon, Jan 19, 2015 at 8:37 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi Yong,
> >
> > A row will not be split between 2 regions. If you plan on having thousands
> > of versions, then depending on the size of your data you might end up with
> > a row bigger than your preferred region size.
> >
> > If you plan to keep just a few versions of the history to look at, I
> > would say go with versions. If you plan to have one million versions
> > because you want to keep the whole event history, go with the row approach.
> >
> > You can also consider going with the Column Qualifier approach. This has
> > the same constraint as the versions regarding the split in 2 regions, but
> > it might be easier to manage and still give you the consistency of being
> > within a row.
> >
> > JM
> >
> > 2015-01-19 14:28 GMT-05:00 yonghu <yongyong313@gmail.com>:
> >
> > > Dear all,
> > >
> > > I want to record user history data. I know there are two options:
> > > one is to store user events in a single row with multiple data versions,
> > > and the other is to use multiple rows. I wonder which one is better for
> > > performance?
> > >
> > > Thanks!
> > >
> > > Yong
> > >
> >
>
