hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: One-table w/ multi-CF or multi-table w/ one-CF?
Date Sat, 06 Sep 2014 18:01:19 GMT
Thanks Ted,

I'll pre-split the table during ingestion. The reason I keep the rowkey
monotonic is that it's easier to work with TableInputFormat; otherwise I
would've binned it into 256 splits. (Well, I think a good way is to extend
TableInputFormat to accept multiple row ranges. If there's an existing
efficient implementation, please let me know :)
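
To make the "256 splits" idea concrete, here is a minimal sketch of binning rowkeys with a one-byte salt. It uses only the Java standard library and makes no HBase calls. The class name SaltedRowKeys, the use of Arrays.hashCode as the salt hash, and the timestamp-style key in main are all assumptions for illustration. The sketch shows the three pieces such a scheme would need: the salted key, the 255 split points for pre-splitting, and the 256 physical row ranges that one logical range turns into (which is why a plain single-range scan no longer covers it).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase.
class SaltedRowKeys {
    static final int BUCKETS = 256;

    // One-byte salt from a stable hash of the original key.
    // Arrays.hashCode is a stand-in; any stable hash would do.
    static byte saltFor(byte[] key) {
        return (byte) Math.floorMod(Arrays.hashCode(key), BUCKETS);
    }

    // Prepend the given salt byte to a key.
    static byte[] withSalt(byte salt, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = salt;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // The key as it would be written: salt byte followed by the original key.
    static byte[] salted(byte[] key) {
        return withSalt(saltFor(key), key);
    }

    // 255 one-byte split points (0x01..0xFF) give 256 regions, one per salt.
    static List<byte[]> splitKeys() {
        List<byte[]> splits = new ArrayList<>();
        for (int b = 1; b < BUCKETS; b++) {
            splits.add(new byte[] {(byte) b});
        }
        return splits;
    }

    // One logical range [start, stop) becomes 256 physical ranges,
    // each returned as a {startRow, stopRow} pair.
    static List<byte[][]> scanRanges(byte[] start, byte[] stop) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new byte[][] {withSalt((byte) b, start), withSalt((byte) b, stop)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Made-up monotonic key, used only for the demonstration.
        byte[] key = "20140906180119".getBytes(StandardCharsets.UTF_8);
        System.out.println("salt bucket:  " + (saltFor(key) & 0xFF));
        System.out.println("split points: " + splitKeys().size());
        System.out.println("scan ranges:  " + scanRanges(key, key).size());
    }
}
```

The trade-off is visible in scanRanges: writes spread over all buckets, but every range read has to fan out into one scan per bucket.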

Would you elaborate a little more on the heap memory usage during a scan? Is
there any reference for that?
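
For context on the question above, here is a rough toy model of the idea behind the essential column family feature. It is not HBase code. In the HBase client the behaviour is enabled per scan with Scan.setLoadColumnFamiliesOnDemand(true), together with a filter that needs only one family. The class EssentialFamilySketch, the family names "meta" and "payload", and the read counter are assumptions made up for illustration. The sketch compares reading every family of every row against reading the filtered family first and the rest only for rows that pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these types exist in HBase.
class EssentialFamilySketch {
    // A row as a map from family name to a stand-in for that family's cells.
    static final class Row {
        final String key;
        final Map<String, String> families;
        Row(String key, Map<String, String> families) {
            this.key = key;
            this.families = families;
        }
    }

    // Number of family reads performed by this instance.
    int familiesLoaded = 0;

    private String load(Row row, String family) {
        familiesLoaded++;
        return row.families.get(family);
    }

    // Eager: read every family of every row, then apply the filter.
    List<String> scanEager(List<Row> rows, String filterFamily, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Row row : rows) {
            Map<String, String> loaded = new LinkedHashMap<>();
            for (String f : row.families.keySet()) {
                loaded.put(f, load(row, f));
            }
            if (wanted.equals(loaded.get(filterFamily))) {
                hits.add(row.key);
            }
        }
        return hits;
    }

    // On demand: read the filter's family first, the rest only for matches.
    List<String> scanOnDemand(List<Row> rows, String filterFamily, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Row row : rows) {
            if (!wanted.equals(load(row, filterFamily))) {
                continue;
            }
            for (String f : row.families.keySet()) {
                if (!f.equals(filterFamily)) {
                    load(row, f);
                }
            }
            hits.add(row.key);
        }
        return hits;
    }

    // n rows with a small "meta" family and a large "payload" family;
    // every 'every'-th row is marked "keep".
    static List<Row> sampleRows(int n, int every) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, String> fams = new LinkedHashMap<>();
            fams.put("meta", i % every == 0 ? "keep" : "skip");
            fams.put("payload", "large-blob-" + i);
            rows.add(new Row("row" + i, fams));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows(1000, 100);
        EssentialFamilySketch eager = new EssentialFamilySketch();
        EssentialFamilySketch lazy = new EssentialFamilySketch();
        System.out.println("eager:     " + eager.scanEager(rows, "meta", "keep").size()
                + " hits, " + eager.familiesLoaded + " family reads");
        System.out.println("on demand: " + lazy.scanOnDemand(rows, "meta", "keep").size()
                + " hits, " + lazy.familiesLoaded + " family reads");
    }
}
```

Both strategies return the same rows; the on-demand variant simply never touches the large family for rows the filter rejects. This only works when both families live in the same table, since the lazy read is keyed by the same row.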

Jianshi



On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> If you use monotonically increasing rowkeys, separating out the column
> family into a new table would give you the same issue you're facing today.
>
> With a single table, the essential column family feature would reduce the
> amount of heap memory used during a scan. With two tables, there is no such
> facility.
>
> Cheers
>
>
> On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang <jianshi.huang@gmail.com>
> wrote:
>
> > Hi Ted,
> >
> > Yes, that's the table having RegionTooBusyExceptions :) But the performance
> > I care about most is scan performance.
> >
> > It's mostly for analytics, so I don't care much about atomicity currently.
> >
> > What's your suggestion?
> >
> > Jianshi
> >
> >
> > On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Is this the same table you mentioned in the thread about
> > > RegionTooBusyException?
> > >
> > > If you move the column family to another table, you may have to handle
> > > atomicity yourself - currently atomic operations are within region
> > > boundaries.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently putting everything into one table (to make cross reference
> > > > queries easier) and there's one CF which contains rowkeys very different to
> > > > the rest. Currently it works well, but I'm wondering if it will cause
> > > > performance issues in the future.
> > > >
> > > > So my questions are
> > > >
> > > > 1) will there be performance penalties with the way I'm doing it?
> > > > 2) should I move that CF to a separate table?
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> >
> >
> >
> >
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
