hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: One-table w/ multi-CF or multi-table w/ one-CF?
Date Sat, 06 Sep 2014 18:09:55 GMT
Please refer to HBASE-5416 ("Filter on one CF and if a match, then load and
return full row").

bq. to extend TableInputFormat to accept multiple row ranges

You mean extending hbase.mapreduce.scan.row.start and
hbase.mapreduce.scan.row.stop so that multiple ranges can be specified?
How many such ranges do you normally need?
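
For reference, here is a minimal sketch of why binning the rowkey leads to
multiple ranges. It assumes a one-byte bin prefix (256 bins) derived from a
hash of the key, and non-empty start and stop keys. The class and method names
are hypothetical and are not part of the HBase API; it only uses the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not HBase API: shows how one logical range over
// monotonic keys turns into one physical range per bin once keys are salted.
class SaltedRanges {
    static final int BINS = 256;

    // Prepend a single bin byte to the key.
    static byte[] prefix(int bin, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) bin;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // Write path: pick the bin from a hash of the key (assumed scheme).
    static byte[] salt(byte[] key) {
        int bin = Math.floorMod(Arrays.hashCode(key), BINS);
        return prefix(bin, key);
    }

    // Read path: the logical range [start, stop) must be scanned once per bin.
    // Each element is a {startRow, stopRow} pair.
    static List<byte[][]> ranges(byte[] start, byte[] stop) {
        List<byte[][]> result = new ArrayList<>();
        for (int bin = 0; bin < BINS; bin++) {
            result.add(new byte[][] { prefix(bin, start), prefix(bin, stop) });
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] start = "20140901".getBytes(StandardCharsets.UTF_8);
        byte[] stop = "20140907".getBytes(StandardCharsets.UTF_8);
        System.out.println(ranges(start, stop).size() + " physical ranges");
    }
}
```

With such a scheme, a single start/stop pair is no longer enough to describe
one logical scan, which is where a multi-range input format would come in.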

Cheers


On Sat, Sep 6, 2014 at 11:01 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Thanks Ted,
>
> I'll pre-split the table during ingestion. The reason to keep the rowkey
> monotonic is for easier working with TableInputFormat, otherwise I would've
> binned it into 256 splits. (well, I think a good way is to extend
> TableInputFormat to accept multiple row ranges, if there's an existing
> efficient implementation, please let me know :)
>
> Would you elaborate a little more on the heap memory usage during scan? Is
> there any reference to that?
>
> Jianshi
>
>
>
> On Sun, Sep 7, 2014 at 1:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > If you use monotonically increasing rowkeys, separating out the column
> > family into a new table would give you the same issue you're facing today.
> >
> > With a single table, the essential column family feature would reduce the
> > amount of heap memory used during a scan. With two tables, there is no such
> > facility.
> >
> > Cheers
> >
> >
> > On Sat, Sep 6, 2014 at 10:11 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > wrote:
> >
> > > Hi Ted,
> > >
> > > Yes, that's the table having RegionTooBusyExceptions :) But the
> > > performance I care most about is scan performance.
> > >
> > > It's mostly for analytics, so I don't care much about atomicity
> > > currently.
> > >
> > > What's your suggestion?
> > >
> > > Jianshi
> > >
> > >
> > > On Sun, Sep 7, 2014 at 1:08 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Is this the same table you mentioned in the thread about
> > > > RegionTooBusyException?
> > > >
> > > > If you move the column family to another table, you may have to
> > > > handle atomicity yourself - currently atomic operations are within
> > > > region boundaries.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sat, Sep 6, 2014 at 9:49 AM, Jianshi Huang
> > > > <jianshi.huang@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm currently putting everything into one table (to make
> > > > > cross-reference queries easier) and there's one CF which contains
> > > > > rowkeys very different from the rest. Currently it works well, but
> > > > > I'm wondering if it will cause performance issues in the future.
> > > > >
> > > > > So my questions are
> > > > >
> > > > > 1) will there be performance penalties with the way I'm doing it?
> > > > > 2) should I move that CF to a separate table?
> > > > >
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > Jianshi Huang
> > > > >
> > > > > LinkedIn: jianshi
> > > > > Twitter: @jshuang
> > > > > Github & Blog: http://huangjs.github.com/
> > > > >
