hbase-user mailing list archives

From Shushant Arora <shushantaror...@gmail.com>
Subject Re: hbase doubts
Date Tue, 18 Aug 2015 05:08:14 GMT
Thanks!
A few more questions:

1. Say the requirement is to count the distinct values of F1.

If the field is part of the key, can't HBase just scan the keys, skip value
deserialization, and return the result to the client, which then calculates
the distinct count? In the second approach, HBase would deserialize the
value of the column containing F1 and return it to the client, which then
calculates the distinct count.

2. For bulk load, when LoadIncrementalHFiles runs and the regionserver moves
the HFiles from HDFS into the region directory, does the regionserver
localize each HFile by downloading it to local disk and then uploading it
again into the region directory? Or does it just move the file into the
region directory and wait for the next compaction to localize it, as in the
regionserver failure case?
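
To make question 1 concrete, here is a minimal sketch of the two layouts in plain Python. It is not the HBase client API: dicts and lists stand in for a table, and the `#` separator, the `cf:F1` column name, and the sample rows are all assumptions made up for illustration.

```python
# Illustration only: plain Python containers stand in for an HBase table.
# The '#' separator, the 'cf:F1' column name and the sample rows are assumed.

def distinct_f1_from_keys(row_keys, sep="#"):
    """Situation 1: F1 is the trailing part of a composite row key (key#F1).

    Only the row keys are inspected; no cell values are touched.
    """
    return {key.rsplit(sep, 1)[1] for key in row_keys}


def distinct_f1_from_column(rows, column="cf:F1"):
    """Situation 2: F1 is stored as a column value, so values must be read."""
    return {cells[column] for cells in rows.values() if column in cells}


composite_keys = ["key1#red", "key2#blue", "key3#red"]
column_rows = {
    "key1": {"cf:F1": "red", "cf:other": "x"},
    "key2": {"cf:F1": "blue", "cf:other": "y"},
    "key3": {"cf:F1": "red", "cf:other": "z"},
}

print(len(distinct_f1_from_keys(composite_keys)))
print(len(distinct_f1_from_column(column_rows)))
```

In both layouts the client still performs the distinct calculation itself; the only difference in this sketch is whether the F1 value is taken from the key or from a cell value.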




On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For both scenarios you mentioned, field is not leading part of row key.
> You would need to specify timerange or start row / stop row to narrow the
> key range being scanned.
>
> I am leaning toward using second approach.
>
> Cheers
>
> On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora
> <shushantarora09@gmail.com> wrote:
>
> > ~8-10 fields (5 of 20 bytes each) and 3 fields of 200 bytes each.
> >
> > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > How many fields such as F1 are you considering for embedding in row
> > > key?
> > >
> > > Suggested reading:
> > > http://hbase.apache.org/book.html#rowkey.design
> > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > ColumnPrefixFilter)
> > >
> > > Cheers
> > >
> > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora
> > > <shushantarora09@gmail.com> wrote:
> > >
> > > > 1. So the size limit is per cell's identifier + value?
> > > >
> > > > Which is more optimal: to have the field in the key or in a column of
> > > > a column family, if the pattern is that every row has that field?
> > > >
> > > > Say I have a field F1 in all rows, so:
> > > > Situation-1
> > > > key1#F1 (as composite key), and the rest of the fields in columns
> > > >
> > > > Situation-2
> > > > key1 as key and F1 part of the column family.
> > > >
> > > >
> > > > This is the main reason I asked about the key size limit.
> > > > If I ask for the number of rows where F1 = 'someval', will it be
> > > > faster in situation-1 than in situation-2, since in 1 it can return
> > > > the result just by traversing keys, with no need to read columns?
> > > >
> > > >
> > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > For #1, it is the limit on a single keyvalue, not row, not key.
> > > > >
> > > > > For #2, please see the following:
> > > > >
> > > > > http://hbase.apache.org/book.html#store.memstore
> > > > >
> > > > > http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora
> > > > > <shushantarora09@gmail.com> wrote:
> > > > >
> > > > > > 1. Is hbase.client.keyvalue.maxsize the max size of a row or of
> > > > > > the key only? Is there any limit on key size alone?
> > > > > > 2. The access pattern is mostly key-based. Are memstores and
> > > > > > regions on a regionserver per table? If I have multiple tables,
> > > > > > will there be multiple memstores instead of the few there would
> > > > > > have been with one large table?
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > For #1, take a look at the following in hbase-default.xml:
> > > > > > >
> > > > > > >     <name>hbase.client.keyvalue.maxsize</name>
> > > > > > >     <value>10485760</value>
> > > > > > >
> > > > > > > For #2, it would be easier to answer if you can outline access
> > > > > > > patterns in your app.
> > > > > > >
> > > > > > > For #3, adjustment according to current region boundaries is
> > > > > > > done client side. Take a look at the javadoc for LoadQueueItem
> > > > > > > in LoadIncrementalHFiles.java
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora
> > > > > > > <shushantarora09@gmail.com> wrote:
> > > > > > >
> > > > > > > > 1. Is there any max limit on the key size of an HBase table?
> > > > > > > > 2. Multiple small tables vs. one large table: which one is
> > > > > > > > preferred?
> > > > > > > > 3. For bulk load, when LoadIncrementalHFiles is run, it again
> > > > > > > > recalculates the region splits based on region boundaries.
> > > > > > > > Does this division happen on the client side, or on the server
> > > > > > > > side at the regionserver or HBase master, which then assigns
> > > > > > > > the splits that cross a target region boundary to the desired
> > > > > > > > regionserver?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
