hbase-dev mailing list archives

From ashish singhi <ashish.sin...@huawei.com>
Subject RE: Notes from dev meetup in Shenzhen, August 5th, 2017
Date Tue, 08 Aug 2017 02:54:10 GMT
Great write-up, Stack. It covers everything we discussed.
It was very nice meeting you all, and I hope we can continue HBaseCon Asia.

Regards,
Ashish

From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: 08 August 2017 00:07
To: HBase Dev List <dev@hbase.apache.org>
Subject: Notes from dev meetup in Shenzhen, August 5th, 2017

At fancy Huawei headquarters, 10:00 AM to 12:00 PM or so (with nice coffee and fancy little cake
squares provided about halfway through the session).

For list of attendees, see picture at end of this email.

Discussion was mostly in Chinese with about 25% in English plus some gracious sideline translation
so the below is patchy. Hopefully you get the gist.

For a client-side scanner going against HFiles directly, is there a means of passing
the permissions from HBase through to HDFS?

Issues with HBase 99th-percentile latency were brought up. "DynamoDB can do 10ms". How to do better?

SSD is not enough.

GC messes us up.

Will Distributed Log Replay come back to help improve MTTR? We could redo it on the new ProcedureV2
basis. ZK timeout is the biggest issue. Do as we used to and just rely on the regionserver
heartbeating...

Read replica helps w/ MTTR.

Ratis incubator project to do a quorum based hbase?

Digression on licensing issues around fb wangle and folly.

Redo of hbase but quorum based would be another project altogether.

Decided to go around the table to talk about concerns and what people are working on.

Jieshan wondered what could be done to improve OLAP over hbase.

Client side scanner was brought up again as means of skipping RS overhead and doing better
OLAP.

Have HBase compact to parquet files. Query parquet and hbase.

At Huawei, they are using HBase 1.0. Most of their problems are with assignment. They have 0.5M
regions. RIT (regions in transition) is a killer. Double-assignment issues. And RIT. They run their
own services. Suggested they upgrade to get fixes at least. Then 2.0.

Will HBase federate like HDFS? Can the Master handle load at large scale? Does it need to do
federation too?

Anyone using Bulk loaded replication? (Yes, it just works so no one talks about it...)

Request that fixes be backported to all active branches, not just most current.

Andrew was good at backporting... not all RMs are.

Too many branches. What should we do?

Proliferation of branches makes for too much work.

Need to cleanup bugs in 1.3. Make it stable release now.

Let's do more active EOL'ing of branches. 1.1?

Hubert asked if we can have clusters where RS are differently capable? i.e. several generations
of HW all running in the same cluster.

What if a fat server goes down?

Balancer could take care of it all. RS capacity: the balancer can take it into account.
Regionserver labels like YARN labels. Characteristics.
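
As a rough illustration of the capacity idea above, here is a minimal sketch that splits a region
count across regionservers in proportion to a per-server capacity weight. It is not how the HBase
balancer works; the function name, the server names, and the weights are all hypothetical, and the
largest-remainder rounding is just one simple way to make the counts add up.

```python
def regions_per_server(total_regions, capacities):
    """Split total_regions across servers in proportion to a capacity weight.

    capacities: dict of server name -> positive weight (hypothetical unit,
    e.g. 1 for an old box, 4 for a new one). Not an HBase API.
    """
    if not capacities:
        raise ValueError("need at least one regionserver")
    total_weight = sum(capacities.values())
    # Ideal (fractional) share for each server.
    exact = {s: total_regions * w / total_weight for s, w in capacities.items()}
    # Start from the rounded-down share.
    counts = {s: int(share) for s, share in exact.items()}
    leftover = total_regions - sum(counts.values())
    # Largest-remainder method: hand the leftover regions to the servers
    # whose fractional share was cut the most (ties broken by name).
    by_remainder = sorted(capacities, key=lambda s: (counts[s] - exact[s], s))
    for s in by_remainder[:leftover]:
        counts[s] += 1
    return counts


if __name__ == "__main__":
    print(regions_per_server(100, {"old": 1, "mid": 2, "new": 4}))
```

If the fat server goes down, calling the function again without it gives the new targets for the
remaining servers, which also shows how much load the smaller machines would have to absorb.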

Or run it all in Docker when the cluster is heterogeneous. The K8s talk from the day before was
mentioned; we should all look at being able to deploy on K8s and Docker.

Let's put out a Kubernetes blog post... (Doing).

Alibaba looking at HBase as native YARN app.

I/O is hard even with containers.

Use the K8s autoscaler when there is a heavy user.

Limit i/o use w/ CP. Throttle.

Spark and client-side scanner came up again.

Snapshot input format in spark.

HBase federation came up again. jd.com talking of 3k to 4k nodes in a
cluster. Millions of regions. Region assignment is messing them up.

Maybe federation is a good idea? Argument that it is too much operational complexity. Can we
fix master load w/ splittable meta, etc?

It was brought up that even w/ 100s of RS there is a scale issue, never mind thousands.

Alibaba talked about disaster recovery. Described an issue where HDFS had a fencing problem during
an upgrade. There was no active NN. All RS went down.
ZK is another point of failure (POF), if ZK is not available. Operators were being asked how much
longer the cluster was going to be down but they could not answer the question. No indicators from
HBase on how much longer it will be down or how many WALs it has processed and how many more to go.
Operator unable to tell his org how long it would be before it all came back online. HBase should
say how many regions are online and how many more there are to do.
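
To make the request above concrete, here is a minimal sketch of the kind of summary an operator
could be shown. Nothing here is an existing HBase metric or API: the function, its parameters, and
the returned field names are hypothetical, and the time estimate is a naive linear extrapolation
from the WAL processing rate seen so far.

```python
def recovery_progress(wals_done, wals_total, regions_online, regions_total, elapsed_s):
    """Summarize recovery and extrapolate the time left from the rate so far.

    All inputs are counters the cluster would have to expose (an assumption);
    elapsed_s is the number of seconds since recovery started.
    """
    wals_left = wals_total - wals_done
    if wals_done > 0:
        # Naive estimate: assume the remaining WALs take as long as the
        # ones already processed did, on average.
        eta_s = elapsed_s / wals_done * wals_left
    else:
        eta_s = None  # no rate observed yet, so no estimate
    return {
        "wals_left": wals_left,
        "regions_left": regions_total - regions_online,
        "regions_online_pct": 100.0 * regions_online / regions_total,
        "eta_s": eta_s,
    }


if __name__ == "__main__":
    print(recovery_progress(50, 200, 1000, 4000, 600))
```

Even a crude estimate like this would let an operator say "about half an hour more" rather than
nothing, though real WAL sizes vary and a real indicator would want something smarter.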

Alibaba use SQL to lower cost. The HBase API is low-level. Row-key construction is tricky. New
users make common mistakes. If you don't do the schema right, high performance is difficult.
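
One commonly cited row-key mistake is leading with a monotonically increasing value such as a
timestamp, which sends all new writes to a single region. The notes do not say which mistakes were
meant, so take the following as a generic sketch of one well-known workaround: a bucket prefix
derived from the entity id, plus a reversed timestamp. The key layout, the field names, and the
bucket count are all assumptions for illustration.

```python
import zlib

MAX_LONG = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def bucketed_row_key(user_id, timestamp_ms, buckets=16):
    """Build a row key as bucket|user_id|reversed_timestamp (hypothetical layout).

    The bucket comes from a checksum of user_id, so one user's rows stay
    together while different users spread across up to `buckets` key ranges.
    """
    bucket = zlib.crc32(user_id.encode("utf-8")) % buckets
    # Reverse the timestamp so the newest row for a user sorts first.
    reversed_ts = MAX_LONG - timestamp_ms
    # Zero-pad so that byte-wise (lexicographic) order matches numeric order.
    return f"{bucket:02d}|{user_id}|{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    print(bucketed_row_key("alice", 1502150000000))
```

The trade-off is that a scan across all users now needs one scan per bucket, which is the sort of
detail a SQL layer can hide from new users.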

Alibaba are using a subset of Phoenix... simple SQL only; throws exceptions if the user tries
to do joins, etc., anything but basic ops.

HareQL is using Hive for its metastore. There is no data typing in HBase.

HareQL could perhaps contribute some piece... or a module in hbase to sql... From phoenix?

Secondary index.

The client is complicated in Phoenix. It was suggested that a thin client just does the parse...
and then offloads to the server for optimization and execution.

Then secondary index. Need transaction engine. Consistency of secondary index.

We adjourned.

Your dodgy secretary,
St.Ack
P.S. Please add to this base set of notes if I missed anything.
