hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: The note of the round table meeting after HBaseConAsia 2019
Date Fri, 26 Jul 2019 17:01:00 GMT
Thanks for the thorough write-up Duo. Made for a good read....
S

On Fri, Jul 26, 2019 at 6:43 AM 张铎(Duo Zhang) <palomino219@gmail.com> wrote:

> The summary of HBaseConAsia 2019 will be available later. Here are the
> notes from the round table meeting after the conference. A bit long...
>
> First we talked about splittable meta. At Xiaomi we have a cluster with
> nearly 200k regions, where meta is easily overloaded and then cannot
> recover. Anoop said we could try read replicas, but agreed that read
> replicas cannot solve all the problems; eventually we still need to split
> meta.
>
> Then we talked about SQL. Allan Yang said that most of their customers want
> secondary indexes, even more than SQL. For a global, strongly consistent
> secondary index, we agreed that the only safe way is to use transactions.
> Other 'local' solutions will run into trouble when splitting/merging.
> Xiaomi has a global secondary index solution; could it be open sourced?
>
> Then we went back to SQL. We talked about Phoenix; its problem is well
> known: it is not stable enough. We even had a user on the mailing list who
> said he/she would never use Phoenix again. Alibaba and Huawei both have
> in-house SQL solutions; Huawei also talked about theirs at HBaseConAsia
> 2019 and will try to open source it. We could also introduce a SQL proxy
> in the hbase-connector repo. No push-down support at first: all logic is
> done on the proxy side, and we can optimize later.
>
> Some guys said that the current feature set for 3.0.0 is not good enough to
> attract more users, especially small companies. It has only internal
> improvements and no user-visible features. SQL and secondary indexes are
> very important.
>
> Yu Li talked about CCSMap; we still want it to be released in 3.0.0. One
> problem is its relationship with in-memory compaction. Theoretically they
> should not conflict, but in practice they do. The Xiaomi guys mentioned
> that in-memory compaction still has some bugs: even in basic mode, the
> MVCC writePoint may get stuck and hang the region server. Jieshan Bi
> asked why not just use CCSMap to replace CSLM. Yu Li said this is for
> better memory usage, as the index and data could be placed together.
>
> Then we started to talk about HBase on cloud. For now, it is a bit
> difficult to deploy HBase on cloud, as we need to deploy zookeeper and HDFS
> first. Then we talked about HBOSS and the WAL abstraction (HBASE-20952).
> Wellington said HBOSS basically works; it uses s3a and zookeeper to help
> simulate the operations of HDFS. We could introduce our own 'FileSystem'
> interface, not the hadoop one, and remove the 'atomic renaming'
> dependency so that the 'FileSystem' implementation will be easier. On the
> WAL abstraction, Wellington said there are still some guys working on it,
> but now they are focusing on patching ratis rather than abstracting the WAL
> system first. We agreed that a better way is to abstract the WAL system at
> a level higher than FileSystem, so maybe we could even use Kafka to store
> the WAL.
>
> Then we talked about the FPGA usage for compaction at Alibaba. Jieshan Bi
> said that at Huawei they offload compaction to the storage layer. For an
> open source solution, maybe we could offload compaction to spark, and then
> use something like bulkload to let the region server load the new HFiles.
> The problem with doing compaction inside the region server is the CPU cost
> and GC pressure. We need to scan every cell, so the CPU cost is high. Yu Li
> talked about their page-based compaction in the flink state store; maybe it
> could also benefit HBase.
>
> Then it was time for MOB. Huawei said MOB cannot solve their problem.
> We still need to read the data through RPC, and it also puts pressure on
> the memstore, since the memstore is still a bit small compared to a MOB
> cell. We will also flush a lot even though there are only a small number of
> MOB cells in the memstore, so we still need to compact a lot. So maybe the
> suitable scenario for MOB is one where most of your data is still small and
> a small amount of it is a bit larger; there MOB could increase the
> performance, and users do not need another system to store the larger data.
> Huawei said that they implement the logic on the client side. If the data
> is larger than a threshold, the client will go to another storage system
> rather than HBase.
> Alibaba said that if we want to support large blobs, we need to introduce a
> streaming API.
> And Kuaishou said that they do not use MOB; they just store the data on
> HDFS and the index in HBase, the typical solution.
>
> Then we talked about which company will host next year's HBaseConAsia. It
> will be Tencent or Huawei, or both, probably in Shenzhen. And since there
> is no HBaseCon in America any more (it is called 'NoSQL Day'), maybe next
> year we could just call the conference HBaseCon.
>
> Then we went back to SQL again. Alibaba said that most of their customers
> are migrating from older business systems, so they need 'full' SQL support.
> That's why they need Phoenix. And lots of small companies want to run OLAP
> queries directly on the database; they do not want to use ETL. So maybe in
> the SQL proxy (planned above), we should delegate the OLAP queries to spark
> SQL or something else, rather than just rejecting them.
>
> And a Phoenix committer said that the Phoenix community is currently
> re-evaluating its relationship with HBase, because lots of things broke
> when upgrading to HBase 2.1.x. They plan to break the tie between
> Phoenix and HBase, which means Phoenix plans to also run on other storage
> systems.
> Note: This was not discussed at the meeting, but personally I think this
> may be good news: since Phoenix will not be HBase-only, we have more reason
> to introduce our own SQL layer.
>
> Then we talked about Kudu. It is faster than HBase on scans. If we want to
> increase scan performance, we should use a larger block size, but this
> will lead to slower random reads, so there is a trade-off.
> The Kuaishou guys asked whether HBase could support storing HFiles in a
> columnar format. The answer is no; as said above, it would slow down random
> reads. But we could learn from what Google has done in Bigtable. We could
> write a copy of the data in parquet format to another FileSystem, and users
> could just scan the parquet files for better analysis performance. If they
> want the newest data, they could ask HBase for it, and it should be small.
> This is more of a solution in which not only HBase is involved. But at
> least we could introduce some APIs in HBase so users can build the solution
> in their own environment. And if you do not care about the newest data, you
> could also use replication to replicate the data to ES or other systems,
> and search there.
>
> And Didi talked about their problems using HBase. They use Kylin, so they
> also have lots of regions, and meta is also a problem for them. The
> pressure on zookeeper is a problem too, as the replication queues are
> stored on zk. After 2.1, zookeeper is only used as external storage in the
> replication implementation, so it is possible to switch to other storage,
> such as etcd. But it is still a bit difficult to store the data in a system
> table: currently we need to start the replication system before the WAL
> system, but if we want to store the replication data in an hbase table,
> the WAL system obviously must be started before the replication system, as
> we need the region of the system table online first, and opening it writes
> an open marker to the WAL. We need to find a way to break the deadlock.
> They also mentioned that the rsgroup feature creates big znodes on
> zookeeper, as they have lots of tables. We have HBASE-22514, which aims to
> solve the problem.
> Lastly, they shared their experience upgrading from 0.98 to 1.4.x. The
> versions should be compatible, but in practice there are problems. They
> agreed to post a blog about this.
>
> And the Flipkart guys said they will open source their test suite, which
> focuses on consistency (Jepsen?). This is good news; hopefully we will have
> another useful tool besides ITBLL.
>
> That's all. Thanks for reading.
>
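
For illustration, the client-side logic Huawei described in the MOB discussion (values larger than a threshold go to another storage system rather than HBase) might look roughly like the sketch below. Everything in it is an assumption made for the example: the SizeRouter class, the 1 MiB default threshold, and the two in-memory dicts standing in for an HBase table and an external blob store. The notes give no actual threshold or API.

```python
from dataclasses import dataclass, field

# Hypothetical default; the meeting notes do not state a threshold.
DEFAULT_THRESHOLD_BYTES = 1024 * 1024


@dataclass
class SizeRouter:
    """Routes writes by value size. Both stores are plain dicts that stand
    in for a real HBase table and a real external storage system."""
    threshold: int = DEFAULT_THRESHOLD_BYTES
    hbase: dict = field(default_factory=dict)
    blob_store: dict = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> str:
        # Values strictly larger than the threshold bypass HBase.
        if len(value) > self.threshold:
            self.blob_store[key] = value
            return "blob_store"
        self.hbase[key] = value
        return "hbase"

    def get(self, key: str) -> bytes:
        # Small values are looked up first; fall back to the blob store.
        if key in self.hbase:
            return self.hbase[key]
        return self.blob_store[key]


router = SizeRouter(threshold=4)
small_target = router.put("row-small", b"1234")
large_target = router.put("row-large", b"12345")
```

A real client would also have to decide what happens when a key moves across the threshold on update; this sketch ignores that case.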

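
The startup deadlock mentioned in the Didi discussion can be read as a cycle in a "must start before" graph. The sketch below only models that reasoning with Python's standard graphlib module; the component names and the startup_order helper are made up for the example and do not correspond to HBase classes.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(must_start_after):
    """Return a valid startup order, or None if the dependencies are cyclic.

    must_start_after maps each component to the set of components that
    have to be started before it.
    """
    try:
        return list(TopologicalSorter(must_start_after).static_order())
    except CycleError:
        return None


# Today: replication is started before the WAL system.
current = {"wal": {"replication"}}

# Replication state kept in an hbase table: replication needs the system
# table region online, and opening that region writes to the WAL.
table_backed = {
    "wal": {"replication"},
    "replication": {"system_table_region"},
    "system_table_region": {"wal"},
}

current_order = startup_order(current)
table_backed_order = startup_order(table_backed)
```

Here current_order comes out as replication followed by wal, while table_backed_order is None because the three constraints form a cycle. Breaking the deadlock means removing one of those edges.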