hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Rough notes from dev meetup, day after hbaseconasia 2018, saturday morning
Date Sun, 19 Aug 2018 10:48:36 GMT
There were about 30 of us. I didn't take roll. See photos below [1][2].
PMCers, committers, contributors, and speakers from the day before. There
is no attribution of comments or ideas. Please excuse. No agenda.

TESTING
What do people do for testing?
Allan Yang is finding good stuff when he tests AMv2 compared to me. Why?
slowDeterministic does more op types than serverKilling.
What do others do for testing?
Add more variety to the ITBLLs, more chaos?
What for performance testing?
YCSB.
Batch is important. It's what our users do. Recent addition of batch in YCSB
(and in PerformanceEvaluation). Size of batch matters too. And number of
clients.
Alibaba described what they do.
Advocate that we all try different test types rather than all do same runs.
Need to add new async client into YCSB. Alibaba use it for their testing of
new SEDA core (upstreaming soon).
Understanding each other's benchmarks can take a while. Common understanding
takes some effort, communication.
New hbase-operator-tools will be good place to put perf and testing
tooling.

GITHUB
Can hbase adopt the github dev flow? Support PRs?
It's a case of just starting the discussion on the dev list?
Do we lose review/commentary information if we go github route? Brief
overview of what is possible w/ the new gitbox repos follows ultimately
answering that no, there should be no loss (github comments show as jira
comments).
Most have github but not apache accounts. PRs are easier. Could encourage
more contribution, lower the barrier to contrib.
Other tools for hbase-operator-tools would be stuff like the alibaba
tooling for shutting down servers... moving regions to new one.

PERF ACROSS VERSIONS
Lucent (lucene?) has a perf curve on home page with markings for when large
features arrived and when releases were cut so can see if increase/decrease
in perf.
There was a big slowdown going from 0.98 to 1.1.2 hbase.
We talked about doing such a perf curve on hbase home page. Would be a big
project. Asked if anyone interested?
Perhaps a dedicated cluster up on Apache. We could do a whip-around to pay
for it.

USER FRIENDLY
Small/new users have a hard time. Is there a UI for users to see data in
cells or to change schema, or to create/drop tables? Is there anything we
can do here?
Much back and forth.
Xiaomi don't let users have access to shell. Have a web ui where you click
to build command that is run for you. Afraid that users will mistakenly
destroy the database so shut down access.
It turns out that most of the bigger players present have some form of UI
built against hbase. Alibaba have something. The DiDi folks have howto wiki
pages.
Talked about upstreaming.
Where to put it? hbase-operator-tools?
What about Docker file to give devs their own hbase easily. Can throw away
when done.
One attendee talked of Hue from CDH, how it is good for simple insert and
view.
Can check the data. For testing and feel-good getting to know system, it
helps.
Another uses Apache Drill but it is tough when types are involved.
New users need to be able to import data from a csv.
How hard to have a few pages of clicky, clicky, wizard to create/drop
tables or for small query...
A stripped-down version of Hue to come with HBase.... how hard to do this?

Next we went over backburner items mentioned on the previous day, starting
with SQL-like access.
What about lightweight SQL support?
At Huawei... they have a project going for lightweight SQL support in hbase
based on calcite.
For big queries, they'd go to sparksql.
Did you look at phoenix?
Phoenix is complicated, difficult. Calcite migration not done in Phoenix
(Sparksql is not calcite-based).
Talk to phoenix project about generating a lightweight artifact. We could
help with build. One nice idea was building with a cut-down grammar, one
that removed all the "big stuff" and problematics. Could return to the user
a nice "not supported" if they try to do a 10Bx10B join.
An interesting idea about a facade query analyzer making transfer to
sparksql if big query. Would need stats.
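
To make the facade idea concrete, here is a minimal sketch of a router that sends a query to the lightweight path or to sparksql based on an estimated row count. Everything in it is hypothetical: the class name, the `Engine` enum, the threshold, and the idea that a single "estimated rows scanned" number is the stat used. The notes only say that stats would be needed, not which ones.

```java
// Hypothetical sketch of a facade query router. None of these names exist in
// HBase, Phoenix, or Spark; they are placeholders for illustration only.
class QueryRouterSketch {
    enum Engine { LIGHTWEIGHT_SQL, SPARK_SQL }

    // Assumed tuning knob: queries estimated to scan more rows than this are
    // handed off to the heavyweight engine.
    private final long rowThreshold;

    QueryRouterSketch(long rowThreshold) {
        this.rowThreshold = rowThreshold;
    }

    // estimatedRowsScanned is assumed to come from table stats that would
    // have to be collected somewhere; this sketch does not collect them.
    Engine route(long estimatedRowsScanned) {
        if (estimatedRowsScanned > rowThreshold) {
            return Engine.SPARK_SQL;
        }
        return Engine.LIGHTWEIGHT_SQL;
    }

    public static void main(String[] args) {
        QueryRouterSketch router = new QueryRouterSketch(1_000_000L);
        System.out.println(router.route(500L));            // small lookup
        System.out.println(router.route(10_000_000_000L)); // very large scan
    }
}
```

A real analyzer would presumably look at more than row counts (joins, filters, region locality), but a single threshold is enough to show where the stats plug in.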

COPROCESSORS
Can we add some identifiers to distinguish whether a request is from a CP or
from a client? Can we calculate stats on CP resources used? Limit? Can we
update CPs more gracefully? Under heavy usage, an update means having to
disable the table. Short answer was no.
Move CPs to another process. A sidecar process is way to go.
The Huawei effort at lightweight would also use CPs (like Phoenix).
Bring the types into hbase, the phoenix types for spark to use etc.

SECONDARY INDICES
Full support is hard, can we do step-by-step...
Separate into several steps?
Push back that this is a well covered space. Problems known. Contribs in
tier above welcome.

One attendee asked where does hbase want to go? Is it storage or a db
system? If the former, then should draw the line and sql, graph, geo, is in
layers above, not integrated. Need to draw a sharp line. Do what we are
good at.

END-TO-END-ASYNC
Lots of pieces in place now. Last bit is core. Alibaba working on this.
Put request into a Queue, another thread into memory, another thread to
HDFS. Another thread to get result and respond to users.
What to do if blocked HDFS? How do you stop process from spinning up too many
threads and having too many ongoing requests? Queues would be bounded.
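
A minimal sketch of what "queues would be bounded" buys you, assuming a single hand-off queue between two stages. The class and method names are made up for illustration and are not from HBase or the Alibaba work. When the downstream stage stalls (say HDFS is blocked), the queue fills and new requests are rejected instead of piling up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one bounded hand-off queue between two stages.
class BoundedStageSketch {
    private final BlockingQueue<String> queue;
    final AtomicInteger accepted = new AtomicInteger();
    final AtomicInteger rejected = new AtomicInteger();

    BoundedStageSketch(int capacity) {
        // Fixed capacity caps the number of in-flight requests at this stage.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking submit: returns false when the queue is full, so the
    // caller can push back on the client rather than queue without limit.
    boolean submit(String request) {
        boolean ok = queue.offer(request);
        if (ok) {
            accepted.incrementAndGet();
        } else {
            rejected.incrementAndGet();
        }
        return ok;
    }

    // The next stage takes one request; returns null if nothing is queued.
    String drainOne() {
        return queue.poll();
    }

    public static void main(String[] args) {
        BoundedStageSketch stage = new BoundedStageSketch(2);
        // Simulate a stalled downstream: nothing drains while 5 requests arrive.
        for (int i = 0; i < 5; i++) {
            stage.submit("put-" + i);
        }
        System.out.println("accepted=" + stage.accepted.get()
                + " rejected=" + stage.rejected.get());
        // Once the downstream makes progress, there is room again.
        stage.drainOne();
        System.out.println("after drain, submit ok: " + stage.submit("put-5"));
    }
}
```

Rejecting on a full queue is one policy; blocking the submitter with `put` or a timed `offer` is another. Which one fits depends on whether the caller is an RPC handler thread that must not block.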
One attendee suggested the async core be done with coroutine. Was asked,
which JDK has coroutine. Answer, the AJDK. What's that? The Alibaba JDK
(they have their own JDK team). Laughter all around.

JAVA SUPPORT
We don't support JDK9... JDK10. We are bound by HDFS and Spark.
Is there any perf to be had in new JDKs? Answer, some, and yes. Offheaping
will be able to save a copy. Direct I/O. New API for BBs.

SPARK
Be able to scan hfiles directly. Work to transfer to parquet for spark to
query.
One attendee using the replication for streaming out to parquet. Then
having spark go against that. Talk of compacting into parquet then having
spark query parquet files and for the difference between now and last
compaction, go to hbase api.
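
The "parquet for the bulk, hbase api for the difference" read path can be sketched as a merge of two views where the newer one wins. This is only an illustration under loud assumptions: both sides are stood in for by in-memory maps keyed by row, the names are invented, and deletes since the last compaction are ignored entirely, which a real implementation could not do.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of reading "snapshot plus delta". The maps stand in for
// (a) rows compacted into parquet and (b) rows written since, fetched via the
// hbase api. No real parquet or HBase calls are made here.
class SnapshotPlusDeltaSketch {

    static Map<String, String> merge(Map<String, String> parquetSnapshot,
                                     Map<String, String> hbaseDelta) {
        // Start from the compacted view, sorted by row key.
        Map<String, String> merged = new TreeMap<>(parquetSnapshot);
        // Overlay the newer writes; on a key clash the delta value wins.
        merged.putAll(hbaseDelta);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new TreeMap<>();
        snapshot.put("row1", "a");
        snapshot.put("row2", "b");

        Map<String, String> delta = new TreeMap<>();
        delta.put("row2", "b-updated"); // overwritten since last compaction
        delta.put("row3", "c");         // new since last compaction

        System.out.println(merge(snapshot, delta));
    }
}
```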

1.
https://drive.google.com/file/d/0B4a3E58mCyOfSVJNQklEM0gyQ3VDYV9aMHlqTmdNNWgwQ3Bj/view?usp=sharing
2.
https://drive.google.com/file/d/0B4a3E58mCyOfMnF5QWpNTDRkc3M1anRRMEJVSjlBYVhsQm9F/view?usp=sharing
