hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Notes from Nicolas + Amir from Facebook @ Cloudera
Date Tue, 27 Mar 2012 18:49:36 GMT
People:
Cloudera: Todd, Dave W, Shaneel M, Jonathan H, Himanshu, Greg C, Matteo B
(remote)
FB: Nicolas, Amir

Dhruba - ubase/hstore - transaction processing, through hive-hbase
integration.

hbase team with hdfs team.
- hbase deman was with hadoop

NY - carve out hunk of HBase to work on.

Long term:
real time hive, deep integration.
- beyond just translate to MR job.
- Use in megastore.
- scan kind of halfass, for hive.
- previously point query optimization.
- analytics too long to scan table.
- doing on demand compression.

Edgecases
- finance sector
- gpu cases.

Uptime and availability.
- chaos monkey
- poll all regions

HBase 0.89 - fast region failover.
- down time down to..

Take down rack - test cases

putting data node selection in master.
- on per region basis, hash chain - so assigned secondary and tertiary.
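The notes don't spell out how the "hash chain" works, so the following is only one possible reading: the master hashes the region name to pick a stable secondary and tertiary data node from the remaining nodes. The function name, the use of MD5, and the node names are all assumptions for illustration, not anything FB described.

```python
import hashlib


def pick_replica_nodes(region_name, primary, datanodes):
    """Deterministically pick (primary, secondary, tertiary) for one region.

    Hypothetical sketch: the real selection logic was not described in the notes.
    """
    # Candidates exclude the primary; sorting keeps the order stable across calls.
    candidates = sorted(n for n in datanodes if n != primary)
    if len(candidates) < 2:
        raise ValueError("need at least three distinct data nodes")
    # Hash the region name to a starting offset in the candidate list.
    digest = hashlib.md5(region_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(candidates)
    secondary = candidates[start]
    # The tertiary is the next candidate, wrapping around the end of the list.
    tertiary = candidates[(start + 1) % len(candidates)]
    return primary, secondary, tertiary
```

Because the choice depends only on the region name and the node list, the same region maps to the same three nodes on every call, which is the property that would let a failover land on a node that already holds the blocks.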

What is Cloudera focus?

HDFS HA story
- Talking to HW -- bookies in HDFS ("public story, but ...")
- logs in hdfs.
- Standby node.
- zk flag - halfass solution. "double fails" not in scope.
- todd: 3 journal daemons, quorum for edits, pluggable journal manager
interface.
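As a rough illustration of the "quorum for edits" idea: an edit counts as durable once a strict majority of the journal daemons accept it, so with 3 daemons one can be down. This is a minimal sketch, not the HDFS implementation; `InMemoryJournal` and `write_edit_to_quorum` are made-up names, and real journals would be remote and written in parallel.

```python
class InMemoryJournal:
    """Hypothetical stand-in for one journal daemon."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.edits = []

    def append(self, txid, edit):
        # A failed daemon rejects the write.
        if not self.healthy:
            raise IOError("journal unavailable")
        self.edits.append((txid, edit))


def write_edit_to_quorum(journals, txid, edit):
    """Return True if a strict majority of journals accepted the edit."""
    acks = 0
    for journal in journals:
        try:
            journal.append(txid, edit)
            acks += 1
        except IOError:
            # One failed journal is tolerated; only the ack count matters.
            pass
    return acks > len(journals) // 2
```

With three journals the write survives a single failure but not two, which matches the "double fails not in scope" remark in spirit.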

Facebook - new data infrastructure
- focus on quality, reliability, visibility.
- upping rolling restart to improve monitoring

HBase - stable depends on use case
- pushing out use cases
- ODS, (soon)
- Puma analytics
- ubase - researchy
- site integrity
- hash out cluster (generic kv store, persistent memcache ), multi-tenant
cluster, "photo stuff" (haystack)
- wormhole - backup replication - on hashout cluster, master slave, cross
DC replication.

Replication  - talk to Madu

HDFS hard links - on github.
- at data node layer.
- hari m - HW - hard links also. (claims working prototype)

Kannan -

pubsub,
2ndary index.
native c++ thrift client.
open sourcing folly (c++ stl)

- distributed log splitting task manager
- ordering for bulk master operations, eliminate class of problems.

Online schema changes
- high friction to change
- check column descriptor, then table, then configuration.
- tune new features for column family.
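The lookup order above (column descriptor, then table, then configuration) can be sketched as a most-specific-first search. Plain dicts stand in for the descriptors here, and the function name and setting keys are hypothetical; this is not the HBase API.

```python
def resolve_setting(key, column_family, table, cluster_config, default=None):
    """Return the most specific value for key, or default if no scope sets it."""
    # Search from the narrowest scope to the widest.
    for scope in (column_family, table, cluster_config):
        if key in scope:
            return scope[key]
    return default


# Example: the column family overrides the table, which overrides the cluster.
cluster_config = {"compression": "none", "blocksize": 65536}
table = {"compression": "gz"}
column_family = {"blocksize": 16384}

compression = resolve_setting("compression", column_family, table, cluster_config)
blocksize = resolve_setting("blocksize", column_family, table, cluster_config)
```

With this shape, a new feature can be turned on for a single column family by setting one key on its descriptor, without touching the cluster-wide configuration.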

FB doesn't care about access control.
- auditing - multi tenancy case.
- specific app servers that will access - perms
- FB will do security at a higher level



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
