hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Notes from Nicolas + Amir from Facebook @ Cloudera
Date Tue, 27 Mar 2012 18:54:27 GMT
Can someone explain the notes below?

bq. - hbase deman was with hadoop

bq. - scan kind of halfass, for hive.

bq. - finance sector

Thanks

On Tue, Mar 27, 2012 at 11:49 AM, Jonathan Hsieh <jon@cloudera.com> wrote:

> People:
> Cloudera: Todd, Dave W, Shaneel M, Jonathan H, Himanshu, Greg C, Matteo B
> (remote)
> FB: Nicolas, Amir
>
> druba - ubase/hstore - transaction processing, through hive-hbase
> integration.
>
> hbase team with hdfs team.
> - hbase deman was with hadoop
>
> NY - carve out hunk of HBase to work on.
>
> Long term:
> real time hive, deep integration.
> - beyond just translate to MR job.
> - Use in megastore.
> - scan kind of halfass, for hive.
> - previously point query optimization.
>   - analytics too long to scan table.
> - doing on demand compression.
>
> Edge cases
> - finance sector
> - GPU cases.
>
> Uptime and availability.
> - chaos monkey
> - poll all regions
>
> HBase 0.89 - fast region failover.
> - down time down to..
>
> Take down rack - test cases
>
> putting data node selection in master.
> - on per region basis, hash chain - so assigned secondary and tertiary.
>
> What is Cloudera focus?
>
> HDFS HA story
> - Talking to HW -- bookies in HDFS ("public story, but ...")
> - logs in hdfs.
> - Standby node.
> - zk flag - halfass solution. "double fails" not in scope.
>  - todd: 3 journal daemons, quorum for edits, pluggable journal manager
> interface.
>
> Facebook - new data infrastructure
> - focus on quality, reliability, visibility.
> - upping rolling restart to improve monitoring
>
> HBase - stable depends on use case
> - pushing out use cases
> - ODS, (soon)
> - Puma analytics
> - ubase - researchy
> - site integrity
> - hash out cluster (generic kv store, persistent memcache), multi-tenant
> cluster, "photo stuff" (haystack)
> - wormhole - backup replication - on hashout cluster, master slave, cross
> DC replication.
>
> Replication - talk to Madu
>
> HDFS hard links - on github.
> - at data node layer.
> - hari m - HW - hard links also. (claims working prototype)
>
> Kannan -
>
> pubsub,
> 2ndary index.
> native c++ thrift client.
> open sourcing folly (c++ stl)
>
> - distributed log splitting task manager
> - ordering for bulk master operations, eliminate class of problems.
>
> Online schema changes
> - high friction to change
> - check column descriptor, then table, then configuration.
> - tune new features for column family.
>
> FB doesn't care about access control.
> - auditing - multi tenancy case.
> - specific app servers that will access - perms
> - FB will do security at a higher level
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>
