hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: Notes from Nicolas + Amir from Facebook @ Cloudera
Date Tue, 27 Mar 2012 19:12:50 GMT
Sorry guys, I sent the notes out to the wrong list.

Sent from my iPhone

On Mar 27, 2012, at 11:54, Ted Yu <yuzhihong@gmail.com> wrote:

> Can someone explain the notes below?
> 
> bq. - hbase deman was with hadoop
> 
> bq. - scan kind of halfass, for hive.
> 
> bq. - finance sector
> 
> Thanks
> 
> On Tue, Mar 27, 2012 at 11:49 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> 
>> People:
>> Cloudera: Todd, Dave W, Shaneel M, Jonathan H, Himanshu, Greg C, Matteo B
>> (remote)
>> FB: Nicolas, Amir
>> 
>> Dhruba - ubase/hstore - transaction processing, through hive-hbase
>> integration.
>> 
>> hbase team with hdfs team.
>> - hbase deman was with hadoop
>> 
>> NY - carve out hunk of HBase to work on.
>> 
>> Long term:
>> real time hive, deep integration.
>> - beyond just translate to MR job.
>> - Use in megastore.
>> - scan kind of halfass, for hive.
>> - previously point query optimization.
>> - analytics too long to scan table.
>> - doing on demand compression.
>> 
>> Edgecases
>> - finance sector
>> - gpu cases.
>> 
>> Uptime and availability.
>> - chaos monkey
>> - poll all regions
>> 
>> HBase 0.89 - fast region failover.
>> - down time down to..
>> 
>> Take down rack - test cases
>> 
>> putting data node selection in master.
>> - on per region basis, hash chain - so assigned secondary and tertiary.
>> 
>> What is Cloudera focus?
>> 
>> HDFS HA story
>> - Talking to HW -- bookies in HDFS ("public story, but ...")
>> - logs in hdfs.
>> - Standby node.
>> - zk flag - halfass solution. "double fails" not in scope.
>> - todd: 3 journal daemons, quorum for edits, pluggable journal manager
>> interface.
>> 
>> Facebook - new data infrastructure
>> - focus on quality, reliability, visibility.
>> - upping rolling restart to improve monitoring
>> 
>> HBase - stable depends on use case
>> - pushing out use cases
>> - ODS, (soon)
>> - Puma analytics
>> - ubase - researchy
>> - site integrity
>> - hash out cluster (generic kv store, persistent memcache ), multi-tenant
>> cluster, "photo stuff" (haystack)
>> - wormhole - backup replication - on hashout cluster, master slave, cross
>> DC replication.
>> 
>> Replication  - talk to Madu
>> 
>> HDFS hard links - on github.
>> - at data node layer.
>> - hari m - HW - hard links also. (claims working prototype)
>> 
>> Kannan -
>> 
>> pubsub,
>> 2ndary index.
>> native c++ thrift client.
>> open sourcing folly (c++ stl)
>> 
>> - distributed log splitting task manager
>> - ordering for bulk master operations, eliminate class of problems.
>> 
>> Online schema changes
>> - high friction to change
>> - check column descriptor, then table, then configuration.
>> - tune new features for column family.
>> 
>> FB doesn't care about access control.
>> - auditing - multi tenancy case.
>> - specific app servers that will access - perms
>> - FB will do security at a higher level
>> 
>> 
>> 
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // jon@cloudera.com
>> 
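
Editorial sketch for the "3 journal daemons, quorum for edits" item in the notes above: one way to read it is that an edit counts as committed once a strict majority of the journal daemons has acknowledged it. The Python below is only an illustration under that assumption. JournalDaemon and write_edit are hypothetical names made up for this sketch, not HDFS classes or APIs, and the notes do not describe the actual protocol.

```python
from dataclasses import dataclass, field


@dataclass
class JournalDaemon:
    """Hypothetical stand-in for one journal daemon (not an HDFS class)."""
    name: str
    available: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, edit):
        # A daemon that is down cannot acknowledge the write.
        if not self.available:
            return False
        self.edits.append((txid, edit))
        return True


def write_edit(journals, txid, edit):
    """Return True if a strict majority of journals acknowledged the edit."""
    acks = sum(1 for journal in journals if journal.append(txid, edit))
    return acks > len(journals) // 2


# Three daemons, as in the notes; a majority is therefore two.
journals = [JournalDaemon("jn1"), JournalDaemon("jn2"), JournalDaemon("jn3")]

# One daemon down: 2 of 3 still acknowledge, so the edit is committed.
journals[2].available = False
committed = write_edit(journals, 1, "mkdir /a")

# Two daemons down: only 1 of 3 acknowledges, so the edit is not committed.
journals[1].available = False
committed_after_double_failure = write_edit(journals, 2, "mkdir /b")
```

With three daemons this scheme tolerates a single failure. A second simultaneous failure blocks the write rather than risking divergent logs.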

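Editorial sketch for the "check column descriptor, then table, then configuration" item under online schema changes: the lookup order can be read as a most-specific-first fallback chain. The Python below illustrates that reading only. resolve_setting and the setting names ("compression", "blocksize", "ttl") are hypothetical and are not taken from HBase.

```python
def resolve_setting(key, column_family, table, site_config, default=None):
    """Look a setting up in the most specific scope that defines it.

    Order follows the notes: column descriptor first, then table,
    then the cluster-wide configuration.
    """
    for scope in (column_family, table, site_config):
        if key in scope:
            return scope[key]
    return default


# Hypothetical settings at each level.
site_config = {"compression": "NONE", "blocksize": 65536}
table = {"compression": "GZ"}
column_family = {"blocksize": 16384}

# Defined at table and site level only, so the table value is used.
compression = resolve_setting("compression", column_family, table, site_config)

# Defined at column-family and site level, so the column-family value is used.
blocksize = resolve_setting("blocksize", column_family, table, site_config)

# Defined nowhere, so the default is returned.
ttl = resolve_setting("ttl", column_family, table, site_config)
```

Under this reading, a new feature can be tuned for a single column family without touching the table or cluster defaults, which matches the "tune new features for column family" item.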