mahout-dev mailing list archives

From Cliff Click <ccli...@gmail.com>
Subject Re: Straw poll re: H2O ?
Date Fri, 02 May 2014 21:57:16 GMT
"detailed description of h20's programming and execution model."

No *formal* documentation for this exists; there has been no time to 
write such a thing.
There are easy-to-find SlideShare & video talks.  Here are two:
  - http://www.infoq.com/presentations/api-memory-analytics
  - http://www.infoq.com/interviews/click-0xdata

Summary:
- A high-performance in-memory K/V store (cache hits are 150 
nanoseconds, misses depend on network transfer times).  Supports full, 
exact Java Memory Model (JMM) semantics & transactions.  Used to hold 
the Big Data & to control computations.
- Big Data support via Frames/Vecs/Chunks - see the above slides for a 
graphical overview; compression "is an implementation feature" and is 
not visible in the execution model except as speed or size constraints.
- A well-tuned data-ingestion system
- Map/Reduce coding style; uses Java 1.7's Fork/Join within a single 
node, and is distributed across nodes.  Maps are fine-grained F/J tasks 
and can produce both a Big output (distributed parallel writing to 
Frames/Vecs) and a Small output (anything in a POJO).  Reductions are 
also fine-grained, and happen any time 2 maps are done... so there is 
no separate "reduction" phase.  Not the Hadoop M/R: no sort or shuffle 
steps, everything in DRAM.
- REST/JSON access to most algorithms & coding.  Web browser/HTML UI 
over that.
- Internal DSL - a work in progress.  Right now it converts a subset of 
the R language to ASTs, then executes the ASTs.  Covers a fairly large 
subset of the bulk/array operators in R, and expressions built from 
them.  Includes first-class functions and e.g. GroupBy (ddply in R 
lingo).  Expressions like "apply(someFrame,2,function(x){ 
ifelse(is.na(x),mean(x),x)})" will replace NAs in "someFrame" with the 
mean of the column.  It's R syntax (or very close to R), not Scala.
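
A minimal single-node sketch of the fine-grained map/reduce style 
described above, written against plain java.util.concurrent Fork/Join.  
This is not H2O's API: the names ChunkMapReduce, Stats, MeanTask, 
columnMean and fillMissing are hypothetical, a "column" is assumed to be 
an array of double[] chunks, and NaN is assumed to stand in for NA.  Each 
leaf task maps one chunk to a small POJO, and two partial results are 
reduced as soon as both are available, so there is no separate reduce 
phase.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ChunkMapReduce {

    /** The "Small output": a POJO holding a partial sum and count. */
    public static final class Stats {
        double sum;
        long count;

        /** Fold another partial result into this one. */
        Stats reduce(Stats other) {
            sum += other.sum;
            count += other.count;
            return this;
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** One fine-grained F/J task per chunk range [lo, hi). */
    static final class MeanTask extends RecursiveTask<Stats> {
        private final double[][] chunks;
        private final int lo, hi;

        MeanTask(double[][] chunks, int lo, int hi) {
            this.chunks = chunks;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Stats compute() {
            if (hi - lo == 1) {
                return map(chunks[lo]);       // leaf: map a single chunk
            }
            int mid = (lo + hi) >>> 1;
            MeanTask left = new MeanTask(chunks, lo, mid);
            MeanTask right = new MeanTask(chunks, mid, hi);
            left.fork();                      // run left half asynchronously
            Stats r = right.compute();        // run right half in this thread
            return left.join().reduce(r);     // reduce once both halves are done
        }

        /** Map step: sum and count the non-missing values of one chunk. */
        private static Stats map(double[] chunk) {
            Stats s = new Stats();
            for (double v : chunk) {
                if (!Double.isNaN(v)) {
                    s.sum += v;
                    s.count++;
                }
            }
            return s;
        }
    }

    /** Mean of all non-NaN values in the column; NaN if there are none. */
    public static double columnMean(double[][] chunks) {
        if (chunks.length == 0) {
            return Double.NaN;
        }
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new MeanTask(chunks, 0, chunks.length)).mean();
        } finally {
            pool.shutdown();
        }
    }

    /** Replace every NaN in the column with the given value, in place. */
    public static void fillMissing(double[][] chunks, double value) {
        for (double[] chunk : chunks) {
            for (int i = 0; i < chunk.length; i++) {
                if (Double.isNaN(chunk[i])) {
                    chunk[i] = value;
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] column = { {1.0, Double.NaN}, {3.0}, {Double.NaN, 5.0} };
        double mean = columnMean(column);
        fillMissing(column, mean);
        System.out.println("mean = " + mean);
    }
}
```

The sketch computes the mean over the non-missing values only, which is 
an assumption about what the R expression above is meant to do.  The 
fill step is a plain sequential loop here; in the model described above 
it would be the "Big output" written in parallel by the map tasks.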

Cliff



On 5/1/2014 10:13 AM, Dmitriy Lyubimov wrote:
>
>> I'd be happy to see a concept of how to bring the operations of the DSL
>> onto h20, as well as a detailed description of h20's programming and
>> execution model.
> +1.
>
>>
>> --sebastian
>>

