incubator-crunch-dev mailing list archives

From Victor Iacoban <victor.iaco...@gmail.com>
Subject crackle status
Date Sun, 02 Dec 2012 13:33:56 GMT
Hi,

I've changed the way crackle users define remote functions; this allowed me
to go for a nicer, more fluent DSL.
I still haven't gotten rid of the influence of Crunch types on the crackle
DSL, but at least now it's harder to get wrong and easier to debug.

I think this is pretty close to what I'm trying to achieve. The next steps
will be documentation, more validation, error-condition checks, tests and
crackle hbase.

Here it is; any comments are welcome:

(ns crackle.example
  (:require [clojure.string]
            [crackle.from :as from]
            [crackle.to :as to])
  (:use crackle.core))
;====== word count example ===============
(fn-mapcat split-words [line] :strings
  (clojure.string/split line #"\s+"))
(defn count-words [input-path output-path]
  (pipeline (from/text-file input-path)
    (split-words)
    (count-values)
    (to/text-file output-path)))
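For readers new to the crackle operators, here is a minimal local sketch of what the word count pipeline computes, written in plain Clojure over an in-memory sequence of lines instead of a Crunch PCollection. It assumes split-words splits each line on whitespace and count-values yields one count per distinct word; `count-words-local` is a hypothetical name, and nothing here uses the crackle API.

```clojure
(ns crackle.local-word-count-sketch
  (:require [clojure.string :as str]))

;; Hypothetical in-memory analogue of count-words (not part of crackle):
;; mapcat plays the role of split-words, frequencies the role of count-values.
(defn count-words-local [lines]
  (->> lines
       (mapcat #(str/split % #"\s+"))
       frequencies))

(count-words-local ["the quick fox" "the lazy dog"])
;; returns {"the" 2, "quick" 1, "fox" 1, "lazy" 1, "dog" 1}
```

As with the crackle version, a line with leading whitespace would produce an empty-string token, since `clojure.string/split` keeps the leading empty match.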
;====== average bytes by ip example ======
(fn-map parse-line [line] [:strings :clojure]
  (let [parts (clojure.string/split line #"\s+")]
    (pair-of (first parts) [(read-string (second parts)) 1])))
(fn-combine sum-bytes-and-counts [value1 value2]
  [(+ (first value1) (first value2)) (+ (second value1) (second value2))])
(fn-mapv compute-average [value] :ints
  (int (apply / value)))
(defn count-bytes-by-ip [input-path output-path]
  (pipeline (from/text-file input-path)
    (parse-line)
    (group-by-key)
    (sum-bytes-and-counts)
    (compute-average)
    (to/text-file output-path)))
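The average-bytes pipeline can be mirrored locally in the same way. This sketch assumes each input line is "ip bytes" and that group-by-key followed by sum-bytes-and-counts folds all [bytes count] pairs for one IP into a single pair. All function names are hypothetical, and `Long/parseLong` stands in for `read-string` on the assumption that the byte field is always an integer (`read-string` would also accept arbitrary Clojure forms from the input).

```clojure
(ns crackle.local-average-sketch
  (:require [clojure.string :as str]))

;; Hypothetical analogue of parse-line: "ip bytes" -> [ip [bytes 1]].
(defn parse-line-local [line]
  (let [[ip bytes] (str/split line #"\s+")]
    [ip [(Long/parseLong bytes) 1]]))

;; Analogue of sum-bytes-and-counts: element-wise sum of [bytes count] pairs.
;; It is associative and commutative, so partial sums can be merged in any
;; order, which is what a MapReduce-style combiner generally requires.
(defn sum-pairs [[b1 c1] [b2 c2]]
  [(+ b1 b2) (+ c1 c2)])

;; Analogue of compute-average: truncated integer average.
(defn average [[bytes cnt]]
  (int (/ bytes cnt)))

(defn count-bytes-by-ip-local [lines]
  (->> lines
       (map parse-line-local)
       (group-by first)                      ; stands in for group-by-key
       (map (fn [[ip pairs]]
              [ip (average (reduce sum-pairs (map second pairs)))]))
       (into {})))

(count-bytes-by-ip-local ["10.0.0.1 100" "10.0.0.2 50" "10.0.0.1 300"])
;; returns {"10.0.0.1" 200, "10.0.0.2" 50}
```

Carrying the count alongside the byte total is what makes the average computable after combining: averaging partial averages directly would weight groups incorrectly.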

-- victor
