The "Hive/StatisticsAndDataMining" page has been changed by MayankLahiri.
The comment on this change is: finished datamining and statistics wiki page.
http://wiki.apache.org/hadoop/Hive/StatisticsAndDataMining?action=diff&rev1=1&rev2=2

= Statistics and Data Mining in Hive =
- This page is a central repository for the slightly more advanced statistical and data mining
functions that are being integrated into Hive, and especially the functions that warrant more
than one-line descriptions.
+ This page is the secondary documentation for the slightly more advanced statistical and
data mining functions that are being integrated into Hive, and especially the functions that
warrant more than one-line descriptions.
<<TableOfContents(3)>>
@@ 74, +74 @@
== histogram_numeric(): Estimating frequency distributions ==
+ Histograms represent frequency distributions from empirical data. The kind referred
to here is a histogram with variable-sized bins. Specifically, this UDAF returns a list
of (x,y) pairs that represent histogram bin centers and heights. It is up to you to then plot
them in Excel / Gnuplot / Matlab / Mathematica to get a graphical display.
+
+ === Use Cases ===
+
+ 1. Estimating the frequency distribution of a column, possibly grouped by other attributes.
+ 2. Choosing discretization points in a continuous-valued column.
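
For the second use case, one simple heuristic (an assumption made here for illustration, not something this page prescribes) is to place a cut point halfway between each pair of adjacent bin centers. The Python sketch below uses two hypothetical helpers, `cut_points` and `assign_bin`, and takes a plain list of bin centers as input.

```python
import bisect


def cut_points(centers):
    """Return the midpoints between adjacent bin centers, in ascending order."""
    ordered = sorted(centers)
    return [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]


def assign_bin(value, cuts):
    """Return the index of the interval that `value` falls into.

    Index 0 is everything below the first cut point; a value equal to a
    cut point is placed in the interval to its right.
    """
    return bisect.bisect_right(cuts, value)
```

With n bin centers this yields n - 1 cut points and therefore n discrete intervals, one per histogram bin.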
+
+ === Usage ===
+
+ {{{
+ SELECT histogram_numeric(age, 10) FROM users GROUP BY gender;
+ }}}
+
+ The command above is self-explanatory. Converting the output into a graphical display is
a bit more involved. The following [[http://www.gnuplot.info/|Gnuplot]] command should do
it, assuming that you've parsed the output from `histogram_numeric()` into a text file of (x,y) pairs
called `data.txt`.
+
+ {{{
+ plot 'data.txt' u 1:2 w impulses lw 5
+ }}}
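
One way to produce `data.txt` is sketched below in Python. It assumes the query output was saved with one JSON-style record per line, in the form shown in the Example section of this page. The names `parse_histogram_rows` and `pairs_to_text` are hypothetical helpers for illustration and are not part of Hive.

```python
import json


def parse_histogram_rows(lines):
    """Parse lines such as {"x":1.5,"y":20.0} into (x, y) tuples sorted by x."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a record
        record = json.loads(line)
        pairs.append((float(record["x"]), float(record["y"])))
    return sorted(pairs)


def pairs_to_text(pairs):
    """Format (x, y) tuples as whitespace-separated columns, one pair per line."""
    return "".join(f"{x} {y}\n" for x, y in pairs)
```

Writing the string returned by `pairs_to_text` to `data.txt` gives a two-column file that the `plot` command above can read with `u 1:2`.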
+
+ === Example ===
+
+ {{{
+ SELECT explode(histogram_numeric(val, 10)) AS x FROM normal;
+ {"x":-3.6505464999999995,"y":20.0}
+ {"x":-2.7514727901960785,"y":510.0}
+ {"x":-1.7956678951954481,"y":8263.0}
+ {"x":-0.9878507685761995,"y":19167.0}
+ {"x":-0.2625338380837097,"y":31737.0}
+ {"x":0.5057392319427763,"y":31502.0}
+ {"x":1.2774146480311135,"y":14526.0}
+ {"x":2.083955560712489,"y":3986.0}
+ {"x":2.9209550254545484,"y":275.0}
+ {"x":3.674835214285715,"y":14.0}
+ }}}
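
The y values in the output above appear to be raw counts per bin. When comparing histograms from groups of different sizes, it can help to rescale the heights so they sum to 1. The Python sketch below shows one way to do that; `relative_frequencies` is a hypothetical helper, and its input is assumed to be a list of (x, y) pairs already parsed from the query output.

```python
def relative_frequencies(pairs):
    """Rescale bin heights so that they sum to 1, keeping bin centers unchanged."""
    total = sum(y for _, y in pairs)
    if total == 0:
        raise ValueError("histogram has no observations")
    return [(x, y / total) for x, y in pairs]
```

The rescaled pairs can be written out and plotted in the same way as the raw counts.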
+
