incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "DataSketchesProposal" by Lee Rhodes
Date Wed, 06 Mar 2019 19:09:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "DataSketchesProposal" page has been changed by Lee Rhodes:
https://wiki.apache.org/incubator/DataSketchesProposal?action=diff&rev1=17&rev2=18

Comment:
Trying to get footnote to work

  ## page was renamed from DataSketchesPorposal
- = Apache DataSketches Proposal = <<FootNote(In 2017 Verizon acquired Yahoo and merged
it with previously acquired AOL. The merged entity was originally called Oath, Inc., but has
recently been renamed Verizon Media, Inc., a wholly-owned subsidiary of Verizon, Inc.  Since
Yahoo is the more recognized name, references in this document to Yahoo, are also a reference
to Verizon Media, Inc.)>>
+ = Apache DataSketches Proposal =
  
  == Abstract ==
  DataSketches is an open source, high-performance library of stochastic streaming algorithms
commonly called "sketches" in the data sciences. Sketches are small, stateful programs that
process massive data as a stream and can provide approximate answers, with mathematical guarantees,
to computationally difficult queries orders-of-magnitude faster than traditional, exact methods.
@@ -21, +21 @@

  The DataSketches library also includes several experimental repositories for use-cases outside
the large-scale systems environments, such as sketches for mobile, IoT devices (Android),
command-line access of the sketch library, and an experimental repository for vector-based
sketches that performs approximate Singular Value Decomposition (SVD) analysis that could
potentially be used in Machine Learning (ML) applications. 
  
  == Background ==
- The DataSketches library was started in 2012 as internal Yahoo project to dramatically reduce
time and resources required for distinct (unique) counting.  An extensive search on the Internet
at the time yielded a number of theoretical papers on stochastic streaming algorithms with
pseudocode examples, but we did not find any usable open-source code of the quality we felt
we needed for our internal production systems.  So we started a small project (one person)
to develop our own sketches working directly from published theoretical papers. 
+ The DataSketches library was started in 2012 as internal Yahoo<<FootNote(In 2017 Verizon
acquired Yahoo and merged it with previously acquired AOL. The merged entity was originally
called Oath, Inc., but has recently been renamed Verizon Media, Inc., a wholly-owned subsidiary
of Verizon, Inc.  Since Yahoo is the more recognized name, references in this document to
Yahoo, are also a reference to Verizon Media, Inc.)>> project to dramatically reduce
time and resources required for distinct (unique) counting.  An extensive search on the Internet
at the time yielded a number of theoretical papers on stochastic streaming algorithms with
pseudocode examples, but we did not find any usable open-source code of the quality we felt
we needed for our internal production systems.  So we started a small project (one person)
to develop our own sketches working directly from published theoretical papers. 
  
  The DataSketches library was designed from the start with the objective of making these
algorithms, usually only described in theoretical papers, easily accessible to systems developers
for use in our internal production systems. By necessity, the code had to be of the highest
quality and thoroughly tested. The wide variety of our internal production systems drove the
requirement that the sketch implementations had to have an absolute minimum of external, run-time
dependencies in order to simplify integration and troubleshooting.
  


