bigtop-dev mailing list archives

From RJ Nowling <rnowl...@gmail.com>
Subject Proposal for "BigTop Data Generators"
Date Sun, 23 Aug 2015 16:53:58 GMT
Hi BigTop,

I had a discussion with Jay yesterday, and we'd like to propose a new
component for BigTop: BigTop Data Generators.

BigTop Data Generators would consist of a common set of libraries for
building data generators and three example data generators:

    * BigPetStore transaction generator (moved from BigPetStore)
    * BigTop Bazaar -- attendee movement and interactions with booths on a
showroom floor, at a conference, or at a mall
    * BigTop Weatherman -- stochastic weather simulation (temperature, wind
speed, wind chill, rainfall, etc.) per zip code.  (From a model trained on
NOAA historical weather data)

We believe that creating a common set of libraries will have several
benefits including:

     * Make it easier for others to build their own data generators
     * Make data generators smaller and easier to maintain
     * Share improvements across the data generators

More details on the libraries are below.

BigPetStore will continue to focus on building and maintaining
blueprints, powered by the BigTop Data Generators.

Our vision is for projects across Apache to come to BigTop for tools for
building better, more comprehensive blueprints.  We want to support these
efforts through data generators and the initial set of blueprints we've been
building.

If the community is generally in support of this, I can create a top-level
"bigtop-data-generators" directory and put the data generators and
libraries in there.

Thanks!

RJ


-------
Library details:

So far, I've extracted the following common libraries:

     * Samplers -- provides classes for PDFs and various samplers
     * Name generator -- data set and samplers for generating names
     * Location data set -- data set and classes for US zip codes, their
GPS coordinates, median household incomes, and population sizes
     * Product generator -- library for enumerating products from a
specification file.  Comes with default specifications for BigPetStore
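
To give a feel for what the Samplers library covers, here is a minimal
sketch in Python of a discrete PDF and a sampler that draws from it.  The
class names (DiscretePDF, RouletteWheelSampler) and the example weights are
hypothetical and chosen only for illustration; they are not the library's
actual API.

```python
import random
from bisect import bisect_right
from itertools import accumulate


class DiscretePDF:
    """Hypothetical PDF class: maps each outcome to a probability.

    Weights are normalized on construction so they sum to 1.
    """

    def __init__(self, weights):
        total = sum(weights.values())
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        self.probabilities = {k: w / total for k, w in weights.items()}

    def probability(self, outcome):
        # Outcomes that were never registered have probability zero.
        return self.probabilities.get(outcome, 0.0)


class RouletteWheelSampler:
    """Hypothetical sampler: draws outcomes by inverting the cumulative sum."""

    def __init__(self, pdf, rng):
        self.outcomes = list(pdf.probabilities)
        self.cumulative = list(
            accumulate(pdf.probabilities[o] for o in self.outcomes)
        )
        self.rng = rng

    def sample(self):
        r = self.rng.random()  # uniform in [0, 1)
        # First index whose cumulative probability is strictly above r.
        idx = bisect_right(self.cumulative, r)
        # Guard against floating-point round-off in the final cumulative sum.
        return self.outcomes[min(idx, len(self.outcomes) - 1)]
```

A generator would typically build a PDF once, wrap it in a sampler with a
seeded random.Random, and call sample() for each record, so that runs are
reproducible for a given seed.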

I also expect that I'll add libraries for:

      * Particle simulation -- customer movement in a room
      * Latent factor model generation -- generate latent factors and
customer weights to create something like MovieLens data.  Used in Bazaar
for booth preferences and potentially in BigPetStore for customer item
preferences
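
The latent factor idea can be sketched as follows, again in Python.  This
is only an assumption about how such a library might look: the function
names, the Gaussian draws, and the sigmoid mapping onto a 1-5 rating scale
are all illustrative choices, not a description of the planned code.

```python
import math
import random


def generate_vectors(count, n_factors, rng):
    """Draw `count` vectors of length `n_factors` from a standard Gaussian.

    Used both for item (or booth) latent factors and for customer weights.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(n_factors)]
            for _ in range(count)]


def affinity(weights, factors):
    """Dot product of one customer's weights with one item's factors."""
    return sum(w * f for w, f in zip(weights, factors))


def rating(weights, factors, scale=5):
    """Squash the affinity into an integer rating in 1..scale.

    The sigmoid is written via tanh so large magnitudes cannot overflow.
    """
    s = 0.5 * (1.0 + math.tanh(affinity(weights, factors) / 2.0))
    return 1 + round((scale - 1) * s)


def ratings_matrix(customer_weights, item_factors, scale=5):
    """One row per customer, one column per item (MovieLens-like layout)."""
    return [[rating(w, f, scale) for f in item_factors]
            for w in customer_weights]
```

Customers whose weights point in the same direction as an item's factors
get high ratings for it, which is what would drive booth preferences in
Bazaar or item preferences in BigPetStore.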

Most of these libraries came out of the BigPetStore data generator, but the
other generators have also been refactored to build on this standard set of
libraries.
