bigtop-dev mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: Proposal for "BigTop Data Generators"
Date Wed, 26 Aug 2015 01:41:11 GMT
It is pretty cool indeed!

I wonder how it needs to be structured so that it:
 - is easy to access/use from other components wherever it is needed
 - doesn't interfere with the rest of the stack

I guess one possible way would be to implement the generator as a set of Maven
artifacts that could be installed/consumed transparently by just declaring a
dependency, e.g., as proposed via a top-level component.

Another way is to have a new package like we do for bigtop-utils and such.

Perhaps this discussion should be moved to JIRA, or shall we continue on
dev@?

Cos

On Sun, Aug 23, 2015 at 11:53AM, RJ Nowling wrote:
> Hi BigTop,
> 
> I had a discussion with Jay yesterday, and we'd like to propose a new
> component for BigTop: BigTop Data Generators.
> 
> BigTop Data Generators would consist of a common set of libraries for
> building data generators and three example data generators:
> 
>     * BigPetStore transaction generator (moved from BigPetStore)
>     * BigTop Bazaar -- attendee movement and interactions with booths on a
> showroom floor, at a conference, or at a mall
>     * BigTop Weatherman -- stochastic weather simulation (temperature, wind
> speed, wind chill, rainfall, etc.) per zip code.  (From a model trained on
> NOAA historical weather data)
> 
> We believe that creating a common set of libraries will have several
> benefits including:
> 
>      * Easier for others to build their own data generators
>      * Make data generators smaller and easier to maintain
>      * Share improvements across the data generators
> 
> More details on the libraries are below.
> 
> BigPetStore will continue to focus on building and maintaining
> blueprints, powered by the BigTop Data Generators.
> 
> Our vision is that we get all of Apache coming to BigTop for tools for
> building better, more comprehensive blueprints.  We want to support these
> efforts through data generators and the initial set of blueprints we've been
> building.
> 
> If the community is generally in support of this, I can create a top-level
> "bigtop-data-generators" directory and put the data generators and
> libraries in there.
> 
> Thanks!
> 
> RJ
> 
> 
> -------
> Library details:
> 
> So far, I've extracted the following common libraries:
> 
>      * Samplers -- provides classes for PDFs and various samplers
>      * Name generator -- data set and samplers for generating names
>      * Location data set -- data set and classes for US zip codes, their
> GPS coordinates, median household incomes, and population sizes
>      * Product generator -- library for enumerating products from a
> specification file.  Comes with default specifications for BigPetStore
> 
> I also expect that I'll add libraries for:
> 
>       * Particle simulation -- customer movement in a room
>       * Latent factor model generation -- generate latent factors and
> customer weights to create something like MovieLens data.  Used in Bazaar
> for booth preferences and potentially in BigPetStore for customer item
> preferences
> 
> Most of these libraries came out of the BigPetStore data generator but the
> other generators have been refactored to be based off the standard set of
> libraries.
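
To make the "Samplers" library idea above a bit more concrete, here is a
minimal sketch of what a shared sampler abstraction might look like. It is
not taken from the proposal or from the BigPetStore code: the names
SamplerSketch, Sampler, and RouletteWheelSampler, and the example weights,
are all assumptions made for illustration. It shows one common technique,
drawing items in proportion to discrete weights, behind a small interface
that several generators could reuse.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class SamplerSketch {

    /** Hypothetical interface: anything that can draw a value of type T. */
    interface Sampler<T> {
        T sample();
    }

    /** Hypothetical sampler that draws keys in proportion to their weights. */
    static final class RouletteWheelSampler<T> implements Sampler<T> {
        private final Map<T, Double> weights;
        private final double total;
        private final Random random;

        RouletteWheelSampler(Map<T, Double> weights, Random random) {
            if (weights.isEmpty()) {
                throw new IllegalArgumentException("weights must not be empty");
            }
            double sum = 0.0;
            for (double w : weights.values()) {
                if (w < 0.0) {
                    throw new IllegalArgumentException("weights must be non-negative");
                }
                sum += w;
            }
            if (sum <= 0.0) {
                throw new IllegalArgumentException("at least one weight must be positive");
            }
            // Copy into a LinkedHashMap so iteration order is stable.
            this.weights = new LinkedHashMap<>(weights);
            this.total = sum;
            this.random = random;
        }

        @Override
        public T sample() {
            // Pick a point in [0, total) and walk the cumulative weights.
            double target = random.nextDouble() * total;
            double cumulative = 0.0;
            T last = null;
            for (Map.Entry<T, Double> entry : weights.entrySet()) {
                cumulative += entry.getValue();
                last = entry.getKey();
                if (target < cumulative) {
                    return last;
                }
            }
            return last; // only reached through floating-point round-off
        }
    }

    public static void main(String[] args) {
        // Made-up weights, purely for illustration.
        Map<String, Double> petWeights = new LinkedHashMap<>();
        petWeights.put("dog", 0.6);
        petWeights.put("cat", 0.3);
        petWeights.put("fish", 0.1);

        Sampler<String> sampler = new RouletteWheelSampler<>(petWeights, new Random(42L));
        for (int i = 0; i < 5; i++) {
            System.out.println(sampler.sample());
        }
    }
}
```

A generator built this way would depend only on the Sampler interface, so
individual samplers could be improved or swapped without touching the
generators that use them.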
