incubator-drill-user mailing list archives

From peter he <rmj...@gmail.com>
Subject Re: Drill synthetic log generator
Date Sat, 13 Jul 2013 00:12:28 GMT
Got it, I appreciate the information!


On Fri, Jul 12, 2013 at 5:04 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Peter,
>
> There is no command line parameter.
>
> In LogGenerator, this line controls how users are invented:
>
>   private LongTail<User> userGenerator = new LongTail<User>(50000, 0) {
>         @Override
>         protected User createThing() {
>             return new User(ipGenerator.sample(), geo, terms);
>         }
>     };
>
> The two parameters here (50000 and 0) control how the number of users
> grows.  The first number is called alpha and the second is the discount.
> When discount == 0 as in this code, the users are generated using a
> Dirichlet process and the number of unique users grows at roughly
> alpha * log(n), where n is the number of samples drawn.  If discount > 0,
> then the fraction of users with a single transaction is asymptotically
> equal to the discount, and the user population grows roughly as
> alpha * n^discount.
>
> There will be a real problem if the number of users increases, however,
> because each user requires a lot of memory.  This happens because the
> language model for each user is cloned from a common base instead of
> sharing this common base.  I have been looking into using a better kind of
> hash table to allow sharing of mutable tables (using an HAMT, actually),
> but this definitely isn't ready.  Once (if ever) it is ready, we should see
> at least one and possibly 3 orders of magnitude decrease in the memory cost
> of each user after the first few.
>
> This all means that the simplest and safest thing to do is increase the
> value of alpha from 50,000 and watch your memory usage.
>
>
> On Fri, Jul 12, 2013 at 4:42 PM, peter he <rmjlxj@gmail.com> wrote:
>
> > ...
> >
> > One quick followup question: is there any way to change the number of users
> > generated using a parameter?
> >
> >
>
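
For anyone who wants a feel for the growth rates Ted describes before editing alpha, the following is a minimal stand-alone sketch. It is not the LongTail class from the log generator; the class name UserGrowthSketch and the method uniqueUsers are hypothetical. It assumes the standard two-parameter (Pitman-Yor) Chinese restaurant process, in which the draw made after i earlier samples introduces a new user with probability (alpha + discount * k) / (alpha + i), where k is the number of distinct users seen so far. It only counts distinct users and allocates no per-user state, so it says nothing about the memory cost discussed above.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; NOT the LongTail class from the log generator.
class UserGrowthSketch {

    /**
     * Counts the distinct users seen after n draws from a two-parameter
     * (Pitman-Yor) Chinese restaurant process. Assumes alpha > 0 and
     * 0 <= discount < 1. With discount == 0 this is a Dirichlet process.
     */
    static long uniqueUsers(double alpha, double discount, long n, long seed) {
        Random rand = new Random(seed);
        long unique = 0;
        for (long i = 0; i < n; i++) {
            // Probability that this draw creates a user not seen before.
            double pNew = (alpha + discount * unique) / (alpha + i);
            if (rand.nextDouble() < pNew) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        // Illustrative values only; 50000 and 0 are the ones in the quoted code.
        for (double alpha : new double[] {50_000, 200_000}) {
            for (double discount : new double[] {0.0, 0.5}) {
                long users = uniqueUsers(alpha, discount, n, 42L);
                System.out.printf("alpha=%.0f discount=%.1f n=%d unique=%d%n",
                        alpha, discount, n, users);
            }
        }
    }
}
```

Running main prints the distinct-user count for each combination, which gives a rough idea of how many User objects a run of that length would have to hold in memory if the first constructor argument in the quoted line were raised.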
