For #2, I think MemstoreSizeCostFunction belongs to the same category if we
are to adopt a moving average.
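For #2, a rate can be derived from two readings of the cumulative request counter and then smoothed with an exponentially weighted moving average (EWMA). The EWMA needs only the previous average per region, not a history of samples. The class, method names and the alpha parameter below are hypothetical illustrations, not existing HBase APIs. Memstore size is already a point-in-time value rather than a cumulative counter, so for MemstoreSizeCostFunction the rate step could be skipped and the sampled sizes fed straight into ewma().

```java
// Hypothetical helpers (not HBase APIs): turn a cumulative counter into a rate,
// then smooth it with an exponentially weighted moving average.
public final class RateSmoothing {
  private RateSmoothing() {}

  // Requests per second between two readings of a cumulative request counter.
  // Returns NaN when time did not advance or the counter went backwards
  // (e.g. after a counter reset), so callers can skip that reading.
  public static double ratePerSecond(long prevCount, long prevTimeMs, long count, long timeMs) {
    if (timeMs <= prevTimeMs || count < prevCount) {
      return Double.NaN;
    }
    return (count - prevCount) * 1000.0 / (timeMs - prevTimeMs);
  }

  // One EWMA step; alpha in (0, 1] is the weight given to the newest sample.
  public static double ewma(double previous, double sample, double alpha) {
    if (Double.isNaN(sample)) {
      return previous; // skip invalid readings
    }
    if (Double.isNaN(previous)) {
      return sample; // the first valid sample seeds the average
    }
    return alpha * sample + (1 - alpha) * previous;
  }

  public static void main(String[] args) {
    double avg = Double.NaN;
    avg = ewma(avg, ratePerSecond(0, 0, 1_000, 10_000), 0.5);          // 100 req/s
    avg = ewma(avg, ratePerSecond(1_000, 10_000, 3_000, 20_000), 0.5); // 200 req/s sample
    System.out.println(avg); // prints 150.0
  }
}
```

With alpha = 0.5 the update is the same halving described in question #1: the newest sample always gets weight 0.5. A smaller alpha would react more slowly to bursts.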
Some factors to consider:
The data structure used by StochasticLoadBalancer should be concise. The
number of regions in a cluster can be expected to approach 1 million, so we
cannot afford to store a long history of read/write requests in the master.
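To make the memory concern concrete, here is a sketch of a bounded history kept in flat primitive arrays instead of a queue of objects per region. The class name, the window parameter and the layout are hypothetical, chosen only to show how the footprint scales with regions times window size.

```java
// Hypothetical layout: one flat double[] holds a fixed window of samples for every
// region, instead of a queue of load objects per region.
public class FlatSampleWindow {
  private final int window;       // samples retained per region (illustrative parameter)
  private final double[] samples; // region r uses slots [r * window, (r + 1) * window)
  private final int[] next;       // next slot to overwrite, per region
  private final int[] filled;     // number of valid slots, per region

  public FlatSampleWindow(int numRegions, int window) {
    if (window <= 0) {
      throw new IllegalArgumentException("window must be positive");
    }
    this.window = window;
    this.samples = new double[Math.multiplyExact(numRegions, window)];
    this.next = new int[numRegions];
    this.filled = new int[numRegions];
  }

  // O(1) per sample: once the window is full, the oldest slot is overwritten.
  public void add(int region, double value) {
    samples[region * window + next[region]] = value;
    next[region] = (next[region] + 1) % window;
    if (filled[region] < window) {
      filled[region]++;
    }
  }

  // Unweighted mean of the retained samples; 0 if the region has none yet.
  public double mean(int region) {
    int n = filled[region];
    if (n == 0) {
      return 0;
    }
    int base = region * window;
    double sum = 0;
    for (int i = 0; i < n; i++) {
      sum += samples[base + i];
    }
    return sum / n;
  }

  // Payload bytes only (ignores JVM array headers): 8 per sample plus two ints per region.
  public static long payloadBytes(long numRegions, int window) {
    return numRegions * (window * 8L + 2 * 4L);
  }
}
```

At 1 million regions and a 4-sample window this is 40,000,000 bytes of payload, and it grows linearly with the window. Keeping a single smoothed value per region instead would need 8 bytes per region, i.e. 8,000,000 bytes.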
Cost calculation must be efficient: the balancer goes through many cost
functions, and each one is expected to return quickly. Otherwise we would
not come up with proper region movement plan(s) in time.
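One way to keep a load-based cost function cheap, sketched under assumptions: precompute the (smoothed) per-region loads once per balancing run, then maintain per-server sums incrementally, so applying a candidate move is O(1) and scoring it is O(servers) rather than a fresh pass over every region. This is not how StochasticLoadBalancer is actually structured; the class, its methods and the normalization (sum of absolute deviations from the mean, divided by the all-load-on-one-server worst case) are a generic illustration.

```java
// Hypothetical incremental cost: per-server load sums are updated in O(1) per move.
public class IncrementalLoadCost {
  private final double[] regionLoad;  // smoothed per-region load, computed once per run
  private final int[] regionToServer; // current candidate assignment
  private final double[] serverLoad;  // per-server sums, kept in sync on every move
  private double total;

  public IncrementalLoadCost(double[] regionLoad, int[] regionToServer, int numServers) {
    this.regionLoad = regionLoad.clone();
    this.regionToServer = regionToServer.clone();
    this.serverLoad = new double[numServers];
    for (int r = 0; r < regionLoad.length; r++) {
      serverLoad[regionToServer[r]] += regionLoad[r];
      total += regionLoad[r];
    }
  }

  // O(1): only the source and destination server sums change.
  public void moveRegion(int region, int toServer) {
    int from = regionToServer[region];
    serverLoad[from] -= regionLoad[region];
    serverLoad[toServer] += regionLoad[region];
    regionToServer[region] = toServer;
  }

  // O(numServers): 0 when perfectly balanced, 1 when all load sits on one server.
  public double cost() {
    int n = serverLoad.length;
    if (n <= 1 || total == 0) {
      return 0;
    }
    double mean = total / n;
    double deviation = 0;
    for (double load : serverLoad) {
      deviation += Math.abs(load - mean);
    }
    // Worst case (everything on one server): 2 * total * (n - 1) / n.
    return deviation / (2 * total * (n - 1) / n);
  }
}
```

Repeated += and -= can accumulate floating-point drift over very long searches; rebuilding the sums at the start of each balancer run keeps that bounded.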
Cheers
On Wed, Jan 11, 2017 at 5:51 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> For #2, I think it makes sense to try out using request rates for cost
> calculation.
>
> If the experiment result turns out to be better, we can consider using
> such measure.
>
> Thanks
>
> On Wed, Jan 11, 2017 at 5:34 PM, Timothy Brown <tim@siftscience.com>
> wrote:
>
>> Hi,
>>
>> I have a couple of questions about the StochasticLoadBalancer.
>>
>> 1) In CostFromRegionLoadFunction.getRegionLoadCost, the cost weights later
>> samples of the RegionLoad more heavily than earlier ones. For example, with
>> a queue size of 4 it would be (.5 * load1 + .25 * load2 + .125 * load3 +
>> .125 * load4). Is this the intended behavior?
>>
>> 2) Would it make more sense to calculate the ReadRequestCost and
>> WriteRequestCost as rates? Right now it looks like the cost is just based
>> on the total number of read/write requests a region has received over its
>> lifetime.
>>
>> Tim
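Question #1 can be checked numerically. The sketch below assumes getRegionLoadCost folds the queued samples oldest-first, seeding with the first sample and then applying cost = 0.5 * cost + 0.5 * sample; that is our reading, consistent with the weights quoted above, not code copied from HBase. Because the fold is linear, feeding it a unit vector for each position yields that position's weight.

```java
import java.util.Arrays;

// Reproduces the weights from question #1 under the assumed halving fold.
public class FoldWeights {
  // Oldest-first fold: the first sample seeds the cost, each later one is blended in at 0.5.
  static double foldHalf(double[] samples) {
    double cost = 0;
    boolean first = true;
    for (double s : samples) {
      if (first) {
        cost = s;
        first = false;
      } else {
        cost = 0.5 * cost + 0.5 * s;
      }
    }
    return cost;
  }

  // Weight of sample i = result of folding the unit vector e_i.
  static double[] effectiveWeights(int n) {
    double[] weights = new double[n];
    for (int i = 0; i < n; i++) {
      double[] unit = new double[n];
      unit[i] = 1.0;
      weights[i] = foldHalf(unit);
    }
    return weights;
  }

  public static void main(String[] args) {
    // Weights listed oldest to newest.
    System.out.println(Arrays.toString(effectiveWeights(4))); // prints [0.125, 0.125, 0.25, 0.5]
  }
}
```

The newest sample gets 0.5, matching the .5 on load1 in the question if load1 is the most recent sample, and the two oldest samples share the remaining 0.25. An unweighted mean would instead give each of the 4 samples 0.25.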
