accumulo-dev mailing list archives

From mikewalch <...@git.apache.org>
Subject [GitHub] accumulo pull request #293: ACCUMULO-4669 Use windowed statistics in RFile
Date Fri, 18 Aug 2017 19:57:29 GMT
GitHub user mikewalch commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/293#discussion_r134035312
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/file/rfile/RollingStats.java ---
    @@ -0,0 +1,114 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
    + * agreements. See the NOTICE file distributed with this work for additional information regarding
    + * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance with the License. You may obtain a
    + * copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License
    + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
    + * or implied. See the License for the specific language governing permissions and limitations under
    + * the License.
    + */
    +package org.apache.accumulo.core.file.rfile;
    +
    +import org.apache.commons.math3.stat.StatUtils;
    +import org.apache.commons.math3.util.FastMath;
    +
    +/**
    + * This class supports efficient windowed statistics. Apache Commons Math3 has a class called
    + * DescriptiveStatistics that supports windows, but DescriptiveStatistics recomputes the statistics
    + * over the entire window each time they are requested. In a test over 1,000,000 entries with a
    + * window size of 1019 that requested stats for each entry, this class took ~50ms and
    + * DescriptiveStatistics took ~6,000ms.
    + *
    + * <p>
    + * This class may not be as accurate as DescriptiveStatistics. In unit tests, its results are within
    + * 1/1000 of those from DescriptiveStatistics.
    + */
    +class RollingStats {
    +  private int position;
    +  private double[] window;
    +
    +  private double average;
    +  private double variance;
    +  private double stddev;
    +
    +  // indicates if the window is full
    +  private boolean windowFull;
    +
    +  private int recomputeCounter = 0;
    +
    +  RollingStats(int windowSize) {
    +    this.windowFull = false;
    +    this.position = 0;
    +    this.window = new double[windowSize];
    +  }
    +
    +  /**
    +   * @see <a href="http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/">Efficient and accurate rolling standard deviation</a>
    +   */
    +  private void update(double n, double o, int w) {
    --- End diff --
    
    I guess `n` & `o` are for new & old. You could use `newValue` & `oldValue` instead to make things clear. What is `w`?
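To make the naming suggestion concrete, here is a minimal sketch of the fixed-window update from the linked article with the parameters spelled out as `newValue`, `oldValue`, and `windowSize`. It assumes that `w` is the number of values in a full window, which the diff does not confirm. The class `RollingStatsSketch` and its methods are hypothetical and are not the Accumulo implementation. Until the window fills, the sketch falls back to Welford's online update; after that, each new value replaces the oldest one in a circular buffer, so the mean and the sum of squared deviations change in O(1) per value instead of being recomputed over the whole window.

```java
/**
 * Hypothetical sketch (not Accumulo's RollingStats): a fixed-size window that keeps the mean and
 * sample variance up to date in O(1) per added value. Assumes the update's third parameter is the
 * number of values in a full window.
 */
class RollingStatsSketch {
  private final double[] window; // circular buffer holding the most recent values
  private int position = 0; // slot that the next value will overwrite
  private int count = 0; // number of values currently in the window
  private double mean = 0.0;
  private double m2 = 0.0; // sum of squared deviations from the current mean

  RollingStatsSketch(int windowSize) {
    if (windowSize < 1) {
      throw new IllegalArgumentException("windowSize must be at least 1");
    }
    this.window = new double[windowSize];
  }

  void addValue(double newValue) {
    if (count < window.length) {
      // Window not full yet: Welford's online update for appending one value.
      count++;
      double delta = newValue - mean;
      mean += delta / count;
      m2 += delta * (newValue - mean);
    } else {
      // Window full: newValue replaces the oldest value, which sits at `position`.
      update(newValue, window[position], window.length);
    }
    window[position] = newValue;
    position = (position + 1) % window.length;
  }

  /** Replaces oldValue with newValue in a full window that holds windowSize values. */
  private void update(double newValue, double oldValue, int windowSize) {
    double oldMean = mean;
    mean = oldMean + (newValue - oldValue) / windowSize;
    // Identity: M2' = M2 + (new - old) * (new - newMean + old - oldMean)
    m2 += (newValue - oldValue) * (newValue - mean + oldValue - oldMean);
  }

  double getMean() {
    return mean;
  }

  /** Sample (n - 1) variance; clamps tiny negative rounding error to zero. */
  double getVariance() {
    return count > 1 ? Math.max(0.0, m2) / (count - 1) : 0.0;
  }

  double getStandardDeviation() {
    return Math.sqrt(getVariance());
  }

  public static void main(String[] args) {
    RollingStatsSketch stats = new RollingStatsSketch(5);
    double[] values = {4, 8, 15, 16, 23, 42, 7, 1};
    for (double v : values) {
      stats.addValue(v);
    }
    // The window now holds 16, 23, 42, 7, 1.
    // prints "mean=17.8000 variance=253.7000"
    System.out.printf(java.util.Locale.ROOT, "mean=%.4f variance=%.4f%n",
        stats.getMean(), stats.getVariance());
  }
}
```

Tracking the sum of squared deviations rather than the variance itself means the division by `count - 1` only happens when the variance is requested. Incremental updates like this can accumulate floating-point error over long runs; periodically recomputing from the buffer is one common way to bound it.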


