From: Luc Maisonobe
Date: Thu, 07 Aug 2008 01:06:29 +0200
To: Commons Users List
Subject: Re: Apache.Commons.Math. How to compute Frequency distributions ?

Pierre8rou wrote:
> Hello,
>
> Apache.Commons.Math.
> How to compute Frequency distributions ?
>
> I have a sample array of doubles.
> It's just a sample; the real array is a lot bigger.
>
> double[] array = { 10.1, 34.0, 15.0, 22.5, 24.2, 31.0, 32.0, 37.0 };
>
> I need to know the frequency for 4 categories.
>
> What is the frequency for the doubles < 10 ?
> What is the frequency for the doubles >= 10 and < 20 ?
> What is the frequency for the doubles >= 20 and < 30 ?
> What is the frequency for the doubles >= 30 ?
>
> Thanks,
>
> Pierre8r
>

You could use the org.apache.commons.math.stat.Frequency class, wrapping your elements in Double objects and calling getCumFreq(threshold). However, I guess there will be two problems.

The first one is that getCumFreq(threshold) counts items less than or equal to the threshold, not strictly less than it. This can be worked around with a little trick: store the opposite of the values instead of the values themselves. Since -x <= -t is the same as x >= t, getCumFreq(-t) then counts the original values that are greater than or equal to t, like this:

    import org.apache.commons.math.stat.Frequency;

    Frequency f = new Frequency();
    for (int i = 0; i < array.length; ++i) {
        // store the negated value so that "<=" on -x means ">=" on x
        f.addValue(Double.valueOf(-array[i]));
    }
    long f30 = f.getCumFreq(Double.valueOf(-30.0)); // values >= 30
    long f20 = f.getCumFreq(Double.valueOf(-20.0)); // values >= 20
    long f10 = f.getCumFreq(Double.valueOf(-10.0)); // values >= 10
    long f00 = f.getSumFreq();                      // all values
    System.out.println("< 10: " + (f00 - f10));
    System.out.println(">= 10 and < 20: " + (f10 - f20));
    System.out.println(">= 20 and < 30: " + (f20 - f30));
    System.out.println(">= 30: " + f30);

(Beware, I didn't check this code; I just wrote it directly in this message, so there are certainly errors in it, but you get the idea.)

The second problem is that if your array is really big, this method will be quite inefficient: you will end up with an additional copy of your array data stored in the Frequency instance, with each value wrapped in a Double and held in a tree structure. The bookkeeping inside the Frequency class and inside the tree structure would also add some CPU overhead.

If your categories are fixed, I wonder whether simply counting them yourself in a loop would not be easier.

Luc
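
For illustration, the plain counting loop suggested above could be sketched as follows. This is only a rough sketch, assuming the category boundaries are fixed and given in ascending order. The names CategoryCounter and countByCategory are made up for this example and are not part of Commons Math.

```java
// Hypothetical helper, not part of Commons Math: counts values per category
// in a single pass, without copying or boxing the data.
final class CategoryCounter {

    private CategoryCounter() {
    }

    // thresholds must be sorted in ascending order (assumption of this sketch).
    // counts[0] holds values below thresholds[0], counts[k] holds values in
    // [thresholds[k-1], thresholds[k]), and the last entry holds values at or
    // above the last threshold.
    static long[] countByCategory(double[] values, double[] thresholds) {
        long[] counts = new long[thresholds.length + 1];
        for (double v : values) {
            int bin = 0;
            // advance while v is at or above the next lower bound;
            // NaN fails every comparison and therefore stays in bin 0
            while (bin < thresholds.length && v >= thresholds[bin]) {
                ++bin;
            }
            ++counts[bin];
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] array = { 10.1, 34.0, 15.0, 22.5, 24.2, 31.0, 32.0, 37.0 };
        long[] c = countByCategory(array, new double[] { 10.0, 20.0, 30.0 });
        System.out.println("< 10: " + c[0]);           // 0
        System.out.println(">= 10 and < 20: " + c[1]); // 2
        System.out.println(">= 20 and < 30: " + c[2]); // 2
        System.out.println(">= 30: " + c[3]);          // 4
    }
}
```

With only a handful of categories, the inner scan over the thresholds is cheap. For many categories, a binary search over the thresholds would probably be a better fit.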