Subject: svn commit: r1504935 - /ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java
Date: Fri, 19 Jul 2013 16:35:10 -0000
To: commits@ctakes.apache.org
From: clin@apache.org
Reply-To: dev@ctakes.apache.org
Message-Id: <20130719163510.BAFA3238889B@eris.apache.org>

Author: clin
Date: Fri Jul 19 16:35:10 2013
New Revision: 1504935

URL: http://svn.apache.org/r1504935

Log:
Add a Yates's correction parameter for Chi2 feature selection.
This lets users turn Yates's correction on or off when the Chi2 value is calculated. When the correction is on (boolean yates = true), 0.5 is subtracted from the absolute difference between the observed and expected values, so differences of 0.5 or less are ignored and more features are trimmed. When it is off, small differences are kept, so users are free to keep all features.

Modified:
    ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java

Modified: ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java?rev=1504935&r1=1504934&r2=1504935&view=diff
==============================================================================
--- ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java (original)
+++ ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/feature/selection/Chi2FeatureSelection.java Fri Jul 19 16:35:10 2013
@@ -38,9 +38,12 @@ public class Chi2FeatureSelection featValueClassCount;
-    public Chi2Scorer() {
+    private boolean yates = false;
+
+    public Chi2Scorer(boolean yate) {
       this.classCounts = HashMultiset. create();
       this.featValueClassCount = HashBasedTable. create();
+      this.yates = yate;
     }
     public void update(String featureName, OUTCOME_T outcome, int occurrences) {
@@ -88,13 +91,12 @@ public class Chi2FeatureSelection 0) {
       double diff = Math.abs(posiOutcomeCounts[lbl] - expected);
-      if (yates) { // apply Yate's correction
+      if (this.yates ) { // apply Yate's correction
         diff -= 0.5;
       }
       if (diff > 0)
@@ -106,7 +108,7 @@ public class Chi2FeatureSelection 0) {
       double diff = Math.abs(observ - expected);
-      if (yates) { // apply Yate's correction
+      if (this.yates) { // apply Yate's correction
         diff -= 0.5;
       }
       if (diff > 0)
@@ -121,6 +123,8 @@ public class Chi2FeatureSelection chi2Function;
+
+  private boolean yates = false;
   public Chi2FeatureSelection(String name) {
     this(name, 0.0);
@@ -131,6 +135,17 @@ public class Chi2FeatureSelection> instances) {
     // aggregate statistics for all features
-    this.chi2Function = new Chi2Scorer();
+    this.chi2Function = new Chi2Scorer(this.yates);
     for (Instance instance : instances) {
       OUTCOME_T outcome = instance.getOutcome();
       for (Feature feature : instance.getFeatures()) {
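
For readers who want to see the effect of the flag in isolation, the per-cell calculation described in the log can be sketched as below. This is a minimal illustration, not the code in Chi2FeatureSelection.java: the class name YatesSketch and the method chi2Term are hypothetical, and because the accumulation step after "if (diff > 0)" is not visible in the diff, the sketch assumes the standard chi-square term (squared difference divided by the expected count).

```java
// Hypothetical sketch; not part of the cTAKES source.
final class YatesSketch {
    private YatesSketch() {}

    /**
     * Contribution of one contingency-table cell to the chi-square statistic.
     * Assumes the standard term diff^2 / expected.
     */
    public static double chi2Term(double observed, double expected, boolean yates) {
        if (expected <= 0) {
            return 0.0; // cells with no expected count contribute nothing
        }
        double diff = Math.abs(observed - expected);
        if (yates) {
            diff -= 0.5; // Yates's continuity correction
        }
        // With the correction on, a difference of 0.5 or less is dropped here.
        return diff > 0 ? diff * diff / expected : 0.0;
    }

    public static void main(String[] args) {
        // Small difference: ignored with the correction, kept without it.
        System.out.println(chi2Term(10, 9.8, true));
        System.out.println(chi2Term(10, 9.8, false));
        // Larger difference: reduced by the correction, but still counted.
        System.out.println(chi2Term(12, 8, true));
        System.out.println(chi2Term(12, 8, false));
    }
}
```

With the correction on, a cell with observed 10 and expected 9.8 contributes nothing to the score, which is why more features fall below the selection threshold. With it off, the same cell still adds a small positive amount.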