Subject: Re: Suggestion for CPE stats
From: Eddie Epstein
To: user@uima.apache.org
Date: Thu, 22 Jul 2010 21:36:05 -0400
In-Reply-To: <4C46057D.40506@cs.cmu.edu>

Hi Eric,

I'm not sure which, but one of the UIMA command line tools does report
total document size at the end of processing.

However, some problems with this suggestion. The UIMA framework pretty
much just moves CASes around without looking inside them. If it did look
inside, which view would it look at? What about non text artifacts?

My answer would be to make this an application design issue. Have a CAS
consumer do the count and make it available at collection process
complete.

Eddie

On Tue, Jul 20, 2010 at 4:22 PM, Eric Riebling wrote:
> Although it's useful to know how many documents have been processed,
> that figure is not nearly as useful as how many CHARACTERS have been
> processed by a given CPE or set of components within a CPE. Since,
> if your documents are tiny, processing per document is much faster
> than if they are huge.
> So I think it would be a great thing to include Characters Processed
> in the stats window of the Performance Report.
> --
> Eric Riebling  GHC 6713, LTI, SCS, CMU
> 412.268.9872  http://www.cs.cmu.edu/~er1k
>
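
The suggestion in the reply above (have a CAS consumer do the count and make it available at collection process complete) could be sketched roughly as follows. This is a minimal, framework-free illustration and not UIMA code: SimpleCas and CharCountingConsumer are hypothetical stand-ins invented for this example, and only the method names process and collectionProcessComplete are borrowed from the UIMA component callbacks. The sketch assumes the application decides which view to count (here "_InitialView") and that views without text are counted as documents but contribute no characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a CAS: named views, each holding text or null (non-text artifact). */
final class SimpleCas {
    private final Map<String, String> views = new LinkedHashMap<>();

    SimpleCas withView(String name, String text) {
        views.put(name, text);
        return this;
    }

    /** Returns the text of the named view, or null if the view is absent or has no text. */
    String getViewText(String name) {
        return views.get(name);
    }
}

/** Hypothetical consumer that tallies documents and characters for one chosen view. */
final class CharCountingConsumer {
    private final String viewName; // the application, not the framework, picks the view
    private long documents;
    private long characters;

    CharCountingConsumer(String viewName) {
        this.viewName = viewName;
    }

    /** Called once per document; mirrors a per-CAS process callback. */
    void process(SimpleCas cas) {
        documents++;
        String text = cas.getViewText(viewName);
        if (text != null) {
            // String.length() counts UTF-16 code units, not Unicode code points.
            characters += text.length();
        }
    }

    /** Called once at the end of the collection; returns the totals as a report line. */
    String collectionProcessComplete() {
        return String.format("documents=%d characters=%d", documents, characters);
    }

    long getDocuments() {
        return documents;
    }

    long getCharacters() {
        return characters;
    }
}

final class CharCountDemo {
    public static void main(String[] args) {
        CharCountingConsumer consumer = new CharCountingConsumer("_InitialView");
        consumer.process(new SimpleCas().withView("_InitialView", "hello"));
        consumer.process(new SimpleCas().withView("_InitialView", "hi there"));
        consumer.process(new SimpleCas().withView("audio", null)); // non-text artifact
        System.out.println(consumer.collectionProcessComplete());
    }
}
```

Keeping the count in an application component leaves both open questions from the reply (which view to look at, and how to treat non-text artifacts) as explicit choices in the consumer rather than something the framework has to guess.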