From: "Chen, Pei"
To: "dev@uima.apache.org"
CC: "ctakes-dev@incubator.apache.org"
Subject: UIMA AS Binary vs XMI serialization
Date: Thu, 3 Jan 2013 21:25:34 +0000

Hi,

I was just curious about others' experience with the binary serialization.

My original issue was documents that contained invalid XML chars, so I decided to try the binary serialization option within AS instead of replacing/modifying the special chars in the original docs. As a side effect, I noticed that it's orders of magnitude faster.

Just curious whether there are any reasons not to make this the recommended/default when sending CASes around within AS.
Are there any downsides to be aware of (assuming that UIMA will have wrappers to abstract this from users for all of their implementations)?

Caused by: org.xml.sax.SAXParseException; Trying to serialize non-XML 1.0 character: , 0x0
    at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)

--Pei
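
For comparison, here is a rough sketch of the other route mentioned above, i.e. cleaning the special chars out of the document text before it is serialized to XMI. It is plain JDK code and not part of UIMA: the class name XmlCharFilter, its method names, and the choice to substitute a space are all assumptions made for illustration. The valid ranges follow the Char production of the XML 1.0 specification, which is the rule the 0x0 character in the trace violates.

```java
// Hypothetical helper, NOT a UIMA class: names and replacement policy are assumptions.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index (in UTF-16 units) of the first invalid character, or -1 if there is none. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /**
     * Replaces every invalid character with a single space.
     * All invalid code points fit in one UTF-16 unit, so the string length
     * and therefore any annotation offsets computed on the text are unchanged.
     */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(' ');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Something like XmlCharFilter.sanitize(docText) would be called before the text is set on the CAS. Replacing rather than deleting keeps the character offsets stable, which matters if annotations are later mapped back onto the original document; whether that trade-off beats switching to binary serialization is exactly the open question here.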