From: Paul Libbrecht
To: "Commons Users List"
Subject: Re: [Digester] HTML entity decoding?
Date: Thu, 16 Apr 2009 00:24:05 +0200
In-Reply-To: <524750.77523.qm@web50307.mail.re2.yahoo.com>

Hello Otis,

For the second form you'll need to hook in a DTD: a DOCTYPE declaration
in your header pointing to a DTD which defines these entities. I am no
expert in Digester, but I believe that this is the only way to do so,
at least according to the XML specs.

Here's a text pointing to such a DTD:
http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities

Note that opening the file with a validating parser will certainly
grumble about all sorts of undeclared elements. This is OK: it does
not prevent parsing, although it is, indeed, a validation error.
You still get the entity expansion.

Note that with the first form, which contains an *escaped* entity,
there's nothing to be done on the parser side: it will not expand the
entity for you. You'd have to feed such strings manually
("re-entrantly") into a parser that parses entities properly.

paul

PS: I would count myself lucky that the XML parsing did not blow up in
the second case, as it does with a normal XML parser: a missing entity
declaration means unparseable XML, whereas a missing element
declaration is a much less dangerous thing.

On 16 Apr 2009, at 00:06, Otis Gospodnetic wrote:

>
> Hello,
>
> I'm using Digester 2.0 and trying to process XML that
> may include HTML entities and trying to get Digester to decode them
> when parsing.
>
> For example, my XML contains:
>
>
> Currently, Digester parses this as: Gr&uuml;ber
>
> But what I am really after is "Grüber", so I am looking for a way to
> get this &uuml; entity decoded by Digester.
> How do I tell Digester to decode HTML entities?
>
> Also, if I don't use CDATA, like this:
> Gr&uuml;ber
>
> Digester gives me: Grber
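
For reference, here is a minimal sketch of the two cases discussed in this thread. It is only an illustration under several assumptions: it uses the JDK's plain SAX parser rather than Digester (Digester sits on top of SAX, so the same behaviour should apply, but that has not been checked against Digester 2.0), the element names `name` and `x` are hypothetical, and the `uuml` entity is declared in an internal DTD subset instead of pulling in the full XHTML entity set linked above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityExpansionSketch {

    // Internal DTD subset declaring only the one entity used here (assumption:
    // a real document would reference the XHTML character entity sets instead).
    private static final String UUML_DECL = "<!ENTITY uuml \"&#252;\">";

    /** Parses the XML with a non-validating SAX parser and returns all character data. */
    public static String textOf(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data may arrive in several chunks, so accumulate it.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    /**
     * "Re-entrant" step for text that came out of a CDATA section: wrap it in a
     * tiny document that declares the entity and parse it again. The wrapper
     * element "x" is hypothetical; the fragment must itself be well-formed XML content.
     */
    public static String expandEntities(String fragment) throws Exception {
        return textOf("<!DOCTYPE x [ " + UUML_DECL + " ]><x>" + fragment + "</x>");
    }

    public static void main(String[] args) throws Exception {
        // Second form: entity in ordinary content, declared via the DOCTYPE.
        String declared = "<!DOCTYPE name [ " + UUML_DECL + " ]><name>Gr&uuml;ber</name>";
        System.out.println(textOf(declared));

        // First form: inside CDATA the parser leaves the entity reference untouched.
        String raw = textOf("<name><![CDATA[Gr&uuml;ber]]></name>");
        System.out.println(raw);

        // Feeding that text back through a parser expands it.
        System.out.println(expandEntities(raw));
    }
}
```

Because the parser here is non-validating, the undeclared `name` element causes no complaint; only the entity declaration matters for the expansion.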