Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Reply-To: issues@commons.apache.org
Message-ID: <708500861.1203328834952.JavaMail.jira@brutus>
Date: Mon, 18 Feb 2008 02:00:34 -0800 (PST)
From: "Joerg Schaible (JIRA)"
To: issues@commons.apache.org
Subject: [jira] Commented: (CONFIGURATION-164) Charset detection
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/CONFIGURATION-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569833#action_12569833 ]

Joerg Schaible commented on CONFIGURATION-164:
----------------------------------------------

That algorithm seems bloated for our use case, since it also tries to extract the encoding from the HTML page and its settings. As an alternative, we might use a simpler approach like the one in XStream's XmlHeaderAwareReader, which is based on the JDK's PushbackInputStream and is implemented much like a normal InputStreamReader.

> Charset detection
> -----------------
>
>                 Key: CONFIGURATION-164
>                 URL: https://issues.apache.org/jira/browse/CONFIGURATION-164
>             Project: Commons Configuration
>          Issue Type: New Feature
>    Affects Versions: Nightly Builds
>            Reporter: Emmanuel Bourg
>            Priority: Minor
>             Fix For: 2.x
>
>
> Automatically detecting the charset of file-based configurations would be a
> nice addition. When the file has no byte order mark defining the charset, we
> might apply a detection algorithm similar to the one implemented in Mozilla.
> There is at least one Java library providing this feature, jchardet:
> http://sourceforge.net/projects/jchardet

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
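
For illustration only, here is a minimal sketch of the kind of PushbackInputStream-based approach mentioned in the comment above. It is not XStream's XmlHeaderAwareReader and not code from Commons Configuration: the class name BomCharsetSniffer and the method open are hypothetical, and the sketch only looks at a leading byte order mark (UTF-8, UTF-16BE, UTF-16LE) before falling back to a caller-supplied charset. It does not parse an XML header or guess the encoding statistically.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (assumption): not part of Commons Configuration or XStream. */
final class BomCharsetSniffer {

    private BomCharsetSniffer() {
    }

    /**
     * Wraps the stream in a Reader. The charset is taken from a leading
     * byte order mark if one is present, otherwise the fallback is used.
     * The BOM itself is consumed; all other bytes are pushed back.
     */
    static Reader open(InputStream in, Charset fallback) throws IOException {
        // Pushback buffer of 3 bytes: the longest BOM checked here (UTF-8).
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = readUpTo(pin, head);

        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Return the non-BOM bytes to the stream so the Reader sees them.
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, charset);
    }

    /** Reads until the buffer is full or the stream ends; returns the byte count. */
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }
}
```

A caller would use the returned Reader wherever an InputStreamReader is used today, passing the currently configured encoding as the fallback. Files without a byte order mark are read with the fallback, so this sketch does not cover the detection case the issue description asks about.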