Message-ID: <1207552739.1228008644320.JavaMail.jira@brutus>
Date: Sat, 29 Nov 2008 17:30:44 -0800 (PST)
From: "Niall Pemberton (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Subject: [jira] Resolved: (IO-178) BOMExclusionInputStream - an InputStream for UTF-8 data that ignores an initial Byte Order mark
In-Reply-To: <364487624.1218895184401.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/IO-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niall Pemberton resolved IO-178.
--------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 1.5)
                   2.0
         Assignee: Niall Pemberton

Thanks Keith, I have added this with superficial changes, mostly formatting

http://svn.apache.org/viewvc?view=rev&revision=721749

> BOMExclusionInputStream - an InputStream for UTF-8 data that ignores an initial Byte Order mark
> -----------------------------------------------------------------------------------------------
>
>                 Key: IO-178
>                 URL: https://issues.apache.org/jira/browse/IO-178
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Streams/Writers
>    Affects Versions: 1.4
>            Reporter: Keith D Gregory
>            Assignee: Niall Pemberton
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: BOMExclusionInputStream.java, BOMExclusionInputStream.patch, TestBOMExclusionInputStream.java
>
>
> Microsoft tools have the unpleasant habit of writing a byte order mark (the three-byte sequence 0xEF 0xBB 0xBF) at the start of a UTF-8 encoded file.
> The CharsetDecoder supplied with the JDK does not simply discard these bytes, but instead returns the BOM character (0xFEFF); see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911 for discussion on this.
> This makes life unpleasant for anyone who is processing text data, as the program must look for this character and ignore it.
> The BOMExclusionInputStream class is a work-around: it recognizes the BOM at the start of the stream, and skips over it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
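
For readers of the archive, the work-around described in the issue (recognize the UTF-8 BOM at the start of the stream and skip it) can be sketched roughly as below. This is not the attached BOMExclusionInputStream.java and not the code committed in the revision above; the class name Utf8BomSkipper and the method skipBom are hypothetical names chosen for illustration, and the sketch assumes that wrapping the source in a java.io.PushbackInputStream is acceptable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Commons IO: illustrates skipping a leading UTF-8 BOM.
final class Utf8BomSkipper {
    // The three-byte UTF-8 byte order mark named in the issue description.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private Utf8BomSkipper() {
    }

    // Returns a stream positioned after the BOM if one is present,
    // or at the very first byte of the input otherwise.
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes; a single read() call may return fewer than requested.
        int count = 0;
        while (count < head.length) {
            int n = pushback.read(head, count, head.length - count);
            if (n < 0) {
                break; // end of stream before three bytes were available
            }
            count += n;
        }

        // The prefix is a BOM only if all three bytes are present and match.
        boolean isBom = count == UTF8_BOM.length;
        for (int i = 0; isBom && i < UTF8_BOM.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: push the bytes back so the caller sees the data unchanged.
        if (!isBom && count > 0) {
            pushback.unread(head, 0, count);
        }
        return pushback;
    }
}
```

A caller would wrap the raw stream before decoding it, for example new InputStreamReader(Utf8BomSkipper.skipBom(rawStream), "UTF-8"), so that the decoder never sees the BOM bytes and the 0xFEFF character does not reach the program. Streams shorter than three bytes, or ones that merely begin with 0xEF, are passed through untouched.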