commons-issues mailing list archives

From "Simon Spero (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-577) Add readers to filter out given characters: CharacterSetFilterReader and CharacterFilterReader.
Date Thu, 07 Jun 2018 16:25:00 GMT

    [ https://issues.apache.org/jira/browse/IO-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504882#comment-16504882
] 

Simon Spero commented on IO-577:
--------------------------------

A few comments:
1. APIs like java.util.stream and RxJava have filter methods that work in the opposite sense
to the filter method introduced here: they select items that match the test, rather than
excluding them.
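
To illustrate the difference in sense, here is a minimal sketch using only java.util.stream.
The class and method names (FilterSenseDemo, keepMatching, dropMatching) are made up for
this example and are not part of the patch.

```java
import java.util.stream.Collectors;

// Hypothetical demo class; not part of Commons IO or the attached patch.
final class FilterSenseDemo {

    // IntStream.filter KEEPS the elements for which the predicate is true.
    static String keepMatching(String input, char target) {
        return input.chars()
                .filter(c -> c == target)
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.joining());
    }

    // The sense described for the new readers: matching chars are DROPPED,
    // which with a stream means negating the test.
    static String dropMatching(String input, char target) {
        return input.chars()
                .filter(c -> c != target)
                .mapToObj(c -> String.valueOf((char) c))
                .collect(Collectors.joining());
    }
}
```

With the input "a,b,c" and the target ',', keepMatching yields ",," while dropMatching
yields "abc".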

2. The documentation refers to "codepoints"; however, the read method in java.io.FilterReader
returns UTF-16 code units (Java chars). This makes a difference for characters outside the BMP,
which are represented in Java as surrogate pairs. The current implementation can't filter
codepoints like 😭 (U+1F62D) because it only sees the individual UTF-16 surrogates.
Working with codepoints would potentially require interposing a pushback reader to handle
the case where the input contains a codepoint that is encoded in more than one char and is
not rejected.
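
The surrogate issue can be sketched as follows. This is only an illustration under my own
assumptions: SurrogateDemo and its methods are hypothetical names, and dropCodePoint shows
one possible look-ahead approach with java.io.PushbackReader, not how the patch works.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Commons IO or the attached patch.
final class SurrogateDemo {

    // Counts values returned by Reader.read() that equal the given code point.
    // read() returns one UTF-16 code unit at a time, so a supplementary code
    // point such as U+1F62D never matches.
    static int countMatchesPerChar(String input, int codePoint) {
        int matches = 0;
        try (Reader reader = new StringReader(input)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == codePoint) {
                    matches++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    // Drops every occurrence of codePoint, joining surrogate pairs first.
    // A one-char pushback buffer returns the look-ahead char when it does
    // not complete a pair.
    static String dropCodePoint(String input, int codePoint) {
        StringBuilder out = new StringBuilder();
        try (PushbackReader reader = new PushbackReader(new StringReader(input), 1)) {
            int c;
            while ((c = reader.read()) != -1) {
                int cp = c;
                if (Character.isHighSurrogate((char) c)) {
                    int next = reader.read();
                    if (next != -1 && Character.isLowSurrogate((char) next)) {
                        cp = Character.toCodePoint((char) c, (char) next);
                    } else if (next != -1) {
                        reader.unread(next); // not a pair: process it on its own
                    }
                }
                if (cp != codePoint) {
                    out.appendCodePoint(cp);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For a string made of 'a', U+1F62D and 'b', countMatchesPerChar finds no match for 0x1F62D,
whereas dropCodePoint returns "ab".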

3. Commons IO is currently using Java 7. If the source level were to change to Java 8, then
the filter method could be replaced by an IntPredicate / Predicate<Integer> (passed
in when the class is constructed). The current cases could be handled using a method reference
or Predicate.isEqual.
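
A rough sketch of what a predicate-based reader might look like on Java 8. The class name
PredicateFilterReader, its constructor and the drain helper are my own inventions for this
example; they are not taken from the patch and are not an existing Commons IO API.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.IntPredicate;

// Hypothetical class; not part of Commons IO or the attached patch.
final class PredicateFilterReader extends FilterReader {

    // Chars for which this predicate returns true are removed from the stream.
    private final IntPredicate skip;

    PredicateFilterReader(Reader in, IntPredicate skip) {
        super(in);
        this.skip = skip;
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = in.read();
        } while (c != -1 && skip.test(c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept = 0;
        // Keep reading until at least one char survives or the input ends.
        while (kept == 0) {
            int n = in.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            for (int i = off; i < off + n; i++) {
                if (!skip.test(buf[i])) {
                    buf[off + kept++] = buf[i]; // compact kept chars in place
                }
            }
        }
        return kept;
    }

    // Small helper for the example: reads everything into a String.
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[4];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The single-character case then becomes a lambda such as c -> c == ',', and a character
class can be passed as a method reference such as Character::isWhitespace. Like the current
implementation, this sketch tests individual chars, so the surrogate caveat from point 2
still applies.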


> Add readers to filter out given characters: CharacterSetFilterReader and CharacterFilterReader.
> -----------------------------------------------------------------------------------------------
>
>                 Key: IO-577
>                 URL: https://issues.apache.org/jira/browse/IO-577
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Gary Gregory
>            Assignee: Gary Gregory
>            Priority: Major
>             Fix For: 2.7
>
>         Attachments: commons-io-577.patch
>
>
> Add readers to filter out given characters, handy to remove known junk characters from
> CSV files for example. Please see attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
