commons-issues mailing list archives

From "David Warshaw (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CSV-224) Some Multi Iterator Parsing Peek Sequences Incorrectly Consume Elements
Date Fri, 20 Apr 2018 13:52:00 GMT

    [ https://issues.apache.org/jira/browse/CSV-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445748#comment-16445748 ]

David Warshaw commented on CSV-224:
-----------------------------------

cc [~garydgregory]

> Some Multi Iterator Parsing Peek Sequences Incorrectly Consume Elements
> -----------------------------------------------------------------------
>
>                 Key: CSV-224
>                 URL: https://issues.apache.org/jira/browse/CSV-224
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.5
>            Reporter: David Warshaw
>            Priority: Minor
>
> Each call to CSVParser.iterator() returns a new Iterator, but all of them reference the same
> underlying parser lexer. Within the scope of a single Iterator, row peeking with Iterator.hasNext()
> works as intended. When hasNext() is called on one Iterator and iteration then continues with a
> newly created Iterator, the record read by the peek is held only by the first Iterator and cannot
> be reached by subsequent Iterators or their next() calls. Effectively, the record Iterator and the
> lexer get out of sequence. See the snippet below.
> The "right thing" is to keep the Iterator in sequence with the lexer. Since this is reading
> from a buffer, I see only two resolutions:
>  # One lexer, one Iterator.
>  # New Iterators, but peeking with hasNext doesn't advance the lexer.
>  
> If there's a consensus on one of these, I can put up a PR.
>  
> {code:java}
>   @Test
>   public void newIteratorSameLexer() throws Exception {
>     String fiveRows = "1\n2\n3\n4\n5\n";
>     System.out.println("Enhanced for loop, no peeking:");
>     CSVParser parser =
>         new CSVParser(new BufferedReader(new StringReader(fiveRows)), CSVFormat.DEFAULT);
>     int recordNumber = 0;
>     for (CSVRecord record : parser) {
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>       if (recordNumber >= 2) {
>         break;
>       }
>     }
>     // CSVParser.iterator() returns a new iterator, but the lexer isn't reset, so we
>     // can pick up where we left off.
>     for (CSVRecord record : parser) {
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>     }
>     // Enhanced for loop, no peeking:
>     // 1 -> 1
>     // 2 -> 2
>     // 3 -> 3
>     // 4 -> 4
>     // 5 -> 5
>     System.out.println("\nEnhanced for loop, with peek:");
>     parser = new CSVParser(new BufferedReader(new StringReader(fiveRows)), CSVFormat.DEFAULT);
>     recordNumber = 0;
>     for (CSVRecord record : parser) {
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>       if (recordNumber >= 2) {
>         break;
>       }
>     }
>     // CSVParser.iterator() returns a new iterator. Calling hasNext() on it queues one
>     // element for consumption, but that iterator is then discarded along with the queued
>     // element. The lexer has already advanced a row, so the peek has consumed an element!
>     System.out.println("hasNext(): " + parser.iterator().hasNext());
>     for (CSVRecord record : parser) {
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>     }
>     // Enhanced for loop, with peek:
>     // 1 -> 1
>     // 2 -> 2
>     // hasNext(): true
>     // 3 -> 4
>     // 4 -> 5
>     System.out.println("\nIterator while, with peek:");
>     parser = new CSVParser(new BufferedReader(new StringReader(fiveRows)), CSVFormat.DEFAULT);
>     recordNumber = 0;
>     Iterator<CSVRecord> iter = parser.iterator();
>     while (iter.hasNext()) {
>       CSVRecord record = iter.next();
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>       if (recordNumber >= 2) {
>         break;
>       }
>     }
>     // When we use the same iterator, iterator and lexer are in sequence.
>     System.out.println("hasNext(): " + iter.hasNext());
>     while (iter.hasNext()) {
>       CSVRecord record = iter.next();
>       recordNumber++;
>       System.out.println(recordNumber + " -> " + record.get(0));
>     }
>     // Iterator while, with peek:
>     // 1 -> 1
>     // 2 -> 2
>     // hasNext(): true
>     // 3 -> 3
>     // 4 -> 4
>     // 5 -> 5
>   }{code}
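
To make resolution 2 above concrete, here is a minimal, self-contained sketch. It is not Commons CSV code and not a proposed patch: SharedPeekSource and demo() are hypothetical names, and a plain Iterator<String> stands in for the lexer. The only idea it shows is that the lookahead record lives on the shared parent object instead of on each Iterator, so a peek through a throwaway Iterator cannot lose a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a parser: one underlying source, many iterators. */
final class SharedPeekSource implements Iterable<String> {
    private final Iterator<String> lexer; // stand-in for the shared lexer
    private String peeked;                // lookahead row, shared by all iterators
    private boolean hasPeeked;

    SharedPeekSource(List<String> rows) {
        this.lexer = rows.iterator();
    }

    @Override
    public Iterator<String> iterator() {
        // Each call still returns a new Iterator, but it keeps no state of its own.
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                if (!hasPeeked && lexer.hasNext()) {
                    peeked = lexer.next(); // advance the lexer once, remember the row
                    hasPeeked = true;
                }
                return hasPeeked;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String row = peeked;
                peeked = null;
                hasPeeked = false;
                return row;
            }
        };
    }

    /** Replays the "enhanced for loop, with peek" scenario from the report. */
    static List<String> demo() {
        SharedPeekSource source = new SharedPeekSource(Arrays.asList("1", "2", "3", "4", "5"));
        List<String> seen = new ArrayList<>();
        for (String row : source) {
            seen.add(row);
            if (seen.size() >= 2) {
                break;
            }
        }
        source.iterator().hasNext(); // peek through a throwaway iterator
        for (String row : source) {
            seen.add(row);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 2, 3, 4, 5]
    }
}
```

With the lookahead shared this way, the peek scenario yields rows 1 through 5 in order. Resolution 1 would instead have iterator() hand back the same Iterator instance on every call; which of the two fits the real parser better is a question for the project.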



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
