Subject: Re: Quotes in fields of CsvInputFormat
From: Malte Schwarzer
Date: Fri, 05 Dec 2014 16:44:30 +0100
Reply-To: user@flink.incubator.apache.org
Hi Stephan,

The result should be >"hhh" xx< as field value. Enclosures should be disabled but there seems to be no method to do that.
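
Until such an option exists, one possible workaround is to split each line by hand, so that quotes are kept as ordinary characters. The sketch below is plain Java and only illustrates the expected result; the class and method names (`RawFieldSplitter`, `split`) are hypothetical, and wiring it into a job (for example by reading the file as text lines and mapping each line) is an assumption that is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a line on a delimiter and ignores enclosures entirely. */
final class RawFieldSplitter {

    /** Returns the fields of the line; quotes are treated as normal characters. */
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                // Everything between the previous delimiter and this one is a field.
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // The remainder after the last delimiter is the final field.
        fields.add(line.substring(start));
        return fields;
    }
}
```

With this approach, `RawFieldSplitter.split("B|\"hhh\" xx", '|')` yields the two fields `B` and `"hhh" xx`, with the quotes preserved.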


Malte

From: Stephan Ewen <sewen@apache.org>
Reply-To: <user@flink.incubator.apache.org>
Date: Friday, 5 December 2014 16:28
To: <user@flink.incubator.apache.org>
Subject: Re: Quotes in fields of CsvInputFormat

Hi!

The parser interprets the quotes as quotes for the field. That means the second field (the string) stops after the "hhh" and the xx is considered invalid trailing data.
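
The behaviour described above can be illustrated with a small stand-alone sketch. It is not the actual Flink parser; the class `QuotedFieldSketch` and its `parse` method are hypothetical and only mimic the rule that a field starting with a quote ends at the closing quote, after which only a delimiter or the end of the line is accepted.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics the described quote handling, not Flink's real implementation. */
final class QuotedFieldSketch {

    /** Splits a line on the delimiter, treating a leading '"' as the start of an enclosed field. */
    static List<String> parse(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos <= line.length()) {
            if (pos < line.length() && line.charAt(pos) == '"') {
                // Enclosed field: the value ends at the next quote.
                int close = line.indexOf('"', pos + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote: '" + line + "'");
                }
                fields.add(line.substring(pos + 1, close));
                pos = close + 1;
                // Only a delimiter or the end of the line may follow the closing quote.
                if (pos < line.length() && line.charAt(pos) != delimiter) {
                    throw new IllegalArgumentException("Line could not be parsed: '" + line + "'");
                }
                pos++; // step over the delimiter (or past the end of the line)
            } else {
                // Unquoted field: runs up to the next delimiter or the end of the line.
                int next = line.indexOf(delimiter, pos);
                if (next < 0) {
                    next = line.length();
                }
                fields.add(line.substring(pos, next));
                pos = next + 1;
            }
        }
        return fields;
    }
}
```

In this sketch, `parse("A|ggg", '|')` returns the two fields, while `parse("B|\"hhh\" xx", '|')` throws, because ` xx` follows the closing quote.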

What do you expect as the result of parsing that line?

Stephan


On Fri, Dec 5, 2014 at 4:16 PM, Malte Schwarzer <ms@mieo.de> wrote:
Hi,

I'm trying to import a CSV file, but the parser seems to have problems with quotes at the beginning of a field. Is there a way to set or disable enclosures for the CSV input?

This is my code:

DataSet<Tuple2<String, String>> res = env.readCsvFile(inputCsvFilename)
                .fieldDelimiter('|')
                .types(String.class, String.class);

CSV:

A|ggg
B|"hhh" xx
C|xxx

As a result, I'm receiving a ParseException for line B:

org.apache.flink.api.common.io.ParseException: Line could not be parsed: 'B|"hhh" xx'


Thanks,
Malte
