From: Martin Neumann
To: dev@flink.incubator.apache.org
Date: Wed, 15 Oct 2014 16:07:17 +0200
Subject: Re: CsvInputFormat delimiter fields

Would changing it cost performance? If not, I think it would be a good
change to make, since it would allow (ab)using the CSV reader to load
structured text files (for example, by using keywords as delimiters).
Being able to put a regular expression there would be even nicer, but
maybe that should end up in its own InputFormat.

cheers Martin

On Wed, Oct 15, 2014 at 3:47 PM, Stephan Ewen wrote:

> Hi!
>
> The reason is the current way the CSV parsers work. They are pushed into
> the byte stream parsing and are restricted to recognizing one-char
> delimiters. It is possible to change that, but it would be a bit of work.
>
> Stephan
>
> On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann wrote:
>
> > Hej,
> >
> > A lot of my inputs are CSV files, so I use the CsvInputFormat a lot.
> > What I find kind of odd is that the line delimiter is a String but the
> > field delimiter is a Character.
> >
> > *see:* new CsvInputFormat<Tuple2<String, String>>(new
> > Path(pVecPath),"\n",'\t',String.class,String.class)
> >
> > Is there a reason for this? I'm currently working with a file that has a
> > more complex field delimiter, so I had to write a mapper to read from
> > StringInputFormat.
> >
> > cheers Martin
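
For illustration, the workaround mentioned in the original message (reading whole lines and splitting them in a mapper) could look roughly like the sketch below. It is plain Java with no Flink dependency. The class name MultiCharFieldSplitter and the "||" delimiter used in the usage note are hypothetical, and this is not how CsvInputFormat parses its input.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Flink): splits one line of text
 * on a field delimiter that may be longer than a single character.
 */
public class MultiCharFieldSplitter {

    private final Pattern delimiter;

    public MultiCharFieldSplitter(String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must be non-empty");
        }
        // Pattern.quote makes the delimiter match literally, so characters
        // such as '|' or '.' are not interpreted as regex syntax.
        // The pattern is compiled once and reused for every line.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
    }

    /** Returns the fields of the line, keeping trailing empty fields. */
    public String[] split(String line) {
        // A negative limit tells split() not to drop trailing empty strings.
        return delimiter.split(line, -1);
    }
}
```

Calling new MultiCharFieldSplitter("||").split("a||b||") returns three fields: "a", "b", and an empty string. Compiling the delimiter without Pattern.quote would give the regular-expression variant suggested above, presumably at a higher per-line cost than single-character parsing on the byte stream.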