flink-dev mailing list archives

From Martin Neumann <mneum...@spotify.com>
Subject Re: CsvInputFormat delimiter fields
Date Wed, 15 Oct 2014 14:07:17 GMT
Would changing it cost performance?
If not, I think it would be a good change to make, since it would let us
(ab)use the CSV reader to load structured text files (for example, by using
keywords as delimiters).

Being able to put a regular expression there would be even nicer, but maybe
that should end up in its own InputFormat.

cheers Martin
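
For illustration, here is a minimal sketch in plain Java of the kind of splitting discussed above: one helper for a literal multi-character delimiter and one for a regular expression (for example, keywords used as delimiters). It is not Flink code. The class name `DelimiterSplit`, the helper names, and the sample keywords `FROM:` and `TO:` are all hypothetical; in a job, something like this would presumably live in a mapper over lines read as text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Flink.
class DelimiterSplit {

    // Split one line on a literal, possibly multi-character delimiter.
    // Pattern.quote keeps regex metacharacters in the delimiter literal;
    // the -1 limit preserves trailing empty fields.
    static List<String> splitLiteral(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    // Split one line on a precompiled regular expression.
    static List<String> splitRegex(String line, Pattern delimiter) {
        return Arrays.asList(delimiter.split(line, -1));
    }

    public static void main(String[] args) {
        // Literal two-character delimiter.
        System.out.println(splitLiteral("a||b||c", "||"));

        // Keywords as delimiters (sample keywords are made up).
        Pattern keywords = Pattern.compile("\\s*(?:FROM|TO):\\s*");
        System.out.println(splitRegex("id42 FROM: Berlin TO: Stockholm", keywords));
    }
}
```

Compiling the pattern once and reusing it for every line avoids recompiling the regex per record, which matters when the helper is called for each input line.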

On Wed, Oct 15, 2014 at 3:47 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> The reason is the current way the csv parsers work. They are pushed into
> the byte stream parsing and are restricted to recognizing one-char
> delimiters. It is possible to change that, but it would be a bit of work.
>
> Stephan
>
> On Wed, Oct 15, 2014 at 3:36 PM, Martin Neumann <mneumann@spotify.com>
> wrote:
>
> > Hej,
> >
> > A lot of my inputs are csv files so I use the CsvInputFormat a lot. What I
> > find kind of odd is that the line delimiter is a String but the field
> > delimiter is a Character.
> >
> > *see:* new CsvInputFormat<Tuple2<String,String>>(new
> > Path(pVecPath),"\n",'\t',String.class,String.class)
> >
> > Is there a reason for this? I'm currently working with a file that has a
> > more complex field delimiter so I had to write a mapper to read from
> > StringInputFormat.
> >
> > cheers Martin
> >
>
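
To make the single-character restriction from Stephan's reply more concrete, the following is a rough sketch of what scanning a byte buffer for a field delimiter can look like, first for a one-byte delimiter and then for a multi-byte one. This is not Flink's parser code. The class `ByteDelimiterScan` and both method names are hypothetical, and the multi-byte version is a naive scan chosen for readability. The sketch does not measure performance, so it does not answer the cost question by itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from Flink.
class ByteDelimiterScan {

    // One-byte delimiter: a single comparison per input byte.
    static int indexOfByte(byte[] data, int from, byte delimiter) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == delimiter) {
                return i;
            }
        }
        return -1; // delimiter not found
    }

    // Multi-byte delimiter: naive scan that compares the whole
    // delimiter at every candidate position.
    static int indexOfBytes(byte[] data, int from, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("empty delimiter");
        }
        int last = data.length - delimiter.length;
        outer:
        for (int i = from; i <= last; i++) {
            for (int j = 0; j < delimiter.length; j++) {
                if (data[i + j] != delimiter[j]) {
                    continue outer; // mismatch, try the next position
                }
            }
            return i;
        }
        return -1; // delimiter not found
    }

    public static void main(String[] args) {
        byte[] tabbed = "k1\tv1".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfByte(tabbed, 0, (byte) '\t'));

        byte[] colons = "k1::v1".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "::".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOfBytes(colons, 0, delimiter));
    }
}
```

The multi-byte scan also has to handle a delimiter that straddles the end of a read buffer, which this sketch ignores by assuming the whole record is already in memory.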
