flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Flink first() operator
Date Mon, 25 Apr 2016 09:54:07 GMT
Hi Biplob,

you can also implement a generic InputFormat that wraps another InputFormat
(such as a CsvInputFormat).
The wrapper forwards all calls to the wrapped InputFormat and additionally
counts how many records have been emitted (how often InputFormat.nextRecord()
was called).
Once the count reaches the threshold, it returns true for
InputFormat.reachedEnd().
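
A minimal sketch of the wrapping idea, in plain Java so that it runs without
Flink on the classpath. RecordSource, ArraySource, LimitingSource and
LimitingDemo are hypothetical names made up for this illustration.
RecordSource only mirrors the two methods mentioned above, and its
nextRecord() takes no reuse argument. A real wrapper would implement Flink's
InputFormat interface and forward its remaining methods (configure, open,
close, createInputSplits and so on) to the wrapped format unchanged.

```java
// Hypothetical, simplified stand-in for the two InputFormat methods discussed
// above. This is NOT the Flink interface, only an illustration of the pattern.
interface RecordSource<T> {
    boolean reachedEnd();
    T nextRecord();
}

// Toy source backed by an array, standing in for something like a CsvInputFormat.
class ArraySource implements RecordSource<String> {
    private final String[] lines;
    private int pos = 0;

    ArraySource(String... lines) {
        this.lines = lines;
    }

    @Override
    public boolean reachedEnd() {
        return pos >= lines.length;
    }

    @Override
    public String nextRecord() {
        return lines[pos++];
    }
}

// Wrapper that forwards to another source and stops after `limit` records.
class LimitingSource<T> implements RecordSource<T> {
    private final RecordSource<T> wrapped;
    private final long limit;
    private long emitted = 0;

    LimitingSource(RecordSource<T> wrapped, long limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.wrapped = wrapped;
        this.limit = limit;
    }

    @Override
    public boolean reachedEnd() {
        // Report the end once the threshold is hit, or when the wrapped
        // source itself has nothing left.
        return emitted >= limit || wrapped.reachedEnd();
    }

    @Override
    public T nextRecord() {
        T record = wrapped.nextRecord();
        if (record != null) {
            emitted++; // count only records that were actually returned
        }
        return record;
    }
}

class LimitingDemo {
    public static void main(String[] args) {
        RecordSource<String> source =
                new LimitingSource<>(new ArraySource("a", "b", "c", "d", "e"), 2);
        while (!source.reachedEnd()) {
            System.out.println(source.nextRecord()); // prints "a", then "b"
        }
    }
}
```

Note that the counter lives inside the wrapper instance, so in a job with
several parallel source instances each one would presumably keep its own
count and emit up to the threshold on its own.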

Cheers, Fabian

2016-04-25 11:06 GMT+02:00 Ufuk Celebi <uce@apache.org>:

> Hey Biplob,
>
> Yes, the file source will read all input. The first() operator will add
> a combiner to the source for pre-aggregation and then shuffle
> everything to a single reduce instance, which emits the first N
> elements. Keep in mind that there is no strict order in which the
> records will be emitted.
>
> If you need to optimize this you could write a custom
> File/TextInputFormat, which discards the lines at the sources. You can
> have a look at these classes and then get back with questions on the
> mailing list.
>
> – Ufuk
>
> On Sat, Apr 23, 2016 at 6:37 PM, Biplob Biswas <revolutionisme@gmail.com> wrote:
> > Hi,
> >
> > It might be a naive question, but I am concerned because I am trying to
> > read from a file.
> > My question is: if I have a file with n lines and I want m lines out of
> > that, where m << n, would the first operator process only the first m
> > lines or would it go through the entire file?
> >
> > If it does go through the entire file, is there a better way to just get
> > the top m lines using the readCsvFile function?
> >
> > Thanks & Regards
> > Biplob Biswas
>
