flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Flink first() operator
Date Mon, 25 Apr 2016 09:06:00 GMT
Hey Biplob,

Yes, the file source will read all input. The first operator adds
a combiner to the source for pre-aggregation and then shuffles
everything to a single reduce instance, which emits the first N
elements that arrive. Keep in mind that the records are not emitted
in any guaranteed order.
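
To make the plan concrete, here is a rough plain-Java model of it. This is
not Flink code and not how Flink implements the operator internally; the
class FirstNModel and its methods are made up for illustration. It only
shows that every partition is read in full, that each one forwards at most
N records, and that a single final step picks N of whatever arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the plan described above; this is NOT Flink code.
final class FirstNModel {

    // "Combiner" step: a source instance still visits every record of its
    // partition, but forwards at most n of them.
    static <T> List<T> combine(List<T> partition, int n) {
        List<T> kept = new ArrayList<>();
        for (T record : partition) {
            if (kept.size() < n) {
                kept.add(record);
            }
        }
        return kept;
    }

    // "Reduce" step: one instance receives all pre-aggregated records and
    // emits the first n that arrive. The model uses partition order as the
    // arrival order; on a real cluster the arrival order is arbitrary.
    static <T> List<T> firstN(List<List<T>> partitions, int n) {
        List<T> shuffled = new ArrayList<>();
        for (List<T> partition : partitions) {
            shuffled.addAll(combine(partition, n));
        }
        int count = Math.min(n, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, count));
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2"));
        System.out.println(firstN(partitions, 2));
    }
}
```

The combiner cuts down what is shuffled over the network (at most N records
per source instance), but it does not reduce how much of the file is read.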

If you need to optimize this you could write a custom
File/TextInputFormat that discards the unneeded lines at the source.
You can have a look at these classes and then get back with questions
on the mailing list.
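
The core of such a format is just "stop handing out lines once enough have
been produced". A minimal plain-Java sketch of that logic follows; it does
not use the Flink API, and LimitedLineReader and readFirstLines are
hypothetical names. In an actual input format the same counter would have
to live in the format's own record-reading methods.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the early-stop logic; this is NOT Flink code.
final class LimitedLineReader {

    // Returns at most maxLines lines and stops requesting further lines
    // from the reader as soon as the limit is reached.
    static List<String> readFirstLines(Reader source, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while (lines.size() < maxLines
                    && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        Reader input = new StringReader("line1\nline2\nline3\nline4");
        System.out.println(readFirstLines(input, 2));
    }
}
```

One caveat, stated as an assumption about how this would behave in a
parallel job: each parallel source instance would apply the limit to its
own part of the file, so the job could still receive more than m lines in
total and would need a final first(m), or a source parallelism of 1.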

– Ufuk

On Sat, Apr 23, 2016 at 6:37 PM, Biplob Biswas <revolutionisme@gmail.com> wrote:
> Hi,
> It might be a naive question but I was concerned as I am trying to read from
> a file.
> My question is if I have a file with n lines and I want m lines out of that
> where m << n, would the first operator process only the first m lines or
> would it go through the entire file?
> If it does go through the entire file, is there a better way to just get the
> top m lines using readCsvFile function?
> Thanks & Regards
> Biplob Biswas
