flink-user mailing list archives

From sohimankotia <sohimanko...@gmail.com>
Subject How to Create Sample Data from HDFS File using Flink ?
Date Tue, 21 Nov 2017 13:46:25 GMT
Hi, 

I have a directory in HDFS containing 20 files with 150 million records.

I just want a random sample of 20 million records from that directory.
I see that Flink has a few sampling implementations:
https://github.com/eBay/Flink/tree/master/flink-java/src/main/java/org/apache/flink/api/java/sampling

Can someone provide a code example showing how to use these?

Here is my code to read from the HDFS files:

	final org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat<LongWritable, Text> inputFormat
			= HadoopInputs.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, hdfsPath);

	final DataSource<Tuple2<LongWritable, Text>> input =
			environment.createInput(inputFormat).withParameters(configs);
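
For context, a fixed-size random sample without replacement is commonly drawn with reservoir sampling, which needs only one pass over the records. Below is a minimal plain-Java sketch of that idea. It is a generic illustration under stated assumptions, not the code from the linked sampling package: the class name ReservoirSampleSketch, the method sample, and the seed parameter are all hypothetical. Within Flink, DataSetUtils.sampleWithSize(input, false, numSamples) in org.apache.flink.api.java.utils looks like the matching entry point for the DataSet above, but that call is not exercised here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper: single-pass reservoir sampling (Algorithm R).
final class ReservoirSampleSketch {

    // Returns up to sampleSize records chosen uniformly at random,
    // without replacement, from a stream of unknown length.
    static <T> List<T> sample(Iterator<T> records, int sampleSize, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>();
        long seen = 0; // number of records read so far

        while (records.hasNext()) {
            T record = records.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill phase: keep the first sampleSize records.
                reservoir.add(record);
            } else {
                // Replace phase: pick a slot in [0, seen); the new record
                // survives only if that slot falls inside the reservoir.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, record);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        List<String> picked = sample(lines.iterator(), 3, 42L);
        System.out.println("sample size = " + picked.size());
    }
}
```

The sketch holds the whole sample in memory on one machine, so for 20 million records out of 150 million a distributed variant (each partition sampled locally, then the partial samples merged) would be the more realistic shape.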

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
