To summarize, the experiments verify the previously stated benefits of operating on binary data.

We’re not done yet!

Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and easier system configuration. With Flink’s Table API, the semantics of all operations, such as aggregations and projections, are known (in contrast to black-box user-defined functions). Hence, for Table API operations we can generate code that operates directly on binary data. Further improvements include serialization layouts that are tailored to the operations applied to the binary data, and code generation for serializers and comparators.
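The idea of processing records in off-heap memory without deserializing them can be illustrated in plain Java. The following is a minimal sketch, not Flink's implementation: it uses `java.nio.ByteBuffer` rather than Flink's own memory segments, and it assumes a hypothetical fixed-length record layout of a 4-byte `int` key followed by an 8-byte `long` value. The class `BinaryRecords` and its methods are illustrative names only. `ByteBuffer.allocateDirect` places the record bytes outside the Java heap, and both the key comparison and the aggregation read the bytes at fixed offsets instead of materializing one Java object per record.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Illustrative sketch only (not Flink's memory-segment API): records with a
 * hypothetical fixed-length layout [int key | long value] live in an off-heap
 * (direct) ByteBuffer and are processed without deserialization.
 */
public class BinaryRecords {

    // Hypothetical layout: 4-byte key at offset 0, 8-byte value at offset 4.
    public static final int KEY_OFFSET = 0;
    public static final int VALUE_OFFSET = 4;
    public static final int RECORD_SIZE = 12;

    /** Serializes the given key/value pairs into a newly allocated direct buffer. */
    public static ByteBuffer write(int[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        // allocateDirect reserves memory outside the Java heap, so the record
        // bytes themselves are not objects the garbage collector has to trace.
        ByteBuffer buf = ByteBuffer.allocateDirect(keys.length * RECORD_SIZE)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < keys.length; i++) {
            int base = i * RECORD_SIZE;
            buf.putInt(base + KEY_OFFSET, keys[i]);     // absolute write, position unchanged
            buf.putLong(base + VALUE_OFFSET, values[i]);
        }
        return buf;
    }

    /** Compares the keys of records i and j by reading only their 4 key bytes. */
    public static int compareKeys(ByteBuffer buf, int i, int j) {
        int a = buf.getInt(i * RECORD_SIZE + KEY_OFFSET);
        int b = buf.getInt(j * RECORD_SIZE + KEY_OFFSET);
        return Integer.compare(a, b);
    }

    /** Sums the values of all records whose key equals {@code key}, reading raw bytes only. */
    public static long sumValuesForKey(ByteBuffer buf, int recordCount, int key) {
        long sum = 0;
        for (int i = 0; i < recordCount; i++) {
            int base = i * RECORD_SIZE;
            if (buf.getInt(base + KEY_OFFSET) == key) {
                sum += buf.getLong(base + VALUE_OFFSET);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = write(new int[] {7, 3, 7}, new long[] {10L, 20L, 30L});
        System.out.println(sumValuesForKey(buf, 3, 7)); // 40
        System.out.println(compareKeys(buf, 0, 1) > 0); // true: key 7 > key 3
    }
}
```

The fixed-length layout is what keeps these operations cheap: because every key sits at the same offset, a comparison touches only four bytes. A layout with variable-length fields would first have to read a length prefix or an offset table, which is one reason why serialization layouts tailored to the applied operations can pay off.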

The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle lots of bits and bytes, join the Flink community!

TL;DR; Give me three things to remember!

14 May 2015 by Kostas Tzoumas (@kostas_tzoumas)

April was a packed month for Apache Flink.

Flink 0.9.0-milestone1 release


Flink on the web

Fabian Hueske gave an interview to InfoQ about Apache Flink.

Upcoming events

Please refer to the [download page]({{ site.baseurl }}/downloads.html#maven) and the {% github README.md master "build instructions" %} for details on how to set up Flink for different Hadoop and HDFS versions.

### My job fails with various exceptions from the HDFS/Hadoop code. What can I do?

Flink ships with the Hadoop 2.2 binaries by default. These binaries are used to connect to HDFS or YARN. It seems that there are some bugs in the HDFS client that cause exceptions while writing to HDFS (in particular under high load). Among the exceptions are the following:

- HDFS client trying to connect to the standby Namenode: `org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby`
- `java.io.IOException: Bad response ERROR for block BP-1335380477-172.22.5.37-1424696786673:blk_1107843111_34301064 from datanode 172.22.5.81:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:732)`
- `Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): 0
  at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeStorageInfos(DatanodeManager.java:478)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipelineInternal(FSNamesystem.java:6039)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updatePipeline(FSNamesystem.java:6002)`

If you are experiencing any of these, we recommend using a Flink build with a Hadoop version matching your local HDFS version. You can also manually build Flink against the exact Hadoop version (for example, when using a Hadoop distribution with a custom patch level).

### In Eclipse, I get compilation errors in the Scala projects

Flink uses a new feature of the Scala compiler (called "quasiquotes") that have not yet been properly
