flink-user mailing list archives

From Márton Balassi <mbala...@apache.org>
Subject Fwd: Flink questions
Date Wed, 11 Mar 2015 15:37:41 GMT
Dear Emmanuel,

I'm Marton, one of the Flink Streaming developers - Robert forwarded your
issue to me. Thanks for trying out our project.

1) Debugging: TaskManager logs are currently not forwarded to the UI, but
you can find them on the TaskManager machines in the log folder of your
Flink distribution. Making them accessible from the UI is on our agenda for
the very near future.

2) Output to socket: Currently we do not have a pre-implemented sink for
sockets (although we offer a socket source, and sinks writing to Apache
Kafka, Flume and RabbitMQ). You can easily implement a socket sink yourself
by extending the abstract RichSinkFunction class, though. [1]

To use it, you can simply call dataStream.addSink(new MySinkFunction()) -
inside that function you can bring up a socket or any other service. You
would create the socket in the open method, and then in the invoke method
you would write every value out to it.
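
The open/invoke lifecycle described above can be sketched roughly as follows. This is only a minimal illustration in plain Java, not the Flink implementation: SocketSink, SocketSinkSketch and roundTrip are hypothetical names, the class does not actually extend RichSinkFunction (so that the snippet runs without Flink on the classpath), and Flink's own open method takes a Configuration argument that is omitted here. The small loopback server exists only to show that the values arrive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SocketSinkSketch {

    /**
     * Hypothetical sink. In a real job this class would extend Flink's
     * RichSinkFunction<String>; here it is a plain class with the same
     * open / invoke / close shape (assumption: simplified signatures).
     */
    static class SocketSink {
        private final String host;
        private final int port;
        private Socket socket;
        private PrintWriter writer;

        SocketSink(String host, int port) {
            this.host = host;
            this.port = port;
        }

        // Called once before any value arrives: bring up the connection.
        void open() throws IOException {
            socket = new Socket(host, port);
            writer = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8),
                true); // auto-flush on println
        }

        // Called once per value: write it out as one line.
        void invoke(String value) {
            writer.println(value);
        }

        // Called once at the end: release the connection.
        void close() throws IOException {
            if (writer != null) {
                writer.close();
            }
            if (socket != null) {
                socket.close();
            }
        }
    }

    // Demo helper: sends the values through the sink to a local server
    // and returns the lines the server received.
    static List<String> roundTrip(List<String> values) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            SocketSink sink = new SocketSink("127.0.0.1", server.getLocalPort());
            sink.open();
            try (Socket accepted = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                     accepted.getInputStream(), StandardCharsets.UTF_8))) {
                for (String value : values) {
                    sink.invoke(value);
                }
                sink.close(); // closing the sink signals end of stream to the reader
                List<String> received = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    received.add(line);
                }
                return received;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(Arrays.asList("a", "b", "c")));
    }
}
```

In an actual job the receiving side would be the other machine you mention, listening on the host and port passed to the sink's constructor.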

I do agree that this is a nice tool to have so I have opened a JIRA ticket
for it. [2]

3) Internal data format: Robert was kind enough to offer a more detailed
answer on this issue. In general, streaming sinks support any file output
that is supported by batch Flink, including Avro. You can use this
functionality via dataStream.addSink(new FileSinkFunction<>(OutputFormat)).

[2] https://issues.apache.org/jira/browse/FLINK-1688



*From:* Emmanuel <eleroy@msn.com>
*Date:* 11 March 2015 14:59:31 CET
*To:* Robert Metzger <rmetzger@apache.org>, Henry Saputra <
*Subject:* *Flink questions*


Thanks again for the help yesterday: the simple things go a long way to get
me moving...

I have more questions that I hope to get your opinion and input on:

*Debugging*
What's the preferred or recommended way to proceed?
I have been using some System.out.println() statements in my simple test
code, and the results are confusing:
First, the logs shown in the UI are for jobmanager.out, but there is never
anything there; wherever I see output in a log, it's in taskmanager.out.
Also, even more confusing is the fact that I often get no log at
all... the UI says the topology is running, but nothing gets printed out...
Is there a process you'd recommend to follow to debug properly with logs?

*Output to socket*
Ideally I'd like to print out to a socket/stream and read from another
machine, so as not to choke the node with disk I/O when testing
performance. I'm not sure how to do that.

*Internal Data format*
Finally, a practical question about data format: we ingest JSON, which is
not convenient and uses a lot of space. Internally Java/Scala prefers
Tuples, and we were thinking of using Protobuf.
As I understand it, Avro could also do this... What would be
the recommended way to format data internally?

Thanks for your input.

