flink-user mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: WriteAsText bug or bad name?
Date Mon, 03 Nov 2014 11:06:28 GMT
OK, I assume the problem of creating multiple files (+ output directory) is
fixed by setting the DOP of the OutputFormat to 1, right?

But you still get binary output with a TextOutputFormat that writes a
DataSet<String>?
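
For a job that is left at its default parallelism and therefore writes a
folder of per-thread part files, one possible workaround is to concatenate
the parts afterwards. The following is only a minimal sketch in plain Java,
not part of the Flink API: the class name MergeParts and the method
mergeParts are hypothetical, and it assumes that every part file is plain
text ending in a newline and that the target file lies outside the output
folder.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not a Flink class.
public class MergeParts {

    // Concatenates every regular file directly inside partDir into target.
    // Parts are appended in file-name order so the result is deterministic.
    // Assumption: target is not located inside partDir.
    public static void mergeParts(Path partDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (Stream<Path> entries = Files.list(partDir)) {
            entries.filter(Files::isRegularFile).forEach(parts::add);
        }
        parts.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));

        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path part : parts) {
                // Copies the raw bytes of one part file to the end of target.
                Files.copy(part, out);
            }
        }
    }

    // Usage: java MergeParts <output-folder> <single-target-file>
    public static void main(String[] args) throws IOException {
        mergeParts(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The order of records across parts carries no meaning for a parallel job, so
sorting by file name only serves to make repeated runs produce the same file.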

2014-11-03 11:58 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Nope. This is actually a bug for me, I don't know what the FLINK community
> or committee think
>
>
> On Mon, Nov 3, 2014 at 11:52 AM, Fabian Hueske <fhueske@apache.org> wrote:
>
>> Hi Flavio,
>>
>> any updates on this bug?
>>
>> Thanks, Fabian
>>
>> 2014-10-29 22:36 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>
>>> Regarding the text vs. sequence output.
>>> writeAsText() emits each record using its toString() method, which
>>> should be the String itself in your case.
>>>
>>> So if it would write binary data, something is wrong...
>>>
>>>
>>> 2014-10-29 22:34 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>>
>>>> You can set the DOP of the data sink to 1 [1].
>>>> There is also a config parameter whether to create a directory or not
>>>> in case of DOP=1. If I remember correctly, the default is to NOT create
>>>> a folder for DOP=1.
>>>>
>>>> [1]
>>>> http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#parallel-execution
>>>>
>>>> Best, Fabian
>>>>
>>>> 2014-10-29 22:22 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>
>>>>> Would it be that difficult to change the behaviour for file:/// and
>>>>> create a single file? Or is there a way to do that?
>>>>> On Oct 29, 2014 9:52 PM, "Márton Balassi" <balassi.marton@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Flavio,
>>>>>>
>>>>>> Yes, the writeAsText() method really creates a folder which contains
>>>>>> a file for each execution thread, so your threads do not block each
>>>>>> other and the execution can use multiple cores on your machine. You
>>>>>> can see similar results if you try it with env.execute() from an IDE.
>>>>>>
>>>>>> There are filesystems, HDFS to mention the most prominent one, which
>>>>>> can transparently treat such a folder structure as a single file, and
>>>>>> then it would behave as you expect. I hope this answers your question.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Marton
>>>>>>
>>>>>> On Wed, Oct 29, 2014 at 8:31 PM, Flavio Pompermaier <
>>>>>> pompermaier@okkam.it> wrote:
>>>>>>
>>>>>>> Hi to all,
>>>>>>> running the example at
>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/local_execution.html
>>>>>>> I was thinking that the writeAsText on a local file was creating a
>>>>>>> text file on my local filesystem... instead it creates something
>>>>>>> similar to a sequence file (within a folder).
>>>>>>> This is something misleading I think... or the API name is wrong or
>>>>>>> this is a bug (IMHO).
>>>>>>> Btw... how can I modify the following program to write results in a
>>>>>>> single text file on my local filesystem?
>>>>>>>
>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>   ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>>>>>>>   DataSet<String> data = env.readTextFile("file:///tmp/res.txt");
>>>>>>>   data.filter(new FilterFunction<String>() {
>>>>>>>     public boolean filter(String value) {
>>>>>>>       return value.startsWith("http://");
>>>>>>>     }
>>>>>>>   }).writeAsText("file:///tmp/res.txt");
>>>>>>>   env.execute();
>>>>>>> }
>>>>>>>
>>>>>>> Best,
>>>>>>> Flavio
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>
