flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: WriteAsText bug or bad name?
Date Mon, 03 Nov 2014 11:09:10 GMT
Hey!

Parallel outputs require multiple output files.

The only way to make this a single file by default is to set the default
parallelism of file outputs to 1. That would cause many surprises on
cluster execution, actually.

It may be a fair compromise to set the default parallelism of sinks to 1 if
the execution environment is the local environment.
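
A minimal sketch of why parallel outputs end up as several files, in plain Java rather than Flink code. The class ParallelSinkSketch, the helper writeRecords, and the part-file names "1", "2", ... are all hypothetical and chosen only for illustration. The point is that each writer thread owns its own file so that no writer blocks another, and only a parallelism of 1 can target a single plain file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: NOT Flink's implementation, just the same idea in miniature.
class ParallelSinkSketch {

    // Hypothetical helper. With parallelism == 1 the target is one text file;
    // with parallelism > 1 the target becomes a directory with one file per writer.
    public static List<Path> writeRecords(List<String> records, Path target, int parallelism)
            throws IOException, InterruptedException {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<Path> outputs = new ArrayList<>();
        if (parallelism == 1) {
            outputs.add(target);
        } else {
            Files.createDirectories(target);
            for (int i = 1; i <= parallelism; i++) {
                // Part-file names are an assumption made for this sketch.
                outputs.add(target.resolve(Integer.toString(i)));
            }
        }

        List<Thread> writers = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            final int index = i;
            Thread writer = new Thread(() -> {
                // Round-robin partition: writer `index` takes every parallelism-th record.
                List<String> part = new ArrayList<>();
                for (int r = index; r < records.size(); r += parallelism) {
                    part.add(records.get(r));
                }
                try {
                    // Each writer touches only its own file, so no locking is needed.
                    Files.write(outputs.get(index), part, StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writers.add(writer);
            writer.start();
        }
        for (Thread writer : writers) {
            writer.join();
        }
        return outputs;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("sink-sketch");
        List<String> records = Arrays.asList("http://a", "http://b", "http://c", "http://d");

        List<Path> single = writeRecords(records, dir.resolve("res.txt"), 1);
        List<Path> parts = writeRecords(records, dir.resolve("res-parallel"), 4);

        System.out.println("parallelism 1 wrote " + single.size() + " file(s)");
        System.out.println("parallelism 4 wrote " + parts.size() + " file(s)");
    }
}
```

In Flink itself the corresponding setting is the parallelism of the data sink, as described in the programming guide linked further down in this thread.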

Stephan


On Mon, Nov 3, 2014 at 12:06 PM, Fabian Hueske <fhueske@apache.org> wrote:

> OK, I assume the problem of creating multiple files (+ output directory)
> is fixed by setting the DOP of the OutputFormat to 1, right?
>
> But you still get binary output with a TextOutputFormat that writes a
> DataSet<String>?
>
> 2014-11-03 11:58 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> Nope. This is actually a bug for me, I don't know what the FLINK
>> community or committee think
>>
>>
>> On Mon, Nov 3, 2014 at 11:52 AM, Fabian Hueske <fhueske@apache.org>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> any updates on this bug?
>>>
>>> Thanks, Fabian
>>>
>>> 2014-10-29 22:36 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>>
>>>> Regarding the text vs. sequence output.
>>>> writeAsText() emits each record using its toString() method, which
>>>> should be the String itself in your case.
>>>>
>>>> So if it writes binary data, something is wrong...
>>>>
>>>>
>>>> 2014-10-29 22:34 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>>>
>>>>> You can set the DOP of the data sink to 1 [1].
>>>>> There is also a config parameter that controls whether a directory is
>>>>> created in case of DOP=1. If I remember correctly, the default is to
>>>>> NOT create a folder for DOP=1.
>>>>>
>>>>> [1]
>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#parallel-execution
>>>>>
>>>>> Best, Fabian
>>>>>
>>>>> 2014-10-29 22:22 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>
>>>>>> Would it be that difficult to change the behaviour for file:/// and
>>>>>> create a single file? Or is there a way to do that?
>>>>>> On Oct 29, 2014 9:52 PM, "Márton Balassi" <balassi.marton@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear Flavio,
>>>>>>>
>>>>>>> Yes, the writeAsText() method really creates a folder which
>>>>>>> contains a file for each execution thread, so your threads do not
>>>>>>> block each other and the execution can use multiple cores on your
>>>>>>> machine. You can see similar results if you try it with
>>>>>>> env.execute() from an IDE.
>>>>>>>
>>>>>>> There are filesystems, HDFS to mention the most prominent one, which
>>>>>>> can transparently treat such a folder structure as a single file, and
>>>>>>> then it would behave as you expect. I hope this answers your question.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Marton
>>>>>>>
>>>>>>> On Wed, Oct 29, 2014 at 8:31 PM, Flavio Pompermaier <
>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>
>>>>>>>> Hi to all,
>>>>>>>> running the example at
>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/local_execution.html
>>>>>>>> I was thinking that the writeAsText on a local file was creating a
>>>>>>>> text file on my local filesystem... instead it creates something
>>>>>>>> similar to a sequence file (within a folder).
>>>>>>>> This is something misleading I think... or the API name is wrong or
>>>>>>>> this is a bug (IMHO).
>>>>>>>> Btw... how can I modify the following program to write results in a
>>>>>>>> single text file on my local filesystem?
>>>>>>>>
>>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>>   ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>>>>>>>>   DataSet<String> data = env.readTextFile("file:///tmp/res.txt");
>>>>>>>>   data.filter(new FilterFunction<String>() {
>>>>>>>>     public boolean filter(String value) {
>>>>>>>>       return value.startsWith("http://");
>>>>>>>>     }
>>>>>>>>   }).writeAsText("file:///tmp/res.txt");
>>>>>>>>   env.execute();
>>>>>>>> }
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
