flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: WriteAsText bug or bad name?
Date Mon, 03 Nov 2014 11:17:59 GMT
That is not a big problem, it should just be well documented :)

On Mon, Nov 3, 2014 at 12:09 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hey!
>
> Parallel outputs require multiple output files.
>
> The only way to make this a single file by default is to set the default
> parallelism of file outputs to 1. That would cause many surprises on
> cluster execution, actually.
>
> It may be a fair compromise to set the default parallelism of sinks to 1
> if the execution environment is the local environment.
>
> Stephan
>
>
> On Mon, Nov 3, 2014 at 12:06 PM, Fabian Hueske <fhueske@apache.org> wrote:
>
>> OK, I assume the problem of creating multiple files (+ output directory)
>> is fixed by setting the DOP of the OutputFormat to 1, right?
>>
>> But you still get binary output with a TextOutputFormat that writes a
>> DataSet<String>?
>>
>> 2014-11-03 11:58 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Nope. To me this is actually a bug; I don't know what the Flink
>>> community or committee thinks.
>>>
>>>
>>> On Mon, Nov 3, 2014 at 11:52 AM, Fabian Hueske <fhueske@apache.org>
>>> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> any updates on this bug?
>>>>
>>>> Thanks, Fabian
>>>>
>>>> 2014-10-29 22:36 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>>>
>>>>> Regarding the text vs. sequence output.
>>>>> writeAsText() emits each record using its toString() method, which
>>>>> should be the String itself in your case.
>>>>>
>>>>> So if it writes binary data, something is wrong...
>>>>>
>>>>>
>>>>> 2014-10-29 22:34 GMT+01:00 Fabian Hueske <fhueske@apache.org>:
>>>>>
>>>>>> You can set the DOP of the data sink to 1 [1].
>>>>>> There is also a config parameter that controls whether to create a
>>>>>> directory in case of DOP=1. If I remember correctly, the default is to
>>>>>> NOT create a folder for DOP=1.
>>>>>>
>>>>>> [1]
>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#parallel-execution
>>>>>>
>>>>>> Best, Fabian
>>>>>>
>>>>>> 2014-10-29 22:22 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>>
>>>>>>> Would it be that difficult to change the behaviour for file:/// and
>>>>>>> create a single file? Or is there a way to do that?
>>>>>>> On Oct 29, 2014 9:52 PM, "Márton Balassi" <balassi.marton@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dear Flavio,
>>>>>>>>
>>>>>>>> Yes, the writeAsText() method really creates a folder which
>>>>>>>> contains a file for each execution thread, so your threads do not block
>>>>>>>> each other and the execution can use multiple cores on your machine. You
>>>>>>>> can see similar results if you try it with env.execute() from an IDE.
>>>>>>>>
>>>>>>>> There are filesystems, HDFS being the most prominent one, which
>>>>>>>> can transparently treat such a folder structure as a single file, and
>>>>>>>> then it would behave as you expect. I hope this answers your question.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Marton
>>>>>>>>
>>>>>>>> On Wed, Oct 29, 2014 at 8:31 PM, Flavio Pompermaier <
>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>
>>>>>>>>> Hi to all,
>>>>>>>>> running the example at
>>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/local_execution.html
>>>>>>>>> I expected writeAsText on a local file to create a text
>>>>>>>>> file on my local filesystem... instead it creates something similar to a
>>>>>>>>> sequence file (within a folder).
>>>>>>>>> I think this is misleading... either the API name is wrong or
>>>>>>>>> this is a bug (IMHO).
>>>>>>>>> By the way, how can I modify the following program to write results to a
>>>>>>>>> single text file on my local filesystem?
>>>>>>>>>
>>>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>>>   ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>>>>>>>>>   DataSet<String> data = env.readTextFile("file:///tmp/res.txt");
>>>>>>>>>   data.filter(new FilterFunction<String>() {
>>>>>>>>>     public boolean filter(String value) {
>>>>>>>>>       return value.startsWith("http://");
>>>>>>>>>     }
>>>>>>>>>   }).writeAsText("file:///tmp/res.txt");
>>>>>>>>>   env.execute();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Flavio
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
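
The behaviour discussed in this thread (one output file per parallel writer inside a folder, versus a single file when the sink's DOP is 1) can be illustrated with a small plain-Java sketch. It deliberately does not use Flink: the class ParallelSinkSketch, the method write, the round-robin split of records, and the part-file names "1", "2", ... are all assumptions made for illustration, and the writers run one after another here instead of in real threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a file sink; not Flink code.
public class ParallelSinkSketch {

    /**
     * Writes the records and returns the files that were created.
     * parallelism == 1: 'target' itself becomes a single text file.
     * parallelism  > 1: 'target' becomes a folder holding one file per writer.
     */
    static List<Path> write(List<String> records, Path target, int parallelism)
            throws IOException {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<Path> created = new ArrayList<>();
        if (parallelism == 1) {
            // A single writer owns the whole output, so no folder is needed.
            Files.write(target, records, StandardCharsets.UTF_8);
            created.add(target);
            return created;
        }
        Files.createDirectories(target);
        for (int task = 0; task < parallelism; task++) {
            // Each writer gets its own slice (round-robin is an assumption).
            List<String> slice = new ArrayList<>();
            for (int i = task; i < records.size(); i += parallelism) {
                slice.add(records.get(i));
            }
            // Each writer has its own file, so writers never share a file handle.
            Path part = target.resolve(String.valueOf(task + 1));
            Files.write(part, slice, StandardCharsets.UTF_8);
            created.add(part);
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("sink-sketch");
        List<String> records =
                Arrays.asList("http://a", "http://b", "http://c", "http://d");
        // Number of files produced with one writer, then with four writers.
        System.out.println(write(records, tmp.resolve("single.txt"), 1).size());
        System.out.println(write(records, tmp.resolve("parallel-out"), 4).size());
    }
}
```

With one writer the target path is an ordinary text file; with several writers the same path turns into a folder of part files, which is the surprise described at the start of the thread.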
