spark-user mailing list archives

From Jörn Franke <>
Subject Re: Text
Date Fri, 27 Jan 2017 13:45:09 GMT
I agree with the previous statements. You cannot expect any ordering guarantee, so you
have to restore the order of the original file yourself. Internally Spark uses the Hadoop
client libraries, even if you do not have Hadoop installed, because they are a flexible,
transparent way to access many file systems, including the local one. In the case you
mentioned it is the TextInputFormat that returns a key and a value for each line. The key
is the byte offset of the line within the file. This means you can sort by the key.
However, to access this key you must use the hadoopFile method of Spark together with the TextInputFormat.
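
A minimal PySpark sketch of this idea, assuming a single input file (the offset key restarts at zero for every file, so on its own it only orders lines within one file). It uses newAPIHadoopFile, the new-API counterpart of hadoopFile. The names CORRECTIONS, correct_line and rewrite_in_order, as well as the example word list, are made up for illustration; the original question does not say which words are replaced.

```python
import re

# Hypothetical corrections table (assumption: not taken from the original question).
CORRECTIONS = {"teh": "the", "recieve": "receive"}


def correct_line(line):
    """Replace whole words found in CORRECTIONS; leave everything else as is."""
    return re.sub(
        r"\b\w+\b",
        lambda m: CORRECTIONS.get(m.group(0), m.group(0)),
        line,
    )


def rewrite_in_order(sc, input_path, output_path):
    """Read one text file with byte offsets, fix words, write lines in file order.

    `sc` is an existing SparkContext.
    """
    # TextInputFormat yields (byte offset of the line, line text) pairs.
    pairs = sc.newAPIHadoopFile(
        input_path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )
    (
        pairs.mapValues(correct_line)  # change the text, keep the offset key
        .sortByKey()                   # restore the original line order
        .values()                      # drop the offset before writing
        .saveAsTextFile(output_path)
    )
```

With a running SparkContext sc this could be called as rewrite_in_order(sc, "input.txt", "output"). The part files in the output directory, read in name order, should then follow the order of the input lines.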

> On 27 Jan 2017, at 10:44, Soheila S. <> wrote:
> Hi All,
> I read a test file using sparkContext.textFile(filename) and assign it to an RDD and
> process the RDD (replace some words) and finally write it to a text file using rdd.saveAsTextFile(output).
> Is there any way to be sure the order of the sentences will not be changed? I need to
> have the same text with some corrected words.
> thanks!
> Soheila
