spark-user mailing list archives

From Antonio Murgia <>
Subject Best practices to handle corrupted records
Date Thu, 15 Oct 2015 12:28:41 GMT
I looked around on the web and couldn’t find any structured way to deal with
malformed/faulty records during computation. All I was able to find was the flatMap/Some/None
technique plus logging.
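
For reference, here is a minimal sketch of the flatMap/Some/None technique, using plain Scala collections rather than an RDD. The `parseAge` helper and the sample input are made up for illustration; the idea is only that a parser returning `Option` lets `flatMap` drop the malformed records silently.

```scala
object OptionFilterSketch {
  // Hypothetical parser: Some(value) for a well-formed record, None otherwise.
  def parseAge(line: String): Option[Int] =
    scala.util.Try(line.trim.toInt).toOption

  // Illustrative input: two valid records and two malformed ones.
  val raw: List[String] = List("42", "n/a", " 7 ", "")

  // flatMap keeps the contents of each Some and drops every None.
  val ages: List[Int] = raw.flatMap(line => parseAge(line))
}
```

The limitation is visible here: once `flatMap` has run, nothing records which inputs were dropped or why.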
I’m facing this problem because I have a processing algorithm that extracts more than one
value from each record, but it can fail to extract one of those values, and I want
to keep track of those failures. Logging is not feasible because this “warning” happens so frequently
that the logs would become overwhelming and impossible to read.
Since my processing has 3 different possible outcomes, I modeled it with a class
that holds a result and/or warnings.
Since Result implements Traversable, it can be used in a flatMap, which discards all warnings and
failure results. On the other hand, if we want to keep track of the warnings, we can process
them and output them as needed.
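
The class itself is not reproduced in this message, so what follows is only a rough sketch of one way such a type could look, not the original code. All names (`Complete`, `Partial`, `Failed`, `Person`, `parse`) are hypothetical, and the three outcomes are assumed to be: result only, result plus warnings, and warnings only. To stay independent of the Scala collection version, the sketch exposes `toOption` instead of implementing Traversable.

```scala
object ResultSketch {
  // Assumed shape: every outcome carries warnings, and may carry a value.
  sealed trait Result[+A] {
    def warnings: List[String]
    def toOption: Option[A]
  }
  // All values were extracted.
  final case class Complete[A](value: A) extends Result[A] {
    val warnings: List[String] = Nil
    val toOption: Option[A] = Some(value)
  }
  // A usable value was produced, but some extraction failed.
  final case class Partial[A](value: A, warnings: List[String]) extends Result[A] {
    val toOption: Option[A] = Some(value)
  }
  // Nothing usable could be extracted.
  final case class Failed(warnings: List[String]) extends Result[Nothing] {
    val toOption: Option[Nothing] = None
  }

  // Hypothetical record type and parser for "name,age" lines.
  final case class Person(name: String, age: Option[Int])

  def parse(line: String): Result[Person] =
    line.split(",", -1).map(_.trim) match {
      case Array(name, age) if name.nonEmpty =>
        scala.util.Try(age.toInt).toOption match {
          case Some(a) => Complete(Person(name, Some(a)))
          case None    => Partial(Person(name, None), List(s"bad age '$age' in: $line"))
        }
      case _ => Failed(List(s"unparseable record: $line"))
    }

  val records: List[String] = List("ann,31", "bob,unknown", "garbage")
  val results: List[Result[Person]] = records.map(parse)

  // Keep only the values, discarding warnings and failed records.
  val people: List[Person] = results.flatMap(_.toOption)
  // Collect the warnings separately instead of logging each one.
  val warnings: List[String] = results.flatMap(_.warnings)
}
```

On an RDD the same two `flatMap` calls should apply to the mapped results, presumably with the intermediate RDD cached so that parsing does not run twice; that part is an assumption and is not tested here.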

Kind Regards