spark-user mailing list archives

From Antonio Murgia <antonio.murg...@studio.unibo.it>
Subject Re: Best practices to handle corrupted records
Date Thu, 15 Oct 2015 15:31:17 GMT
'Either' does not cover the case where the outcome was successful but generated warnings. I
already looked into it, and also at 'Try', which inspired my design. Thanks for pointing it
out anyway!

#A.M.

On 15 Oct 2015, at 16:19, Erwan ALLAIN <eallain.poctu@gmail.com> wrote:

What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?


On Thu, Oct 15, 2015 at 2:57 PM, Roberto Congiu <roberto.congiu@gmail.com> wrote:
I came to a similar solution to a similar problem. I deal with a lot of CSV files from many
different sources and they are often malformed.
However, I just have success/failure. Maybe you should make SuccessWithWarnings a subclass
of success, or get rid of it altogether and make the warnings optional.
I was thinking of making this cleaning/conforming library open source if you're interested.
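
The "optional warnings" variant suggested above could look roughly like the sketch below.
All names here (Outcome, Ok, Failed, OptionalWarningsDemo) are hypothetical and are not
taken from either library; it only illustrates the idea of two cases where a success may
carry an empty or non-empty list of warnings.

```scala
// Hypothetical two-case model: warnings default to empty on success.
sealed trait Outcome[+T] {
  def warnings: Seq[String]
}

// A successful outcome; `warnings` is optional and defaults to Nil.
final case class Ok[+T](value: T, warnings: Seq[String] = Nil) extends Outcome[T]

// A failed outcome carries only the messages explaining the failure.
final case class Failed(warnings: Seq[String]) extends Outcome[Nothing]

object OptionalWarningsDemo {
  val clean: Outcome[Int]   = Ok(42)
  val flagged: Outcome[Int] = Ok(42, Seq("field was padded with whitespace"))
  val broken: Outcome[Int]  = Failed(Seq("not a number"))

  // True when an outcome carries at least one message.
  def hasWarnings(o: Outcome[_]): Boolean = o.warnings.nonEmpty
}
```

The trade-off is that "clean success" and "success with warnings" are no longer separate
types, so they cannot be told apart by pattern matching on the class alone.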

R.

2015-10-15 5:28 GMT-07:00 Antonio Murgia <antonio.murgia2@studio.unibo.it>:
Hello,
I looked around on the web and couldn't find any structured way to deal with malformed/faulty
records during computation. All I was able to find was the flatMap/Some/None technique + logging.
I'm facing this problem because I have a processing algorithm that extracts more than one
value from each record, but can fail to extract one of those values, and I want
to keep track of such failures. Logging is not feasible because this "warning" happens so
frequently that the logs would become overwhelming and impossible to read.
Since I have 3 different possible outcomes from my processing, I modeled it with this class
hierarchy:
[inline image of the class hierarchy; not preserved in the archive]
Each outcome holds a result and/or warnings.
Since Result implements Traversable, it can be used in a flatMap, discarding all warnings and
failure results. On the other hand, if we want to keep track of warnings, we can process
them and output them as needed.
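
Since the diagram is missing from the archive, here is a minimal sketch of what such a
three-outcome hierarchy might look like. It is a guess, not the original code: the case
class names other than Result and SuccessWithWarnings, the parse function, and the
"name,age" record format are all assumptions made for illustration. It extends Iterable
rather than Traversable so that it compiles on newer Scala versions as well; on Scala
2.10/2.11 an Iterable is also accepted by RDD.flatMap, which takes a function returning
TraversableOnce.

```scala
import scala.util.Try

// Hypothetical reconstruction; the original class diagram was not preserved.
sealed trait Result[+T] extends Iterable[T] {
  def warnings: Seq[String]
}

// Everything was extracted: one value, no warnings.
final case class Success[+T](value: T) extends Result[T] {
  def warnings: Seq[String] = Nil
  def iterator: Iterator[T] = Iterator.single(value)
}

// A usable value was produced, but part of the extraction failed.
final case class SuccessWithWarnings[+T](value: T, warnings: Seq[String]) extends Result[T] {
  def iterator: Iterator[T] = Iterator.single(value)
}

// Nothing usable: iterating yields no elements, so flatMap drops the record.
final case class Failure(warnings: Seq[String]) extends Result[Nothing] {
  def iterator: Iterator[Nothing] = Iterator.empty
}

object ResultDemo {
  // Assumed record format "name,age"; the age is allowed to be unparsable.
  def parse(line: String): Result[(String, Option[Int])] =
    line.split(",", -1) match {
      case Array(name, age) if name.nonEmpty =>
        Try(age.trim.toInt).toOption match {
          case Some(a) => Success((name, Some(a)))
          case None    => SuccessWithWarnings((name, None), Seq(s"bad age in: $line"))
        }
      case _ => Failure(Seq(s"malformed record: $line"))
    }

  val lines = List("alice,30", "bob,abc", "garbage")

  // flatMap keeps only the values, silently discarding failures and warnings.
  val values: List[(String, Option[Int])] = lines.flatMap(line => parse(line))

  // Warnings can be collected separately when they are needed.
  val warnings: List[String] = lines.map(parse).flatMap(_.warnings)
}
```

With a Spark RDD the same two steps would be rdd.flatMap(parse) for the values and
rdd.map(parse).flatMap(_.warnings) for the warnings, at the cost of parsing twice unless
the intermediate RDD is cached.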

Kind Regards
#A.M.



--
--------------------------------------------------------------
"Good judgment comes from experience.
Experience comes from bad judgment"
--------------------------------------------------------------

