spark-user mailing list archives

From Erwan ALLAIN <eallain.po...@gmail.com>
Subject Re: Best practices to handle corrupted records
Date Thu, 15 Oct 2015 14:19:21 GMT
What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?
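
A minimal sketch of how Either could cover the three outcomes discussed in the quoted thread, assuming a failure is a Left and a success is a Right that carries the value together with a (possibly empty) list of warnings. The names Person, Parsed and parseLine, and the two-field CSV layout, are hypothetical and only for illustration; they are not taken from the original class hierarchy. Plain Scala collections are used here so the sketch runs without a cluster.

```scala
import scala.util.Try

object EitherSketch {
  // Hypothetical record type: age is optional because its extraction may fail.
  final case class Person(name: String, age: Option[Int])

  // Left  = the record is unusable (failure message).
  // Right = the parsed value plus any warnings collected along the way.
  type Parsed = Either[String, (Person, List[String])]

  def parseLine(line: String): Parsed = {
    val fields = line.split(",", -1).map(_.trim)
    if (fields.length != 2 || fields(0).isEmpty)
      Left("malformed record: " + line)
    else {
      // A bad age does not reject the record; it only produces a warning.
      val age = Try(fields(1).toInt).toOption
      val warnings =
        if (age.isEmpty) List("could not read age in: " + line) else Nil
      Right((Person(fields(0), age), warnings))
    }
  }

  val lines = List("alice,30", "bob,unknown", "carol")
  val parsed = lines.map(parseLine)

  // Split the outcomes into three separate collections.
  val people = parsed.collect { case Right((p, _)) => p }
  val warnings = parsed.collect { case Right((_, ws)) => ws }.flatten
  val failures = parsed.collect { case Left(msg) => msg }
}
```

With this shape the warnings can be counted or written to their own output instead of going to the logs, and the successful records are still easy to extract with collect.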


On Thu, Oct 15, 2015 at 2:57 PM, Roberto Congiu <roberto.congiu@gmail.com>
wrote:

> I came to a similar solution to a similar problem. I deal with a lot of
> CSV files from many different sources and they are often malformed.
> However, I just have success/failure. Maybe you should make
> SuccessWithWarnings a subclass of success, or get rid of it altogether by
> making the warnings optional.
> I was thinking of making this cleaning/conforming library open source if
> you're interested.
>
> R.
>
> 2015-10-15 5:28 GMT-07:00 Antonio Murgia <antonio.murgia2@studio.unibo.it>:
>
>> Hello,
>> I looked around on the web and I couldn’t find any way to deal in a
>> structured way with malformed/faulty records during computation. All I was
>> able to find was the flatMap/Some/None technique + logging.
>> I’m facing this problem because I have a processing algorithm that
>> extracts more than one value from each record, but can fail in extracting
>> one of those multiple values, and I want to keep track of them. Logging is
>> not feasible because this “warning” happens so frequently that the logs
>> would become overwhelming and impossible to read.
>> Since I have 3 different possible outcomes from my processing I modeled
>> it with this class hierarchy:
>> That holds result and/or warnings.
>> Since Result implements Traversable it can be used in a flatMap,
>> discarding all warnings and failure results. On the other hand, if we want
>> to keep track of warnings, we can process them and output them if we need to.
>>
>> Kind Regards
>> #A.M.
>>
>
>
>
> --
> --------------------------------------------------------------
> "Good judgment comes from experience.
> Experience comes from bad judgment"
> --------------------------------------------------------------
>
