spark-user mailing list archives

From swetha <>
Subject Optimal way to avoid processing null returns in Spark Scala
Date Wed, 07 Oct 2015 16:42:38 GMT

I have the following functions that I am using for my job in Scala. As you
can see, the getSessionId function sometimes returns null. If I return null,
the only way to avoid processing those records is to filter them out, and I
wanted to avoid that extra filtering pass, so I tried returning "None"
instead. But that has issues, because it forces the return type to be
Option. What is the optimal way to skip null records while also avoiding
Option as the return type? Using Option[] and Some(()) seems to cause type
issues in the subsequent function calls.
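One common pattern (sketched below with plain Scala collections; RDD.flatMap behaves the same way) is to return Option[String] and use flatMap instead of map, so that None records are dropped in the same pass as the mapping, with no separate filter step. The extractor logic here is a hypothetical stand-in, not the real beacon-parsing code:

```scala
object SessionIdExample {
  // Hypothetical extractor: returns None where the original code returned null.
  def getSessionId(eventRecord: String): Option[String] = {
    val marker = "sessionId="
    val i = eventRecord.indexOf(marker)
    if (i >= 0) Some(eventRecord.substring(i + marker.length)) else None
  }

  // flatMap unwraps the Some values and silently drops the Nones,
  // so downstream code never sees a null/None key -- one pass, no filter.
  def sessions(records: Seq[(String, String)]): Seq[(String, String)] =
    records.flatMap { case (_, y) => getSessionId(y).map(id => (id, y)) }
}
```

Because flatMap unwraps the Option at the boundary, the downstream functions keep working with plain (String, String) pairs and never need Option in their signatures.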

    val sessions = filteredStream.transform(rdd => getBeaconMap(rdd))

  def getBeaconMap(rdd: RDD[(String, String)]): RDD[(String, (Long, String))] = {
    rdd.map[(String, (Long, String))] { case (x, y) =>
      (getSessionId(y), (getTimeStamp(y).toLong, y))
    }
  }

  def getSessionId(eventRecord: String): String = {
    val beaconTestImpl: BeaconTestLoader = new BeaconTestImpl // This needs to be changed.
    val beaconEvent: BeaconEventData =

       beaconEvent.getSessionID // This might be in Set Cookie header

    val groupedAndSortedSessions =
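If you want to keep getSessionId returning a plain String (possibly null) so its signature does not change, another option is to wrap the result in Option(...) at the call site: Option(null) is None, so flatMap still drops the bad records in a single pass. The sketch below uses plain Scala collections and an illustrative extractor; an RDD version would use rdd.flatMap the same way:

```scala
object NullWrapExample {
  // Illustrative stand-in that, like the original, returns null on failure.
  def getSessionId(eventRecord: String): String =
    if (eventRecord.startsWith("sid:")) eventRecord.drop(4) else null

  // Option(getSessionId(y)) turns null into None, so flatMap discards
  // those records without a separate filter pass or an Option return type.
  def keyed(records: Seq[String]): Seq[(String, String)] =
    records.flatMap(y => Option(getSessionId(y)).map(id => (id, y)))
}
```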
