spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "caoxuewen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23247) combines Unsafe operations and statistics operations in Scan Data Source
Date Sat, 27 Jan 2018 06:36:00 GMT
caoxuewen created SPARK-23247:
---------------------------------

             Summary: combines Unsafe operations and statistics operations in Scan Data Source
                 Key: SPARK-23247
                 URL: https://issues.apache.org/jira/browse/SPARK-23247
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: caoxuewen


Currently, we scan the execution plan of the data source, first the unsafe operation of each
row of data, and then re traverse the data for the count of rows. In terms of performance,
this is not necessary. this PR combines the two operations and makes statistics on the number
of rows while performing the unsafe operation.

*Before modified,*

{color:#cc7832}val {color}unsafeRow = rdd.mapPartitionsWithIndexInternal { (index{color:#cc7832},
{color}iter) =>
 {color:#cc7832}val {color}proj = UnsafeProjection.create({color:#9876aa}schema{color})
 proj.initialize(index)
 {color:#FF0000}iter.map(proj){color}
}

{color:#cc7832}val {color}numOutputRows = longMetric({color:#6a8759}"numOutputRows"{color})
unsafeRow.map { r =>
 {color:#FF0000}numOutputRows += {color}{color:#6897bb}{color:#FF0000}1{color}
{color} r
}

*After modified,*

    val numOutputRows = longMetric("numOutputRows")

    rdd.mapPartitionsWithIndexInternal { (index, iter) =>
      val proj = UnsafeProjection.create(schema)
      proj.initialize(index)
      iter.map( r => {
{color:#FF0000}        numOutputRows += 1{color}
{color:#FF0000}        proj(r){color}
      })
    }

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message