spark-user mailing list archives

From "A.K.M. Ashrafuzzaman" <ashrafuzzaman...@gmail.com>
Subject Bulk insert strategy
Date Sun, 08 Mar 2015 06:54:09 GMT
In the Spark Programming Guide, the suggested way to use a connection when processing a
DStream is the following:

dstream.foreachRDD(rdd => {
  rdd.foreachPartition(partitionOfRecords => {
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  })
})

In this case both the processing and the insertion are done on the workers, and there we don't
use a batch insert into the DB. How about this use case: we do the processing on the workers
(parse the JSON strings into objects), send those objects back to the master, and then issue a
single bulk insert request from there. Is there any benefit to sending records individually
through a connection pool versus using a bulk operation on the master?
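
For concreteness, here is a rough sketch of the alternative I have in mind (parse and
bulkInsert are hypothetical placeholders I made up for illustration, not from the guide):

dstream.foreachRDD(rdd => {
  // Parse on the workers, then pull the parsed objects back to the master
  val objects = rdd.map(record => parse(record)).collect()
  if (objects.nonEmpty) {
    // One bulk insert from the master instead of per-record sends
    val connection = ConnectionPool.getConnection()
    connection.bulkInsert(objects)  // hypothetical bulk API on the connection
    ConnectionPool.returnConnection(connection)
  }
})

(I realize collect() brings every micro-batch into the master's memory, so this only works
when each batch is small enough to fit there.)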
	
A.K.M. Ashrafuzzaman
Lead Software Engineer
NewsCred

(M) 880-175-5592433
Twitter | Blog | Facebook

Check out The Academy, your #1 source
for free content marketing resources

