hive-user mailing list archives

From Harshit Raikar <harshit.rai...@gmail.com>
Subject Storm HiveBolt missing records due to batching of Hive transactions
Date Fri, 09 Oct 2015 09:05:25 GMT
To store the processed records, I am using HiveBolt in a Storm topology with
the following arguments.

- id: "MyHiveOptions"
  className: "org.apache.storm.hive.common.HiveOptions"
  constructorArgs:
    - "${metastore.uri}"                       # metaStoreURI
    - "${hive.database}"                       # databaseName
    - "${hive.table}"                          # tableName
  configMethods:
    - name: "withTxnsPerBatch"
      args:
        - 2
    - name: "withBatchSize"
      args:
        - 100
    - name: "withIdleTimeout"
      args:
        - 2         # default value 0
    - name: "withMaxOpenConnections"
      args:
        - 200       # default value 500
    - name: "withCallTimeout"
      args:
        - 30000     # default value 10000
    - name: "withHeartBeatInterval"
      args:
        - 240       # default value 240

Records are missing in Hive because the last batch is never completed, so
its records are not flushed. (For example: 1330 records are processed but
only 1200 records are in Hive; 130 records are missing.)
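
One possible reading of these numbers, sketched in Python: assuming records only become visible in Hive once a full group of withTxnsPerBatch x withBatchSize records (2 x 100 = 200 here) has been written, 1330 processed records would leave exactly 130 pending. This grouping rule is an assumption made to match the observed counts, not confirmed HiveBolt behavior, and the function name is hypothetical.

```python
def stored_and_pending(processed, batch_size, txns_per_batch):
    """Split a record count into (stored, pending).

    Assumption (not confirmed HiveBolt behavior): records become visible
    only in full groups of batch_size * txns_per_batch records.
    """
    group = batch_size * txns_per_batch          # records per full group
    stored = (processed // group) * group        # records in completed groups
    pending = processed - stored                 # records waiting in a partial group
    return stored, pending


stored, pending = stored_and_pending(1330, batch_size=100, txns_per_batch=2)
print(stored, pending)  # 1200 130
```

Under this assumption the 130 missing records are the remainder of the last, partially filled group.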

How can I overcome this situation? How can I fill the batch so that the
transaction is triggered and the records are stored in Hive?

Topology: Kafka-Spout --> DataProcessingBolt
          DataProcessingBolt --> HiveBolt (Sink)
          DataProcessingBolt --> JdbcBolt (Sink)


-- 
Thanks and Regards,
Harshit Raikar



