hive-user mailing list archives

From Elliot West <tea...@gmail.com>
Subject Updates/deletes with OrcRecordUpdater
Date Fri, 20 Mar 2015 21:50:21 GMT
Hi,

I'm trying to use the insert, update and delete methods on OrcRecordUpdater
to programmatically mutate an ORC-based Hive table (Hive 1.0.0). I've got
inserts working correctly, but I'm running into a problem with deletes and
updates. I get an NPE, which I have traced back to what seems like a missing
recIdField(?).

java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:103)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.addEvent(OrcRecordUpdater.java:296)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.delete(OrcRecordUpdater.java:330)


I've tried specifying a location for the field using
AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an
ObjectInspector mismatch. I'm not sure whether I should be creating this field
as part of my table definition or not. Currently I'm constructing the table
with some code based on the code in the storm-hive project:

      Table tbl = new Table();
      tbl.setDbName(databaseName);
      tbl.setTableName(tableName);
      tbl.setTableType(TableType.MANAGED_TABLE.toString());
      StorageDescriptor sd = new StorageDescriptor();
      sd.setCols(getTableColumns(colNames, colTypes));
      sd.setNumBuckets(1);
      sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
      if (partNames != null && partNames.length != 0) {
        tbl.setPartitionKeys(getPartitionKeys(partNames));
      }

      tbl.setSd(sd);

      sd.setBucketCols(new ArrayList<String>(2));
      sd.setSerdeInfo(new SerDeInfo());
      sd.getSerdeInfo().setName(tbl.getTableName());
      sd.getSerdeInfo().setParameters(new HashMap<String, String>());

      sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
      // Not sure if this does anything?
      sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());

      sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
      sd.setInputFormat(OrcInputFormat.class.getName());
      sd.setOutputFormat(OrcOutputFormat.class.getName());

      Map<String, String> tableParams = new HashMap<String, String>();
      // Not sure if this does anything?
      tableParams.put("transactional", Boolean.TRUE.toString());
      tbl.setParameters(tableParams);
      client.createTable(tbl);
      try {
        if (partVals != null && partVals.size() > 0) {
          addPartition(client, tbl, partVals);
        }
      } catch (AlreadyExistsException e) {
        // Partition already exists; nothing to do.
      }

I don't know enough about Hive and ORC file internals to work out
where I'm going wrong, so any help would be appreciated.

Thanks - Elliot.
