hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Updates/deletes with OrcRecordUpdater
Date Sat, 21 Mar 2015 04:28:32 GMT
Your table definition looks fine, and no, you shouldn't surface the 
recIdField in the table itself.

Without seeing your writer code it's hard to know why you're hitting 
this, but here is some info that may be of use.  Hive itself uses a 
pseudo column to store the recIdInfo when it reads an ACID row, so that 
it has it available when it writes the row back for an update or 
delete.  I'm guessing you don't have this pseudo column set up 
correctly.  You can take a look at FileSinkOperator (search for ACID or 
UPDATE) and OrcInputFormat.getRecordReader to get an idea of how this 
works.
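
To make the above concrete, here is a rough sketch of how that pseudo column might be wired up. This is untested and assumes the Hive 1.0 internal APIs (AcidOutputFormat.Options, RecordIdentifier, RecordUpdater); MutableRow, rowReadBackFromAcidReader, txnId, and bucketPath are illustrative names, not anything from Hive itself. The key point is that the ObjectInspector handed to Options.inspector() must describe a row whose field at recordIdColumn() is a struct matching RecordIdentifier; otherwise the updater cannot find the recIdField and you get the kind of mismatch/NPE described above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.AcidOutputFormat;
import org.apache.hadoop.hive.ql.io.RecordIdentifier;
import org.apache.hadoop.hive.ql.io.RecordUpdater;
import org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;

public class AcidUpdateSketch {

  // Illustrative row wrapper: field 0 carries the RecordIdentifier
  // (the ROW__ID pseudo column) read back from the ACID reader; the
  // remaining fields are the table's data columns.
  public static class MutableRow {
    public RecordIdentifier rowId;
    public int id;
    public String msg;
  }

  // Hypothetical stand-in: in real code this row would come from an
  // ACID-aware reader (see OrcInputFormat.getRecordReader), which is
  // what populates the pseudo column in the first place.
  static MutableRow rowReadBackFromAcidReader() {
    MutableRow row = new MutableRow();
    row.rowId = new RecordIdentifier(1L, 0, 0L); // (origTxn, bucket, rowId)
    row.id = 7;
    row.msg = "original value";
    return row;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    long txnId = 2L;                              // illustrative transaction id
    Path bucketPath = new Path("/tmp/my_table");  // illustrative location

    // Reflection-based inspector: field 0 ("rowId") is a struct whose
    // layout matches RecordIdentifier, satisfying the recIdField lookup.
    ObjectInspector inspector = ObjectInspectorFactory
        .getReflectionObjectInspector(MutableRow.class, ObjectInspectorOptions.JAVA);

    AcidOutputFormat.Options options = new AcidOutputFormat.Options(conf)
        .inspector(inspector)
        .bucket(0)
        .minimumTransactionId(txnId)
        .maximumTransactionId(txnId)
        .recordIdColumn(0);  // index of the RecordIdentifier field above

    RecordUpdater updater =
        new OrcOutputFormat().getRecordUpdater(bucketPath, options);

    MutableRow row = rowReadBackFromAcidReader();
    row.msg = "updated value";
    updater.update(txnId, row);  // updater reads ROW__ID from field 0
    updater.close(false);
  }
}
```

Note that for insert the pseudo column can be null, which is why inserts alone can appear to work even when the recordIdColumn wiring is wrong.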

Alan.

> Elliot West <mailto:teabot@gmail.com>
> March 20, 2015 at 14:50
> Hi,
>
> I'm trying to use the insert, update, and delete methods on 
> OrcRecordUpdater to programmatically mutate an ORC-based Hive table 
> (1.0.0). I've got inserts working correctly, but I'm hitting a 
> problem with deletes and updates: I get an NPE which I have traced 
> back to what seems like a missing recIdField(?).
>
>
> I've tried specifying a location for the field using 
> AcidOutputFormat.Options.recordIdColumn(0), but this fails due to an 
> ObjectInspector mismatch. I'm not sure whether I should be creating 
> this field as part of my table definition or not. Currently I'm 
> constructing the table with some code based on that found in the 
> storm-hive project:
>
>       Table tbl = new Table();
>       tbl.setDbName(databaseName);
>       tbl.setTableName(tableName);
>       tbl.setTableType(TableType.MANAGED_TABLE.toString());
>       StorageDescriptor sd = new StorageDescriptor();
>       sd.setCols(getTableColumns(colNames, colTypes));
>       sd.setNumBuckets(1);
>       sd.setLocation(dbLocation + Path.SEPARATOR + tableName);
>       if (partNames != null && partNames.length != 0) {
>         tbl.setPartitionKeys(getPartitionKeys(partNames));
>       }
>
>       tbl.setSd(sd);
>
>       sd.setBucketCols(new ArrayList<String>(2));
>       sd.setSerdeInfo(new SerDeInfo());
>       sd.getSerdeInfo().setName(tbl.getTableName());
>       sd.getSerdeInfo().setParameters(new HashMap<String, String>());
>       sd.getSerdeInfo().getParameters().put(serdeConstants.SERIALIZATION_FORMAT, "1");
>       // Not sure if this does anything?
>       sd.getSerdeInfo().getParameters().put("transactional", Boolean.TRUE.toString());
>
>       sd.getSerdeInfo().setSerializationLib(OrcSerde.class.getName());
>       sd.setInputFormat(OrcInputFormat.class.getName());
>       sd.setOutputFormat(OrcOutputFormat.class.getName());
>
>       Map<String, String> tableParams = new HashMap<String, String>();
>       // Not sure if this does anything?
>       tableParams.put("transactional", Boolean.TRUE.toString());
>       tbl.setParameters(tableParams);
>       client.createTable(tbl);
>       try {
>         if (partVals != null && partVals.size() > 0) {
>           addPartition(client, tbl, partVals);
>         }
>       } catch (AlreadyExistsException e) {
>         // ignore: partition already exists
>       }
>
> I don't really know enough about Hive and ORCFile internals to work 
> out where I'm going wrong, so any help would be appreciated.
>
> Thanks - Elliot.
