hive-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo Storage Manager
Date Sun, 13 Sep 2015 02:18:47 GMT
So the binary parsing definitely seems wrong. Maybe two issues there: 
one being the inline #binary not being recognized with the '*' map 
modifier and the second being the row failing to parse.
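
For reference, if I'm reading your results right, the failing combination is the inline encoding on the whole-family map spread. Adapted from your DDL (just an illustrative fragment, not a full mapping):

    -- accepted: map spread over a column family, default (string) encoding
    'accumulo.columns.mapping' = ':rowid,...,namednumbers:*,...'
    -- rejected by the mapping parser: inline #binary on the '*' spread
    'accumulo.columns.mapping' = ':rowid,...,namednumbers:*#binary,...'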

I'd have to write a test to see how the HBaseStorageHandler works and 
see if I missed something in handling all the types correctly. The 
AccumuloStorageHandler should be able to handle the same kind of types 
that a native table can handle. So, I would call ARRAYs not being 
serialized a bug as well.

Sorry you're running into this. If you could capture these in JIRA 
issues, that would make it much easier to start working through them and 
getting them fixed.

If you have the time and desire, trying to reproduce these failures in 
unit tests would also be great :). The type handling can be a little 
difficult, but there are likely some places to start in the accumulo or 
hbase handler tests. At worst, we can start by writing a qtest that will 
reproduce your errors using a full environment (Accumulo minicluster, etc.).
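
To make that concrete, here's roughly what a minimal qtest could look like for the ARRAY case (an untested sketch; the file name is made up, and it reuses the employees4 table from your mail):

    -- accumulo_array_repro.q (hypothetical qtest name)
    CREATE TABLE accumulo_array_repro (
      rowid STRING,
      numbers ARRAY<INT>)
    STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
    WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:numbers#binary');

    INSERT OVERWRITE TABLE accumulo_array_repro
      SELECT rowid, numbers FROM employees4;

    -- expectation: [13,23,-1,1001] round-trips instead of [null]
    SELECT * FROM accumulo_array_repro;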

peter.marron@baesystems.com wrote:
> Hi Josh,
>
> At this stage I don't know whether there's anything wrong with Hive or it's just user error.
> Perhaps if I go through what I have done you can see where the error lies.
> Unfortunately this is going to be wordy. Apologies in advance for the long email.
>
> So I created a "normal" table in HDFS with a variety of column types like this:
>
>          CREATE TABLE employees4 (
>           rowid STRING,
>           flag BOOLEAN,
>           number INT,
>           bignum BIGINT,
>           name STRING,
>           salary FLOAT,
>           bigsalary DOUBLE,
>           numbers ARRAY<INT>,
>           floats ARRAY<DOUBLE>,
>           subordinates ARRAY<STRING>,
>           deductions MAP<STRING, FLOAT>,
>           namedNumbers MAP<STRING, INT>,
>           address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>);
>
> And I put some data into it and I can see the data:
>
> hive>  SELECT * FROM employees4;
> OK
> row1    true    100     7       John Doe        100000.0        100000.0        [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0]   ["Mary Smith","Todd Jones"]     {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}        {"nameOne":123,"Name Two":49,"The Third Man":-1}        {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}
> row2    false   7       100     Mary Smith      100000.0        80000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0]    ["Bill King"]   {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}        {"nameOne":123,"Name Two":49,"The Third Man":-1}        {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}
> row3    false   3245    877878  Todd Jones      100000.0        70000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,2.0]       []      {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}       {"nameOne":123,"Name Two":49,"The Third Man":-1}        {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}
> row4    true    877878  3245    Bill King       100000.0        60000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0,1001.0,1001.0]      []      {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}       {"nameOne":123,"Name Two":49,"The Third Man":-1}        {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}
> Time taken: 0.535 seconds, Fetched: 4 row(s)
>
> Everything looks fine.
> Now I create a Hive table stored in Accumulo:
>
>          DROP TABLE IF EXISTS accumulo_table4;
>          CREATE TABLE accumulo_table4 (
>           rowid STRING,
>           flag BOOLEAN,
>           number INT,
>           bignum BIGINT,
>           name STRING,
>           salary FLOAT,
>           bigsalary DOUBLE,
>           numbers ARRAY<INT>,
>           floats ARRAY<DOUBLE>,
>           subordinates ARRAY<STRING>,
>           deductions MAP<STRING, FLOAT>,
>           namednumbers MAP<STRING, INT>,
>           address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
>          STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
>          WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address');
>
> (Note that I am only really interested in storing the values in "binary".)
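> To spell out my reading of that mapping string (in case my mistake is here), an annotated fragment:
>
>          -- :rowid               binds the Hive column to the Accumulo row ID
>          -- person:flag#binary   family 'person', qualifier 'flag', binary-encoded value
>          -- person:name          same form, default (string) encoding
>          -- deductions:*         spreads a whole column family into a Hive MAP
>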
> Now I can load the Accumulo table from the normal table:
>
>          INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;
>
> And I can query the data from the Accumulo table.
>
> hive>  SELECT * FROM accumulo_table4;
> OK
> row1    true    100     7       John Doe        100000.0        100000.0        [null]  [null]  ["Mary Smith\u0003Todd Jones"]  {"Federal Taxes":0.2,"Insurance":0.1,"State Taxes":0.05}        {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"1 Michigan Ave.\u0003Chicago\u0003IL\u000360600","city":null,"state":null,"zip":null}
> row2    false   7       100     Mary Smith      100000.0        80000.0 [null]  [null]  ["Bill King"]   {"Federal Taxes":0.2,"Insurance":0.1,"State Taxes":0.05}        {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"100 Ontario St.\u0003Chicago\u0003IL\u000360601","city":null,"state":null,"zip":null}
> row3    false   3245    877878  Todd Jones      100000.0        70000.0 [null]  [null]  []      {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03}       {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"200 Chicago Ave.\u0003Oak Park\u0003IL\u000360700","city":null,"state":null,"zip":null}
> row4    true    877878  3245    Bill King       100000.0        60000.0 [null]  [null]  []      {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03}       {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"300 Obscure Dr.\u0003Obscuria\u0003IL\u000360100","city":null,"state":null,"zip":null}
> Time taken: 0.109 seconds, Fetched: 4 row(s)
>
> Notice that the columns with type ARRAY<INT> and ARRAY<DOUBLE> come back as [null].
> I assume that this means that there is something wrong and the Hive Storage Handler is returning a null?
> When I use the accumulo shell to look at the data stored in Accumulo:
>
> root@accumulo>  scan -t accumulo_table4
> row1 deductions:Federal Taxes []    0.2
> row1 deductions:Insurance []    0.1
> row1 deductions:State Taxes []    0.05
> row1 namednumbers:Name Two []    49
> row1 namednumbers:The Third Man []    -1
> row1 namednumbers:nameOne []    123
> row1 person:address []    1 Michigan Ave.\x03Chicago\x03IL\x0360600
> row1 person:bignum []    \x00\x00\x00\x00\x00\x00\x00\x07
> row1 person:bigsalary []    @\xF8j\x00\x00\x00\x00\x00
> row1 person:flag []    \x01
> row1 person:floats []    3.14159\x032.71828\x03-1.1\x031001.0
> row1 person:name []    John Doe
> row1 person:number []    \x00\x00\x00d
> row1 person:numbers []    \x00\x00\x00\x0D\x03\x00\x00\x00\x17\x03\xFF\xFF\xFF\xFF\x03\x00\x00\x03\xE9
> row1 person:salary []    G\xC3P\x00
> row1 person:subordinates []    Mary Smith\x03Todd Jones
>
> This shows that the columns of type INT and FLOAT have been converted to binary, which is great.
> However the column with type ARRAY<INT> has had the individual values converted, but still has the field separator (0x03) present.
> I thought that this might just be a conversion problem and so I hacked the Accumulo table to have the "correct" value:
>
> row1 person:numbers []    \x00\x00\x00\x0D\x00\x00\x00\x17\xFF\xFF\xFF\xFF\x00\x00\x03\xE9
>
> However when I run the query the numbers field is still "[null]".
> (Those hacked bytes are the big-endian 4-byte encodings of 13, 23, -1 and 1001, with the 0x03 separators removed.)
> I'm happy to arrange to store whatever is needed in Accumulo to make it work, I just need to know what that is.
>
> The second issue is to do with the MAP<STRING,INT> column, in this case called namednumbers.
> As you can see, so far it works fine and I am very happy :)
> However, as I stated before, I really want everything stored in binary.
> But when I change the table definition to have a #binary I get an error:
>
>          hive>  CREATE TABLE accumulo_table4 (
>                  >   rowid STRING,
>                  >   flag BOOLEAN,
>                  >   number INT,
>                  >   bignum BIGINT,
>                  >   name STRING,
>                  >   salary FLOAT,
>                  >   bigsalary DOUBLE,
>                  >   numbers ARRAY<INT>,
>                  >   floats ARRAY<DOUBLE>,
>                  >   subordinates ARRAY<STRING>,
>                  >   deductions MAP<STRING, FLOAT>,
>                  >   namednumbers MAP<STRING, INT>,
>                  >   address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
>                  >  STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
>                  >  WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*#binary,person:address');
>          FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Expected map encoding for a map specification, namednumbers:* with encoding binary
>
> I thought that maybe this is because the syntax "column_family:*#binary" is too much. So I try using a default.
>
>          DROP TABLE IF EXISTS accumulo_table4;
>          CREATE TABLE accumulo_table4 (
>           rowid STRING,
>           flag BOOLEAN,
>           number INT,
>           bignum BIGINT,
>           name STRING,
>           salary FLOAT,
>           bigsalary DOUBLE,
>           numbers ARRAY<INT>,
>           floats ARRAY<DOUBLE>,
>           subordinates ARRAY<STRING>,
>           deductions MAP<STRING, FLOAT>,
>           namednumbers MAP<STRING, INT>,
>           address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
>          STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
>          WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:flag,person:number,person:bignum,person:name,person:salary,person:bigsalary,person:numbers,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address', "accumulo.default.storage" = "binary");
>
> This table creation works, however when I try to insert the data I get a long error message, which follows.
> However, before that I just want to say that I'm happy to look at the source if I have to.
> I guess that I would appreciate a pointer as to the file name/path for the Hive Storage Manager code.
> Many thanks in advance for any help.
>
> Z
>
> PS. I never thought of using column_family with sequence numbers in the qualifiers for an array.
> I will try that and get back to you.
>
> Here's the conversion error:
>
>
> hive>  INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;
> Query ID = hive_20150910125252_f6fb143e-13df-4e81-98d0-fe8391025dc7
> Total jobs = 1
> Launching Job 1 out of 1
> Tez session was closed. Reopening...
> Session re-established.
>
>
> Status: Running (Executing on YARN cluster with App id application_1441875240043_0005)
>
> --------------------------------------------------------------------------------
>          VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1                 FAILED      1          0        0        1       4       0
> --------------------------------------------------------------------------------
> VERTICES: 00/01  [>>--------------------------] 0%    ELAPSED TIME: 24.54 s
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1441875240043_0005_1_00, diagnostics=[Task failed, taskId=task_1441875240043_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
>          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>          at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>          at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>          at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>          at java.security.AccessController.doPrivileged(Native Method)
>          at javax.security.auth.Subject.doAs(Subject.java:415)
>          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>          at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>          at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>          at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
>          at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
>          at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>          at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
>          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>          ... 13 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
>          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
>          at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
>          ... 16 more
> Caused by: java.lang.RuntimeException: Hive internal error.
>          at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:327)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeBinary(AccumuloRowSerializer.java:368)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:270)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:288)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.getSerializedValue(AccumuloRowSerializer.java:249)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serializeColumnMapping(AccumuloRowSerializer.java:148)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serialize(AccumuloRowSerializer.java:130)
>          at org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe.serialize(AccumuloSerDe.java:119)
>          at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:660)
>          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>          at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>          at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>          at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
>          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
>          ... 17 more
> ], TaskAttempt 1, TaskAttempt 2, and TaskAttempt 3 failed with errors and stack traces identical to TaskAttempt 0 above.
> ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1441875240043_0005_1_00 [Map 1] killed/failed due to:null]
> DAG failed due to vertex failure. failedVertices:1 killedVertices:0
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> hive>  DROP TABLE IF EXISTS accumulo_table4;
> OK
> Time taken: 1.101 seconds
>
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.elser@gmail.com]
> Sent: 08 September 2015 22:15
> To: user@hive.apache.org
> Subject: Re: Accumulo Storage Manager
>
> For the Array support: it might have just been a missed test case and is just a bug. I don't recall off the top of my head how Arrays are intended to be serialized (whether it's some numeric counter in the Accumulo CQ or just serializing all the elements in the array into the Accumulo Value). If it isn't working for you, feel free to open up a JIRA issue with the details and mention me so I notice it :). I can try to help figure out what's busted, and, if necessary, a fix.
>
> For the Map support, what are you trying to do differently? Going from memory, I believe the support is for a fixed column family and an optional column qualifier prefix. This limits the entries in a Map to that column family, and allows you to place multiple maps into a given family for locality purposes (identifying the maps by qualifier-prefix, and getting Key uniqueness from the qualifier-suffix). There isn't much flexibility in this regard for alternate serialization approaches -- the considerations at the time were for a general-purpose schema that you don't really have to think about (you just think SQL).
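> If it helps, a hypothetical illustration of that layout (the qualifier-prefix syntax here is from memory, so treat it as a sketch rather than gospel):
>
>          -- two logical maps sharing the 'person' family, split by qualifier prefix
>          'accumulo.columns.mapping' = ':rowid,person:deduct_*,person:named_*'
>          -- entries would then land as e.g. person:deduct_Insurance and person:named_nameOne
>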
>
> - Josh
