asterixdb-users mailing list archives

From schul...@informatik.hu-berlin.de
Subject unable to load external data
Date Tue, 20 Oct 2015 12:40:46 GMT
Hello,

I have done a cluster setup of AsterixDB on four nodes. Everything is
running fine and I want to load some data into the system to run some
bigger examples. However, I am unable to do so using the description at

https://asterixdb.ics.uci.edu/documentation/aql/externaldata.html

I created a dataverse, a datatype and a dataset as follows:

create dataverse tpch;

use dataverse tpch
create type LineitemType as closed {
      orderkey:int32,
      partkey: int32,
      suppkey: int32,
      linenumber: int32,
      quantity: double,
      extendedprice: double,
      discount: double,
      tax: double,
      returnflag: string,
      linestatus: string,
      shipdate: string,
      commitdate: string,
      receiptdate: string,
      shipinstruct: string,
      shipmode: string,
      comment: string}

create dataset lineitem(LineitemType) if not exists primary key orderkey,
linenumber
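
Because LineitemType is closed, every record has to supply exactly the 16
declared fields. As a check that is independent of AsterixDB, the data
file can be validated against that layout before loading. The following is
only a minimal Python sketch: the helper name parse_lineitem is
hypothetical, and it assumes that lineitem.tbl is dbgen-style output in
which each line ends with a trailing "|" (this message does not confirm
that).

```python
# Field names and types mirror LineitemType as declared in this message
# (int32 -> int, double -> float, string -> str).
LINEITEM_FIELDS = [
    ("orderkey", int), ("partkey", int), ("suppkey", int), ("linenumber", int),
    ("quantity", float), ("extendedprice", float), ("discount", float), ("tax", float),
    ("returnflag", str), ("linestatus", str), ("shipdate", str), ("commitdate", str),
    ("receiptdate", str), ("shipinstruct", str), ("shipmode", str), ("comment", str),
]


def parse_lineitem(line, delimiter="|"):
    """Split one delimited line into a dict keyed by LineitemType field names.

    Hypothetical helper for illustration; raises ValueError if the number of
    fields differs from the type, or if a numeric field cannot be converted.
    """
    text = line.rstrip("\n")
    # Assumption: dbgen-style files terminate each line with the delimiter,
    # so one trailing delimiter is dropped if present.
    if text.endswith(delimiter):
        text = text[: -len(delimiter)]
    parts = text.split(delimiter)
    if len(parts) != len(LINEITEM_FIELDS):
        raise ValueError(
            f"expected {len(LINEITEM_FIELDS)} fields, got {len(parts)}"
        )
    return {name: cast(value) for (name, cast), value in zip(LINEITEM_FIELDS, parts)}
```

Running this over the first few lines of the file would show whether the
field count and the numeric columns match the type before a load is
attempted.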

As described on the page linked above, there are two ways to load data:
from a reachable HDFS or from the local file system (localfs). I have a
running HDFS within the same network containing the data I want to access,
and tried to reach it like this:

load dataset lineitem using hdfs
(("hdfs"="hdfs://192.168.127.11:50040"),
("path"="/user/schultzem/lineitem.tbl"),
("input-format"="text-input-format"),
("format"="delimited-text"),
("delimiter"="|"));

However, I get this error message:

Unable to create adapter org.apache.hadoop.ipc.RemoteException: Server IPC
version 9 cannot communicate with client version 3 [AlgebricksException]

All I found about this was an old issue from 2013 that recommends an
older version of Hadoop, which is not an option for me.

https://code.google.com/p/asterixdb/issues/detail?id=521

Is this somehow fixable?

The other option, loading data from the local file system, also throws an
error.

load dataset lineitem using localfs
(("path"="192.168.127.21:///home/schultzem/tpch/TPCH_data_10GB/lineitem.tbl"),
    ("format"="delimited-text"),
    ("delimiter"="|"));

leads to

No node controllers found at the address: 192.168.127.21 [AsterixException]

which is the same error I get for 127.0.0.1.

The linked documentation on external datasets assumes that AsterixDB is
used in local mode. Is this the reason I cannot reach the cluster nodes?

Did I make a mistake accessing the data? How can I load data into the
database?

Regards, Max

