Date: Sun, 24 Aug 2014 02:27:23 +0900
Subject: Re: USING Parquet
From: Hyunsik Choi
To: tajo-user

Hi Chris,

Currently, the Parquet file format does not support the Timestamp, Date, and Time data types. The Parquet community is working on those data types, so other systems besides Tajo do not support them in Parquet either.
For now, if you must use the Parquet format, you need to use the TEXT data type to handle timestamp (and date) values. You can then achieve the same functionality by using the date/time functions.

Thanks,
Hyunsik

On Sun, Aug 24, 2014 at 12:59 AM, Christian Schwabe wrote:
> Hello everyone,
>
> I wanted to run some tests with large test data (.csv > 5 GB), but
> unfortunately it fails again pretty early. I created an EXTERNAL TABLE as
> follows:
>
> CREATE EXTERNAL TABLE dfkklocks_hist
> (
>   validfrom timestamp,
>   validthru timestamp,
>   client text,
>   loobj1 text,
>   lotyp text,
>   proid text,
>   lockr text,
>   fdate date,
>   tdate date,
>   gpart text,
>   vkont text,
>   cond_loobj text,
>   actkey text,
>   uname text,
>   adatum date,
>   azeit text,
>   protected text,
>   laufd date,
>   laufi text
> )
> using csv with ('csvfile.delimiter'='~') location 'file:path/to/csv/file';
>
> Then I created a table with the suffix _internal that uses the Parquet
> format, as follows:
>
> CREATE TABLE dfkklocks_hist_internal
> (
>   validfrom timestamp,
>   validthru timestamp,
>   client text,
>   loobj1 text,
>   lotyp text,
>   proid text,
>   lockr text,
>   fdate date,
>   tdate date,
>   gpart text,
>   vkont text,
>   cond_loobj text,
>   actkey text,
>   uname text,
>   adatum date,
>   azeit text,
>   protected text,
>   laufd date,
>   laufi text
> ) using parquet;
>
> The CSV file contains records such as this one:
> 2014-08-19 21:03:32.78~9999-12-31 23:59:59.999~200~0000000000530010000053~06~01~5~2005-12-31~9999-12-31~0010000053~000000000053~~~FREITAG~2006-06-01~125611~~1800-01-01~
>
> Now I would like to copy the contents of the CSV file into the Parquet
> table as follows:
> contract> INSERT INTO dfkklocks_hist_internal SELECT * FROM dfkklocks_hist;
> ERROR: Cannot convert Tajo type: TIMESTAMP
> java.lang.RuntimeException: Cannot convert Tajo type: TIMESTAMP
>     at org.apache.tajo.storage.parquet.TajoSchemaConverter.convertColumn(TajoSchemaConverter.java:191)
>     at org.apache.tajo.storage.parquet.TajoSchemaConverter.convert(TajoSchemaConverter.java:150)
>     at org.apache.tajo.storage.parquet.TajoWriteSupport.<init>(TajoWriteSupport.java:54)
>     at org.apache.tajo.storage.parquet.TajoParquetWriter.<init>(TajoParquetWriter.java:80)
>     at org.apache.tajo.storage.parquet.ParquetAppender.init(ParquetAppender.java:75)
>     at org.apache.tajo.engine.planner.physical.StoreTableExec.init(StoreTableExec.java:69)
>     at org.apache.tajo.worker.Task.run(Task.java:423)
>     at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:425)
>     at java.lang.Thread.run(Thread.java:745)
>
> Looking at TajoSchemaConverter.java, it seems that a Tajo timestamp cannot
> be used with Parquet. Is that assumption right?
> Changing the timestamp values (see the example record) did not help either.
> At first I assumed that the timestamp was not valid, but values such as
> 1970-00-00 00:00:00.000 or 1971-01-01 01:01:01.000 did not change the
> behavior.
> Are my conclusions so far correct? Is this a known open bug? Am I doing
> something wrong? Are there other options, not listed here, that could get
> me to my goal?
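The stack trace shows the exception being thrown from `TajoSchemaConverter.convertColumn` while `ParquetAppender.init` builds the Parquet schema, that is, before any row is written, which is why changing the timestamp values made no difference. The timestamps in the sample record are well-formed. As a minimal sketch of the TEXT workaround on the client side (plain Java `java.time`, not Tajo's date/time functions; the class `TimestampText` and its methods are hypothetical), values kept in TEXT columns can be parsed back like this, assuming the `yyyy-MM-dd HH:mm:ss` layout with an optional fraction seen in the sample record:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper, not part of Tajo: parses timestamp/date values that were
// stored in TEXT columns (the Parquet workaround) back into java.time values.
final class TimestampText {
    // Assumed layout, taken from the sample CSV record: "uuuu-MM-dd HH:mm:ss"
    // followed by an optional fraction of 0 to 9 digits, e.g. ".78" or ".999".
    static final DateTimeFormatter TIMESTAMP = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter();

    private TimestampText() {}

    // Throws DateTimeParseException for malformed or impossible values.
    static LocalDateTime parseTimestamp(String text) {
        return LocalDateTime.parse(text, TIMESTAMP);
    }

    // DATE columns such as fdate or laufd use the ISO "yyyy-MM-dd" layout.
    static LocalDate parseDate(String text) {
        return LocalDate.parse(text);
    }
}
```

For example, `TimestampText.parseTimestamp("2014-08-19 21:03:32.78")` yields 2014-08-19T21:03:32.780. A value like 1970-00-00 00:00:00.000 is rejected by this parser because there is no month 00, but that is unrelated to the Parquet error, which depends only on the declared column type.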
>
>   private Type convertColumn(Column column) {
>     TajoDataTypes.Type type = column.getDataType().getType();
>     switch (type) {
>       case BOOLEAN:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.BOOLEAN);
>       case BIT:
>       case INT2:
>       case INT4:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.INT32);
>       case INT8:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.INT64);
>       case FLOAT4:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.FLOAT);
>       case FLOAT8:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.DOUBLE);
>       case CHAR:
>       case TEXT:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.BINARY,
>             OriginalType.UTF8);
>       case PROTOBUF:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.BINARY);
>       case BLOB:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.BINARY);
>       case INET4:
>       case INET6:
>         return primitive(column.getSimpleName(),
>             PrimitiveType.PrimitiveTypeName.BINARY);
>       default:
>         throw new RuntimeException("Cannot convert Tajo type: " + type);
>     }
>   }
>
> I'm really thankful that there is a community like you guys out there
> that helps with errors like these.
> Have a nice weekend.
>
> Best regards,
> Chris
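Because `convertColumn` throws on the first column whose type has no `case`, the error message only ever names one type. A hypothetical pre-flight check (not part of Tajo; the class name and its name-to-type map input are assumptions) could list every offending column at once, using the type names copied from the switch above as it stood in this version of Tajo:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check, not part of Tajo: reports every column whose
// Tajo type has no case in the convertColumn switch quoted above, i.e. every
// column that would make the Parquet appender throw "Cannot convert Tajo type".
final class ParquetTypeCheck {
    // Type names copied from the case labels of the quoted switch; this set
    // reflects the Tajo version in this thread and may differ in later releases.
    static final Set<String> SUPPORTED = Set.of(
            "BOOLEAN", "BIT", "INT2", "INT4", "INT8", "FLOAT4", "FLOAT8",
            "CHAR", "TEXT", "PROTOBUF", "BLOB", "INET4", "INET6");

    private ParquetTypeCheck() {}

    // columns maps column name -> declared Tajo type name; pass a LinkedHashMap
    // so the result keeps the declaration order of the table.
    static List<String> unsupportedColumns(Map<String, String> columns) {
        List<String> unsupported = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            String type = column.getValue().toUpperCase(Locale.ROOT);
            if (!SUPPORTED.contains(type)) {
                unsupported.add(column.getKey() + " " + column.getValue());
            }
        }
        return unsupported;
    }
}
```

For the `dfkklocks_hist_internal` schema, such a check would report not only `validfrom` and `validthru` (timestamp) but also `fdate`, `tdate`, `adatum`, and `laufd` (date), since DATE has no case in the switch either. Declaring only the timestamp columns as TEXT should therefore move the same error on to the first date column.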