From: "Owen O'Malley"
To: user@orc.apache.org, Telco Phone
Reply-To: user@orc.apache.org
Date: Sat, 25 Feb 2017 13:02:21 -0800
Subject: Re: Setting Null correctly

On Sat, Feb 25, 2017 at 9:39 AM, Telco Phone wrote:

> Give the code here I am trying to find the correct way to set null to
> various vectors
>
> In the case of Long or Bytes vectors, how do you correctly set nulls ?
>
> Lines in question are
>
> col1.isNull[4] = Boolean.TRUE;    <--- does not set to null but sets to 0 in output
> col2.isNull[4] = Boolean.TRUE;    <--- throws error on write
>

It is easier to use "true" instead of "Boolean.TRUE":

col1.isNull[4] = true;
col2.isNull[4] = true;

You also need to set ColumnVector.noNulls
http://orc.apache.org/api/hive-storage-api/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.html#noNulls
to false:

col1.noNulls = false;
col2.noNulls = false;

.. Owen

>
> Thanks in advance
>
> void example() {
>
>     String s = "struct<_col0:bigint,_col1:string>";
>
>     TypeDescription schema = TypeDescription.fromString(s);
>
>     // Build col0
>
>     LongColumnVector col1 = new LongColumnVector(5);
>     col1.init();
>     col1.vector[0] = 9L;
>     col1.vector[1] = 9L;
>     col1.vector[2] = 9L;
>     col1.vector[3] = 9L;
>     col1.isNull[4] = Boolean.TRUE;
>
>     // Build col1
>
>     BytesColumnVector col2 = new BytesColumnVector();
>     col2.init();
>     col2.initBuffer();
>
>     byte[] byteString = "Test0".getBytes();
>     col2.setVal(0, byteString, 0, byteString.length);
>
>     byteString = "Test1".getBytes();
>     col2.setVal(1, byteString, 0, byteString.length);
>
>     byteString = "Test2".getBytes();
>     col2.setVal(2, byteString, 0, byteString.length);
>
>     byteString = "Test3".getBytes();
>     col2.setVal(3, byteString, 0, byteString.length);
>
>     byteString = null;
>
>     col2.isNull[4] = Boolean.TRUE;
>
>     VectorizedRowBatch batch = schema.createRowBatch();
>     batch.cols[0] = col1;
>     batch.cols[1] = col2;
>
>     batch.size=5;
>
>     try {
>         File f = new File("/tmp/my-file.orc");
>         f.delete();
>
>         Configuration conf = new Configuration();
>         Writer writer = OrcFile.createWriter(new Path("/tmp/my-file.orc"),
>                 OrcFile.writerOptions(conf).setSchema(schema));
>         writer.addRowBatch(batch);
>         writer.close();
>
>     } catch (Exception e) {
>         e.printStackTrace();
>     }
> }
>


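The two-flag rule from the reply (mark the row in isNull and also clear noNulls) can be illustrated with a small stand-alone sketch. It does not use the ORC or Hive classes: LongVec and readRows are hypothetical stand-ins that only mirror the three public fields of LongColumnVector (vector, isNull, noNulls) and assume a consumer that looks at isNull only when noNulls is false. Under that assumption, setting isNull alone leaves the slot's default 0 visible, which matches the symptom reported for col1.

```java
import java.util.ArrayList;
import java.util.List;

public class NoNullsSketch {

    /** Hypothetical stand-in for LongColumnVector; not the real Hive class. */
    static final class LongVec {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true; // assumption: a fresh vector claims it has no nulls

        LongVec(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /**
     * Hypothetical consumer: isNull is consulted only when noNulls is false.
     * This models the flag contract; it is not ORC's writer code.
     */
    static List<Long> readRows(LongVec col, int size) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            if (!col.noNulls && col.isNull[i]) {
                out.add(null);          // row is treated as null
            } else {
                out.add(col.vector[i]); // row value is used as-is
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LongVec col = new LongVec(5);
        for (int i = 0; i < 4; i++) {
            col.vector[i] = 9L;
        }

        // Only isNull is set, as in the original question.
        col.isNull[4] = true;
        System.out.println(readRows(col, 5)); // [9, 9, 9, 9, 0]

        // Clearing noNulls makes the null marker take effect.
        col.noNulls = false;
        System.out.println(readRows(col, 5)); // [9, 9, 9, 9, null]
    }
}
```

With the real vectors the same two assignments apply per column: col1.isNull[4] = true together with col1.noNulls = false, and likewise for col2.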