References: <362A2DDD005E584BAB2BCE3E2FE0E12212D7845378@SP2-EX07VS02.ds.corp.yahoo.com>
Date: Mon, 9 Aug 2010 13:07:17 -0700
Subject: Re: How are nulls represented in data?
From: yongqiang he
To: hive-user@hadoop.apache.org

Yes. With LazySimpleSerDe (TextFile/SequenceFile), "\N" is used to represent
NULL. (It is a table property: serialization.null.format.)

With ColumnarSerDe/RCFile, no NULL marker is stored: a null column takes zero
bytes, i.e. its column byte length is zero. But ColumnarSerDe/RCFile also uses
this property when serializing, to determine whether a column is null or not.
(This is unavoidable because the client can only pass a string to the serde
and let the serde serialize it, so some special character sequence is needed
to represent NULL.)

On Mon, Aug 9, 2010 at 11:46 AM, Ning Zhang wrote:
> How it is serialized/deserialized is determined by specific serde. NULL is
> serialized as \N by LazySimpleSerDe (default serde for text). RCFile
> (ColumnarSerDe) uses the same default parameters as LazySimpleSerDe.
> Unless I missed something, NULL serialization/deserialization is type
> independent (at least in LazySimpleSerDe).
> On Aug 9, 2010, at 9:42 AM, Pradeep Kamath wrote:
>
> Hi,
>    What value does hive expect in the data for a column to be treated as
> null? I tried some permutations on a text data based table but couldn’t
> figure out what the correct representation was.
> I tried empty string, the
> string NULL and the string null for a string column and in all three cases
> the “is null” operator returned false.
>
> A couple of related questions:
>  - Does the representation of null depend on the type of the column - is it
> different for string Vs non-string columns?
>  - Is the representation of null different for different storage formats -
> text Vs RCFile Vs SequenceFile - I am particularly interested in text and
> RCFile.
>
> Thanks in advance,
>
> Pradeep
>
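
To illustrate the "\N" convention described in the reply above, here is a minimal Python sketch of how a text-format row could be read and written for string columns. It is not Hive's LazySimpleSerDe code. The function names `deserialize_row` and `serialize_row` are hypothetical, and the Ctrl-A field delimiter is an assumption that does not come from this thread. The sketch only shows why the empty string, the string NULL and the string null were not treated as null, while \N is.

```python
# Illustrative sketch only; this is NOT Hive's LazySimpleSerDe implementation.
# NULL_FORMAT mirrors the default of the serialization.null.format property
# mentioned in the thread. FIELD_DELIM (Ctrl-A) is an assumption made here.
NULL_FORMAT = "\\N"   # two characters: a backslash followed by N
FIELD_DELIM = "\x01"


def deserialize_row(line, null_format=NULL_FORMAT, delim=FIELD_DELIM):
    """Split a text row; a field equal to null_format becomes None."""
    return [None if field == null_format else field
            for field in line.split(delim)]


def serialize_row(values, null_format=NULL_FORMAT, delim=FIELD_DELIM):
    """Join values into a text row; None is written as null_format."""
    return delim.join(null_format if value is None else str(value)
                      for value in values)


if __name__ == "__main__":
    # The three values tried in the original question, plus the real marker.
    line = FIELD_DELIM.join(["", "NULL", "null", "\\N"])
    print(deserialize_row(line))
```

Only the last field comes back as None; the first three stay ordinary strings, which matches the "is null" results reported in the question. Passing a different `null_format` (for example the empty string) models changing the table property.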