hive-user mailing list archives

From yongqiang he
Subject Re: How are nulls represented in data?
Date Mon, 09 Aug 2010 20:07:17 GMT
Yes. In LazySimpleSerDe/SequenceFile/TextFile, "\N" is used as NULL.
(It is a table property: serialization.null.format.)
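
To illustrate the point above, here is a minimal Python sketch of how a delimited-text reader could map the null marker to NULL. This is a toy model, not Hive's actual code: the function name parse_text_row and the constant NULL_FORMAT are made up for illustration, and the sketch assumes Hive's default Ctrl-A (\x01) field delimiter and the default value of serialization.null.format.

```python
# Toy model of a delimited-text reader; not Hive's implementation.
# NULL_FORMAT mirrors the default of serialization.null.format.
# parse_text_row is a hypothetical helper used only for illustration.
NULL_FORMAT = r"\N"  # two characters: backslash followed by N


def parse_text_row(line, delimiter="\x01", null_format=NULL_FORMAT):
    """Split one text row and turn the null marker into None."""
    return [None if field == null_format else field
            for field in line.split(delimiter)]


# Five string fields: a value, the null marker, an empty string,
# and the literal strings NULL and null.
row = parse_text_row("\x01".join(["alice", r"\N", "", "NULL", "null"]))
print(row)
```

Only the second field comes back as None. The empty string and the strings NULL and null stay ordinary string values, which is consistent with "is null" returning false for them on a string column.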

In ColumnarSerDe/RCFile, no NULL marker is stored (zero bytes; the
column's byte length is zero).
But RCFile/ColumnarSerDe also uses this property when serializing, to
determine whether a column is null or not. (This is unavoidable because
the client can only pass a string to the serde and let the serde
serialize it, so some special character is needed to represent NULL.)

On Mon, Aug 9, 2010 at 11:46 AM, Ning Zhang wrote:
> How it is serialized/deserialized is determined by the specific serde. NULL is
> serialized as \N by LazySimpleSerDe (default serde for text). RCFile
> (ColumnarSerDe) uses the same default parameters as LazySimpleSerDe.
> Unless I missed something, NULL serialization/deserialization is type
> independent (at least in LazySimpleSerDe).
> On Aug 9, 2010, at 9:42 AM, Pradeep Kamath wrote:
> Hi,
>    What value does hive expect in the data for a column to be treated as
> null? I tried some permutations on a text data based table but couldn’t
> figure out what the correct representation was. I tried empty string, the
> string NULL and the string null for a string column and in all three cases
> the “is null” operator returned false.
> A couple of related questions:
>  - Does the representation of null depend on the type of the column – is it
> different for string Vs non-string columns?
>  - Is the representation of null different for different storage formats –
> text Vs RCFile Vs SequenceFile – I am particularly interested in text and
> RCFile.
> Thanks in advance,
> Pradeep
