hive-user mailing list archives

From "Tucker, Matt" <Matt.Tuc...@disney.com>
Subject RE: Need a smart way to delete the first row of my data
Date Wed, 07 Mar 2012 16:34:26 GMT
Hi Dan,

If you're guaranteed that every data row has an integer in the NumericID column, you can keep
the column type as INT and filter on 'WHERE NumericID IS NOT NULL'.  Hive reads a field it
cannot parse as INT as NULL, so this filters out record1 in your example, which was your
header row.
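
To illustrate why the filter works, here is a rough Python sketch of the read-time behaviour it
relies on: a text field that cannot be parsed as an integer comes back as NULL (None here) and
is then dropped. The helper name read_as_int and the in-memory row list are hypothetical
stand-ins for illustration, not Hive APIs.

```python
from typing import List, Optional


def read_as_int(field: str) -> Optional[int]:
    """Hypothetical stand-in for reading a text field through an INT column:
    text that does not parse as an integer becomes None (i.e. NULL)."""
    try:
        return int(field.strip())
    except ValueError:
        return None


# The three records from the example in this thread, as raw text.
raw_rows: List[str] = ["NumericID", "1234", "5678"]

# Reading with the desired schema (NumericID int): the header turns into None.
parsed: List[Optional[int]] = [read_as_int(r) for r in raw_rows]

# Equivalent of WHERE NumericID IS NOT NULL.
kept: List[int] = [v for v in parsed if v is not None]

print(parsed)
print(kept)
```

Note that the same filter would also drop any data row whose NumericID is missing or malformed,
which is why the guarantee above matters.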

Matt Tucker

From: Dan Y [mailto:dan.m.yelle@gmail.com]
Sent: Wednesday, March 07, 2012 11:29 AM
To: user@hive.apache.org
Subject: Re: Need a smart way to delete the first row of my data

Hi guys - Thank you both.

I was trying to avoid having to define two different schemas for my data.

For example, let's say the first three rows of my data were:
record1:   NumericID
record2:   1234
record3:   5678

The schema I want associated with my data is: NumericID int

But I think to implement the current suggestion I would have to first use an intermediate
schema:  NumericID string
(This will allow all 3 records to validate against the schema, whereas record1 would have
violated my desired schema).

Then delete the header row as Matt suggested.  Lastly, associate the desired schema with the
cleaned table:  NumericID int
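
For comparison, the three-step approach described above can be sketched in Python as follows.
This is only a model of the idea, not Hive code; identifying the header by comparing against the
literal column name is an assumption made for the sketch, since the exact deletion method is not
spelled out in this message.

```python
from typing import List

# The three records from the example, as raw text.
raw_rows: List[str] = ["NumericID", "1234", "5678"]

# Step 1: intermediate schema (NumericID string), so every record is accepted as-is.
staged: List[str] = list(raw_rows)

# Step 2: remove the header row (assumed here to equal the column name).
without_header: List[str] = [v for v in staged if v != "NumericID"]

# Step 3: apply the desired schema (NumericID int) to the cleaned rows.
final: List[int] = [int(v) for v in without_header]

print(final)
```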

Matt, you are probably already aware of this and that is why you called it a "least-worst"
solution.  But it is better than what I am doing now, so I think I will use it.

Thanks again!
