hive-user mailing list archives

From Vikas Parashar <para.vi...@gmail.com>
Subject Re: Deleting empty rows from hive table through java
Date Tue, 05 Jan 2016 11:40:28 GMT
Well said Mich,

I have been through the same scenario, in which we did the ETL outside
Hive. Once the transformation was done, we loaded all the data into the Hive
warehouse. I think that is the best practice and we should follow it.

Regards,
Vikas Parashar

On Tue, Jan 5, 2016 at 5:02 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:

> It would be interesting to do the ETL outside of Hive by getting the data
> from the webpage into an intermediate file, pruning the empty rows and
> loading the final CSV file into the Hive destination table.
>
>
>
> I am pretty sure this clean-up outside of Hive would be faster than doing
> the same thing in Hive.
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> *Sybase ASE 15 Gold Medal Award 2008*
>
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
>
>
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
>
> Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE
> 15", ISBN 978-0-9563693-0-7*.
>
> co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN
> 978-0-9759693-0-4*
>
> *Publications due shortly:*
>
> *Complex Event Processing in Heterogeneous Environments*, ISBN:
> 978-0-9563693-3-8
>
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume
> one out shortly
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated. It is
> the responsibility of the recipient to ensure that this email is virus
> free, therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 05 January 2016 08:55
> *To:* user@hive.apache.org
> *Subject:* RE: Deleting empty rows from hive table through java
>
>
>
> Hi Sateesh,
>
>
>
> You can do the clean-up in Hive by creating a staging table, loading your
> CSV data into it, and then inserting into the main table only the rows
> where COL1 IS NOT NULL.
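
The staging-table approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table names (staging_table, main_table), the two STRING columns, and the path /tmp/data.csv are hypothetical, and main_table is assumed to exist already. The script just writes the HiveQL to a file; running it needs a Hive installation. Note that an empty field in a text file may load into a STRING column as an empty string rather than NULL, so the filter below checks for both.

```shell
#!/bin/sh
# Hypothetical names throughout: staging_table, main_table, col1, col2,
# and /tmp/data.csv are placeholders, not taken from the thread.
hql="${TMPDIR:-/tmp}/prune_empty_rows.hql"

cat > "$hql" <<'EOF'
-- Staging table that mirrors the CSV layout.
CREATE TABLE IF NOT EXISTS staging_table (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load the raw CSV as-is, empty rows included.
LOAD DATA LOCAL INPATH '/tmp/data.csv' OVERWRITE INTO TABLE staging_table;

-- Copy only rows with a usable first column into the main table.
-- An empty text field may arrive as '' rather than NULL, so filter both.
INSERT INTO TABLE main_table
SELECT col1, col2
FROM staging_table
WHERE col1 IS NOT NULL AND trim(col1) <> '';
EOF

# Show the generated script.
cat "$hql"

# To run it against a Hive installation:
#   hive -f "$hql"
```

The staging table keeps the full, unfiltered copy of the CSV, so the filter can be changed and the main table reloaded without fetching the data again.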
>
>
>
> Alternatively you can create your Hive table as transactional. Although I
> would say the staging table is better as you will have a full record of
> your CSV data at any time.
>
>
>
> You can of course do the pruning of data outside of Hive using a simple
> shell script with sed and awk (if you are familiar with those tools).
>
>
>
> cat CSV_FILE | sed -e '/^$/d'
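
The sed expression /^$/d deletes only lines that are completely empty. As a hedged variation, the sketch below also drops lines that contain nothing but whitespace or commas, which is how an "empty row" often appears in a CSV export. The file paths and sample data are made up for illustration, and a comma delimiter is assumed.

```shell
#!/bin/sh
# Hypothetical working files, created in a temporary directory.
workdir=$(mktemp -d)
raw="$workdir/raw.csv"
clean="$workdir/clean.csv"

# Sample data: one blank line, one whitespace-only line,
# and one line consisting only of delimiters.
printf 'id,name\n1,alice\n\n   \n,,\n2,bob\n' > "$raw"

# Delete every line made up solely of whitespace and commas.
# Completely empty lines match this pattern as well.
sed -e '/^[[:space:],]*$/d' "$raw" > "$clean"

cat "$clean"
```

The cleaned file can then be loaded into the Hive table directly, with no filtering needed on the Hive side.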
>
>
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
>
>
>
> *From:* Sateesh Karuturi [mailto:sateesh.karuturi9@gmail.com]
> *Sent:* 05 January 2016 06:59
> *To:* user@hive.apache.org
> *Subject:* Deleting empty rows from hive table through java
>
>
>
> Hello...
>
> Could anyone please help me with how to delete empty rows from a Hive
> table through Java?
>
> Thanks in advance
>
