hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Query performance correlated to increase in delta files?
Date Fri, 20 Nov 2015 22:17:46 GMT
Are you running the compactor as part of your metastore?  It 
occasionally compacts the delta files in order to reduce read time.  See 
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions for 
details.
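
Alan's suggestion depends on how Hive ACID tables are read. As the Hive Transactions wiki page describes, each transaction that updates or deletes rows writes a new delta directory next to the table's base, and a reader has to merge the base with every delta. Query cost therefore grows with the number of deltas. A minor compaction merges the deltas into fewer deltas, and a major compaction rewrites the base and drops the deltas. The automatic compactor is enabled in the metastore's hive-site.xml with hive.compactor.initiator.on=true and hive.compactor.worker.threads set above 0. A compaction can also be requested by hand with ALTER TABLE table_name COMPACT 'major' and monitored with SHOW COMPACTIONS. The Python below is only a toy model of that merge-on-read behaviour. The class ToyAcidTable and its methods are hypothetical, and the model counts "files opened" rather than simulating ORC or HDFS.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyAcidTable:
    """Hypothetical in-memory model of one base file plus delta files.

    Illustration only: this is not Hive's on-disk format or API.
    """
    base: dict[int, str] = field(default_factory=dict)
    # Each delta maps a row key to its new value; None marks a delete.
    deltas: list[dict[int, Optional[str]]] = field(default_factory=list)

    def write(self, changes: dict[int, Optional[str]]) -> None:
        # Every UPDATE/DELETE transaction adds one new delta.
        self.deltas.append(dict(changes))

    def read(self) -> tuple[dict[int, str], int]:
        # Merge-on-read: start from the base and apply every delta in order,
        # so the number of files a query must open grows with each delta.
        rows = dict(self.base)
        files_opened = 1
        for delta in self.deltas:
            files_opened += 1
            for key, value in delta.items():
                if value is None:
                    rows.pop(key, None)
                else:
                    rows[key] = value
        return rows, files_opened

    def minor_compact(self) -> None:
        # Merge all deltas into a single delta; the base is left untouched.
        merged: dict[int, Optional[str]] = {}
        for delta in self.deltas:
            merged.update(delta)
        self.deltas = [merged] if merged else []

    def major_compact(self) -> None:
        # Rewrite the base with every delta applied, then drop the deltas.
        self.base, _ = self.read()
        self.deltas.clear()


if __name__ == "__main__":
    table = ToyAcidTable(base={1: "a", 2: "b"})
    for i in range(50):
        table.write({1: f"a{i}"})  # 50 separate updates to row 1
    table.write({2: None})         # delete row 2
    print(table.read())            # ({1: 'a49'}, 52)
    table.major_compact()
    print(table.read())            # ({1: 'a49'}, 1)
```

In this model, the same rows come back before and after compaction, but the read touches 52 files beforehand and only 1 afterwards. A minor compaction leaves the base alone, so it costs less to run. However, it only reduces the delta count to one.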

Alan.

> Sai Gopalakrishnan <mailto:sai.gopalakrishnan@aspiresys.com>
> November 19, 2015 at 21:17
>
> Hello fellow developer,
>
> Greetings!
>
> I am using Hive to query transactional data. I transfer data from 
> an RDBMS to Hive using Sqoop and prefer the ORC format for its speed and 
> ACID support. Sqoop has no support for reflecting records that are 
> updated or deleted in the RDBMS, so I insert those modified records 
> into HDFS and then update/delete rows in the Hive tables to reflect 
> the changes. Every update/delete on a Hive table creates new delta 
> files, and I have noticed a considerable drop in query speed over time. 
> I realize that lookups take longer as the number of delta files grows. 
> Is there any way to overcome this issue? Running INSERT OVERWRITE on 
> the table is costly: I deal with about 1 TB of data, and it keeps 
> growing every day.
>
> Kindly reply with a suitable solution at the earliest.
>
> Thanks & Regards,
>
> Saisubramaniam Gopalakrishnan
>
> Aspire Systems (India) Pvt. Ltd.
>
