hive-user mailing list archives

From: Josh Ferguson
Subject: Re: OLAP with Hive
Date: Sun, 14 Dec 2008 23:51:48 GMT
What would columnar organization look like and what are the benefits  
and drawbacks to this?
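
As a rough illustration of the question (not Hive's actual storage format), the sketch below lays out the same three records row-wise and column-wise in plain Python. The records, field names, and helper functions are all hypothetical. It shows the usual trade-off: an aggregate over one column reads only that column's values in the columnar layout, while reading back a whole record means gathering one value from every column.

```python
# Hypothetical page-view records; field names are made up for illustration.
rows = [
    {"user_id": 1, "country": "US", "bytes": 512},
    {"user_id": 2, "country": "DE", "bytes": 2048},
    {"user_id": 3, "country": "US", "bytes": 1024},
]

def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def get_row(columns, index):
    """Reassemble one record; this touches every column (the drawback)."""
    return {name: values[index] for name, values in columns.items()}

columns = to_columnar(rows)

# Benefit: an aggregate over one column reads only that column.
total_bytes = sum(columns["bytes"])

# Benefit: values of one type sit together, so repeated values
# (such as country codes) are easy to deduplicate or compress.
distinct_countries = sorted(set(columns["country"]))

# Drawback: fetching a full record needs a lookup in each column.
second_row = get_row(columns, 1)
```

In a real system the per-column lists would be separate on-disk blocks or files, so a query that names two columns out of fifty could skip most of the I/O.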


On Dec 14, 2008, at 3:13 PM, Joydeep Sen Sarma wrote:

> That’s a hard one. We can wish whatever we want to – but I guess  
> it’s all a question of who has the resources to contribute to it and  
> what they want from Hive.
> I can speak a little bit about Facebook. The reason we invested in  
> indexing was not that it was the primary usage (or even a bottleneck  
> for, say, performance optimization) – but because once you have so  
> much data in one place – chances are that someone will come along  
> and want to have quick lookups over some part of it (and you don’t  
> want to kill your cluster by doing scans all the time). So that  
> definitely makes indexing useful. We are also seeing that with  
> dimensional analysis – where there is a need to drill down into  
> detailed data – multidimensional indexes can be very useful. So in  
> the long term – I think this is one of the desired features.
> That doesn’t make it akin to hbase though (in the sense that we  
> still wouldn’t have row level updates or real-time index updates).  
> Katta may be complementary and we were actually interested in  
> investigating it for indexing (instead of rolling things from  
> scratch).
> Columnar organization is also very interesting. With all the hooks  
> in Hadoop (InputFormats) and Hive (SerDes) – I think it’s fairly  
> tractable to do this.
> From: Josh Ferguson []
> Sent: Sunday, December 14, 2008 1:20 PM
> To:
> Subject: Re: OLAP with Hive
> I'd honestly like to see hive remain a partitioned flat file store.  
> I don't think indexing what's inside the files is too incredibly  
> useful in most situations where you'd use hive. I also think this  
> kind of store is just the right fit for the hadoop and large scale  
> analytics situation. I don't want to see hive go toward hbase or  
> katta. What is the long term vision for hive?
> Josh
> On Dec 14, 2008, at 1:06 PM, Joydeep Sen Sarma wrote:
> We have done some preliminary work with indexing – but that’s not  
> the focus right now and no code is available in the open source  
> trunk for this purpose. I think it’s fair to say that hive is not  
> optimized for online processing right now. (and we are quite some  
> ways off from columnar storage).
> From: Martin Matula []
> Sent: Sunday, December 14, 2008 6:54 AM
> To:
> Subject: OLAP with Hive
> Hi,
> Is Hive capable of indexing the data and storing them in a way  
> optimized for querying (like a columnar database - bitmap indexes,  
> compression, etc.)?
> I need to be able to get decent response times for queries (up to a  
> few seconds) over huge amounts of analytical data. Is that  
> achievable (with an appropriate number of machines in a cluster)? I saw  
> the serialization/deserialization of tables is pluggable. Is that  
> the way to make the storage more efficient? Any existing  
> implementation (either ready or in progress) that would be targeted  
> at this? Or any hints on what I may want to take a look at among the  
> things that are currently available in Hive/Hadoop?
> Thanks,
> Martin
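
For readers unfamiliar with the bitmap indexes mentioned in the original question, here is a minimal sketch of the idea in plain Python. It is not something Hive provides according to this thread; the column data and function names are hypothetical. Each distinct value in a column gets one bitmap, stored here as a Python integer, with bit i set when row i holds that value, so a filter on several columns reduces to bitwise operations.

```python
# Hypothetical columns, one entry per row.
country = ["US", "DE", "US", "FR", "DE", "US"]
device = ["web", "web", "mobile", "web", "mobile", "mobile"]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index

def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

country_index = build_bitmap_index(country)
device_index = build_bitmap_index(device)

# country = 'US' AND device = 'mobile' becomes a single bitwise AND.
us_mobile = matching_rows(country_index["US"] & device_index["mobile"])
```

This works best on columns with few distinct values, since the index needs one bitmap per value.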
