hive-user mailing list archives

From Ranjith <>
Subject Re: Managed vs external tables in hive
Date Sun, 13 May 2012 20:07:48 GMT
Did you confirm this through the explain plan, or only by executing the DDL? And have you tried buckets with external tables?


On May 13, 2012, at 2:33 PM, Edward Capriolo <> wrote:

> The original design docs say you cannot build indexes on external tables, but I tried it in 0.8.x and confirmed that you can.
> On Sunday, May 13, 2012, Ranjith <ranjith.raghunat> wrote:
> > Indexes can be built on tables managed by Hive. For external tables, I do not believe that to be true. Please feel free to correct me if I am wrong.
> >
> > Thanks,
> > Ranjith
> > On May 12, 2012, at 9:24 PM, Nanda Vijaydev <> wrote:
> >
> > In Hive, the raw data lives in HDFS, and a metadata layer defines the structure of that raw data. A table is essentially a metadata entry, often stored in a MySQL server, that records the location of the data in HDFS, the delimiter or SerDe to use, and so on.
> > 1. With Hive-managed tables, dropping the table deletes both the metadata in MySQL and the raw data on the cluster.
> > 2. With external tables, dropping the table deletes only the metadata; the raw data remains on the cluster.
> >  
> > On Thu, May 10, 2012 at 3:02 PM, David Kulp <> wrote:
> >>
> >> It's simpler than this. All files look the same (and are often very simple delimited text) whether the table is managed or external. The only difference is that the files associated with a managed table are deleted when the table is dropped, and files loaded into a managed table are moved into Hive's private warehouse path. External tables never move or remove files. Performance is the same.
> >>
> >> On May 10, 2012, at 5:52 PM, wrote:
> >>
> >> > I am pretty new to Hive and was trying to understand clearly the difference between a managed and an external table.
> >> >
> >> > As I currently understand it, a managed table is one whose data is completely owned by Hive, whereas an external table is usually created to provide a Hive front end for data managed by external systems. I suppose this means that a query on an external table goes out to fetch the data from the external location, deserializes it with the given (or a suitable) SerDe, and then returns the query output in Hive format.
> >> >
> >> > So does this mean that the cost of using external tables is much higher than that of native ones? Or is there some caching that comes into play that I am not seeing?
> >> >
> >> > Thanks for the help.
> >> >
> >> > --
> >> > Swarnim
> >>
> >
> >
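
The drop and load behavior described in this thread can be sketched as a toy model. This is a minimal, hypothetical illustration under stated assumptions, not Hive's implementation: `ToyMetastore`, `create_table`, `load_data`, and `drop_table` are invented names, and a local temporary directory stands in for both HDFS and the warehouse path. It only models the two points made above: loading into a managed table moves the file under the table's directory, and dropping a managed table deletes its data while dropping an external table removes only the metadata entry.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for the Hive metastore: name -> (location, managed)."""

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.tables = {}

    def create_table(self, name, external_location=None):
        if external_location is None:
            # Managed table: its data directory lives under the warehouse path.
            location = os.path.join(self.warehouse_dir, name)
            os.makedirs(location, exist_ok=True)
            self.tables[name] = (location, True)
        else:
            # External table: just record a pointer to an existing directory.
            self.tables[name] = (external_location, False)

    def load_data(self, name, src_path):
        location, managed = self.tables[name]
        if not managed:
            # Toy-model assumption: external data is written to its location directly.
            raise ValueError("toy model: populate external tables outside Hive")
        # Managed table takes ownership: the file is moved, not copied.
        shutil.move(src_path, os.path.join(location, os.path.basename(src_path)))

    def drop_table(self, name):
        location, managed = self.tables.pop(name)  # metadata always goes away
        if managed:
            shutil.rmtree(location)  # managed data is deleted too
        # External data is left untouched.


def demo():
    root = tempfile.mkdtemp()
    warehouse = os.path.join(root, "warehouse")
    os.makedirs(warehouse)
    ms = ToyMetastore(warehouse)

    # Managed table: a staged file is moved in on load and deleted on drop.
    staged = os.path.join(root, "staging.txt")
    with open(staged, "w") as f:
        f.write("1\talice\n")
    ms.create_table("managed_t")
    ms.load_data("managed_t", staged)
    moved = not os.path.exists(staged)
    ms.drop_table("managed_t")
    managed_data_gone = not os.path.exists(os.path.join(warehouse, "managed_t"))

    # External table: files written outside "Hive" survive the drop.
    ext_dir = os.path.join(root, "external_data")
    os.makedirs(ext_dir)
    with open(os.path.join(ext_dir, "part-0.txt"), "w") as f:
        f.write("2\tbob\n")
    ms.create_table("external_t", external_location=ext_dir)
    ms.drop_table("external_t")
    external_data_kept = os.path.exists(os.path.join(ext_dir, "part-0.txt"))

    shutil.rmtree(root)
    return moved, managed_data_gone, external_data_kept


print(demo())  # (True, True, True)
```

Nothing in the model depends on where the files live when they are read, which matches the point above that query performance is the same for both table types.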
