hive-user mailing list archives

From: Bertrand Dechoux <decho...@gmail.com>
Subject: Re: Continuous log analysis requires 'dynamic' partitions, is that possible?
Date: Wed, 25 Jul 2012 08:51:08 GMT
@Puneet Khatod: I found that out, and that's why I am asking here. I guess
non-AWS users have the same problem and may already have a way to solve it.

@Ruslan Al-fakikh: That looks great. Is there any documentation for msck? I
can work it out from the diff file, but a wiki page or a blog post about it
would be best, and I could not find any.

@Edward Capriolo: I now feel silly. This is clearly a better approach than
my proposed hacks. The performance impact of empty partitions should be
negligible, especially when partition pruning applies. I am using Hive to
'piggyback' on an external way of writing data, so in my case I can indeed
tell Hive in advance where the data will be written. (It is the same as what
you describe, but with the logic reversed.) I must have skipped over ALTER
TABLE TOUCH, but it would not help me because the partitions are external.
If I add partitions ahead of time, I will do it with cron and a shell script.
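Creating partitions ahead of time from cron could be sketched as below. This is a minimal sketch, not a tested setup: the table name `apache_logs` and the partition columns `month` and `day` are hypothetical, and it assumes a Hive version that accepts `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION ...`, so that re-running it for an already registered day is harmless.

```python
from datetime import date, timedelta


def partition_statements(table, root, start, days):
    """Build one ALTER TABLE ... ADD PARTITION statement per day.

    `table` and the partition columns (month, day) are hypothetical names;
    `root` is the HDFS directory of one log type, e.g. "/apache".
    """
    statements = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        # Zero-padded to match the /<logtype>/<month>/<day> layout, e.g. /apache/01/02.
        month, day = f"{d.month:02d}", f"{d.day:02d}"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (month='{month}', day='{day}') "
            f"LOCATION '{root}/{month}/{day}';"
        )
    return statements


if __name__ == "__main__":
    # Register today's and tomorrow's partitions before any data lands in them.
    for stmt in partition_statements("apache_logs", "/apache", date.today(), 2):
        print(stmt)
```

A cron entry could then write the output to a file and run it with `hive -f`; the script name and file paths would be up to you. Registering a day or two ahead keeps queries working even if the external writer creates the directory late.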

Bertrand


On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> ALTER TABLE TOUCH will create partitions even if they have no data.
> You can also just create partitions ahead of time and have your code
> "know" where to write data.
>
>
> On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh
> <ruslan.al-fakikh@jalent.ru> wrote:
> > If you are not using Amazon, take a look at this:
> >
> > https://issues.apache.org/jira/browse/HIVE-874
> >
> >
> >
> > Ruslan
> >
> >
> >
> > From: Puneet Khatod [mailto:puneet.khatod@tavant.com]
> > Sent: Tuesday, July 24, 2012 8:32 PM
> > To: user@hive.apache.org
> > Subject: RE: Continuous log analysis requires 'dynamic' partitions, is
> > that possible?
> >
> >
> >
> > If you are using Amazon (AWS), you can use ‘recover partitions’ to enable
> > all top-level partitions.
> >
> > This will add the required dynamicity.
> >
> >
> >
> > Regards,
> >
> > Puneet Khatod
> >
> >
> >
> > From: Bertrand Dechoux [mailto:dechouxb@gmail.com]
> > Sent: 24 July 2012 21:15
> > To: user@hive.apache.org
> > Subject: Continuous log analysis requires 'dynamic' partitions, is that
> > possible?
> >
> >
> >
> > Hi,
> >
> > Let's say logs are stored in HDFS using the following directory tree:
> > /<logtype>/<month>/<day>.
> > So for Apache, that would be:
> > /apache/01/01
> > /apache/01/02
> > ...
> > /apache/02/01
> > ...
> >
> > I would like to know how to define a table for this information. I found
> > out that the table should be external and should use partitions.
> > However, I did not find any way to create the partitions dynamically. Is
> > there no automatic way to define them?
> > In this case, the partition 'template' would be <month>/<day>, with the
> > root being /apache.
> >
> > I know how to 'hack a fix': create a script that generates all the
> > 'add partition' statements and runs them without caring about the
> > results, because partitions may not exist or may already have been
> > added. Better, I could parse the output of 'show partitions' for the table
> > and run only the relevant statements, but it still feels like a hack.
> >
> > Is there any clean way to do it?
> >
> > Regards,
> >
> > Bertrand Dechoux
> >
>
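The diff-based workaround from the original question (add only the partitions that 'show partitions' does not already list) could look roughly like the sketch below. It assumes the same hypothetical `month`/`day` partition columns, that the directory list was gathered separately (for example from a recursive HDFS listing), and that SHOW PARTITIONS prints one `month=MM/day=DD` line per partition.

```python
import re


def missing_partition_statements(table, root, directories, show_partitions_output):
    """Return ADD PARTITION statements only for directories Hive does not know yet.

    `directories`: HDFS paths gathered elsewhere (e.g. a recursive listing).
    `show_partitions_output`: lines printed by SHOW PARTITIONS, which look like
    "month=01/day=02" for the hypothetical (month, day) partition columns.
    """
    # Accept <root>/<MM>/<DD>, with an optional trailing slash.
    pattern = re.compile(re.escape(root.rstrip("/")) + r"/(\d{2})/(\d{2})/?")
    known = {line.strip() for line in show_partitions_output if line.strip()}
    statements = []
    for path in sorted(directories):
        match = pattern.fullmatch(path)
        if match is None:
            continue  # not part of this table's layout
        month, day = match.groups()
        if f"month={month}/day={day}" in known:
            continue  # already registered, so no statement is needed
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION (month='{month}', day='{day}') "
            f"LOCATION '{path.rstrip('/')}';"
        )
    return statements


if __name__ == "__main__":
    existing = ["month=01/day=01"]
    dirs = ["/apache/01/01", "/apache/01/02"]
    for stmt in missing_partition_statements("apache_logs", "/apache", dirs, existing):
        print(stmt)
    # prints: ALTER TABLE apache_logs ADD PARTITION (month='01', day='02') LOCATION '/apache/01/02';
```

Compared with re-running every statement and ignoring errors, this keeps the generated script free of failing statements, at the cost of depending on the exact SHOW PARTITIONS output format.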



-- 
Bertrand Dechoux
