hive-user mailing list archives

From Alan Gates <alanfga...@gmail.com>
Subject Re: Hive orc use case
Date Mon, 26 Sep 2016 18:20:17 GMT
As long as there is a spare worker thread this should be picked up within a few seconds.  It’s
true you can’t force it to happen immediately if other compactions are happening, but that’s
by design, so that compaction work doesn’t take up too many resources.
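For reference, the compaction workers Alan mentions are governed by metastore-side settings. A minimal hive-site.xml sketch (property names per the Hive transactions documentation; the values shown are illustrative, not recommendations):

```xml
<!-- Illustrative hive-site.xml fragment for the metastore host. -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value> <!-- enables the thread that enqueues compaction requests -->
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>4</value> <!-- more spare worker threads means queued compactions are picked up sooner -->
</property>
```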

Alan.

> On Sep 26, 2016, at 11:07, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> alter table payees compact 'minor';
> Compaction enqueued.
> OK
> 
> It queues the compaction, but there is no way I can force it to run immediately?
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage
or destruction of data or any other property which may arise from relying on this email's
technical content is explicitly disclaimed. The author will in no case be liable for any monetary
damages arising from such loss, damage or destruction.
>  
> 
> On 26 September 2016 at 18:54, Alan Gates <alanfgates@gmail.com> wrote:
> alter table compact forces a compaction.  See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact
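As a sketch of the DDL involved (the table and partition names here are hypothetical; 'minor' merges delta files, 'major' rewrites base plus deltas):

```sql
-- Request a major compaction on one partition of a transactional ORC table.
ALTER TABLE payees PARTITION (ingest_date='2016-09-26') COMPACT 'major';

-- Watch the request move through the queue ('initiated' -> 'working' -> done).
SHOW COMPACTIONS;
```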
> 
> Alan.
> 
> > On Sep 26, 2016, at 10:41, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> >
> > Can the temporary table be a solution to the original thread owner issue?
> >
> > Hive streaming, for example from Flume to Hive, is interesting, but the issue is that
one ends up with a fair number of delta files due to the transactional nature of the ORC table, and I
know that Spark will not be able to open the table until compaction takes place, which cannot
be forced. I don't know whether there is a way to enforce quick compaction.
> >
> > Thanks
> >
> >
> >
> > On 26 September 2016 at 17:41, Alan Gates <alanfgates@gmail.com> wrote:
> > ORC does not store data row by row.  It decomposes the rows into columns, and then
stores pointers to those columns, as well as a number of indices and statistics, in a footer
at the end of the file.  Due to the footer, in the simple case you cannot read the file before you close
it, nor append to it.  We did address both of these issues to support Hive streaming, but it’s
a low-level interface.  If you want to see how Hive streaming handles this, you
can use it as your guide.  The starting point is HiveEndPoint in org.apache.hive.hcatalog.streaming.
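A minimal sketch of that write path using the classic org.apache.hive.hcatalog.streaming API (the metastore URI, table, columns, and records are assumptions; this needs the hive-hcatalog-streaming dependency and a bucketed, transactional ORC table to actually run):

```java
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class OrcStreamSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical metastore URI, database, and table; the target table
        // must be ORC, bucketed, and created with 'transactional'='true'.
        HiveEndPoint endPoint =
            new HiveEndPoint("thrift://metastore:9083", "default", "payees", null);
        StreamingConnection conn = endPoint.newConnection(true); // auto-create partitions

        // Writer that maps delimited records onto the table's columns.
        DelimitedInputWriter writer =
            new DelimitedInputWriter(new String[]{"id", "name"}, ",", endPoint);

        // A batch groups several transactions over one delta file set;
        // each commit makes the written rows visible to readers.
        TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("1,alice".getBytes());
        batch.write("2,bob".getBytes());
        batch.commit();
        batch.close();
        conn.close();
    }
}
```

This is how streaming sidesteps the footer problem: rows land in delta files that are finalized per transaction, rather than being appended to a closed ORC file.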
> >
> > Alan.
> >
> > > On Sep 26, 2016, at 01:18, Amey Barve <ameybarve15@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > I have a use case where I need to append either one or many rows to an ORC file,
as well as read one or many rows from it.
> > >
> > > I observed that I cannot read rows from an ORC file unless I close the ORC file's
writer; is this correct?
> > >
> > > Why doesn't write actually flush the rows to the ORC file? Is there any alternative
where I can write rows and read them without closing the ORC file's writer?
> > >
> > > Thanks and Regards,
> > > Amey
> >
> >
> 
> 

