hive-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: ORC vs TEXT file
Date Mon, 12 Aug 2013 15:20:27 GMT
Pandees,
  I've never seen a table that was larger with ORC than with text. Can you
share your text file's schema with us? Is the table very small? How many
rows and how many GB are in each table? The overhead for ORC is typically
small, but as Ed says, it is possible in rare cases for the overhead to
dominate the data size itself.

-- Owen
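Edward's point further down this thread, that gzip over row-oriented text can compress better than per-column compression when the same values repeat in different columns, can be illustrated with a toy experiment. This is a sketch under stated assumptions, not ORC's actual encoding (ORC also applies its own run-length and dictionary encodings per column): the data is synthetic (a hypothetical table whose column `b` copies column `a`), zlib stands in for both the gzip-compressed text file and the per-column compression, and `make_rows`, `row_wise_size` and `column_wise_size` are illustrative helpers only.

```python
# Toy comparison: one zlib stream over row-major text vs. compressing each
# column separately (roughly what a columnar layout does).
# NOT ORC's real encoding; it only illustrates cross-column redundancy.
import random
import zlib
from typing import List, Tuple

Row = Tuple[str, str]


def make_rows(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical data: column b repeats column a's random 16-hex-digit value."""
    rng = random.Random(seed)  # fixed seed so the sizes are reproducible
    rows = []
    for _ in range(n):
        value = "%016x" % rng.getrandbits(64)
        rows.append((value, value))
    return rows


def row_wise_size(rows: List[Row]) -> int:
    """Compressed size of the rows written as comma-separated text lines."""
    text = "".join(f"{a},{b}\n" for a, b in rows)
    return len(zlib.compress(text.encode(), 9))


def column_wise_size(rows: List[Row]) -> int:
    """Sum of compressed sizes when each column is compressed on its own."""
    total = 0
    for column in zip(*rows):
        data = "\n".join(column).encode()
        total += len(zlib.compress(data, 9))
    return total


if __name__ == "__main__":
    rows = make_rows(10_000)
    print("row-wise (one stream):", row_wise_size(rows), "bytes")
    print("column-wise (per column):", column_wise_size(rows), "bytes")
```

Because the single row-wise stream sees each value twice within a few bytes, zlib can encode the second copy as a short back-reference, while compressing column `b` separately has to pay for those random values again. Real tables rarely repeat this perfectly, so treat the result as a direction, not a size estimate.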


On Mon, Aug 12, 2013 at 6:52 AM, pandees waran <pandeesh@gmail.com> wrote:

> Thanks Edward. I shall try compression with ORC and let you know. Also,
> it looks like CPU usage is lower when querying ORC than when querying the
> text file. But the total time taken by the query is slightly longer with
> ORC than with the text file. Could you please explain the difference
> between cumulative CPU time and the total time taken (usually shown in the
> last line, in seconds)? Which one should we give preference to?
> On Aug 12, 2013 7:01 PM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:
>
>> Columnar formats do not always beat row-wise storage. Many times gzip plus
>> block storage will compress something better than columnar storage,
>> especially when you have repeated data in different columns.
>>
>> Based on what you are saying, it is possible that you missed a
>> setting and the ORC files are not compressed.
>>
>>
>> On Monday, August 12, 2013, pandees waran <pandeesh@gmail.com> wrote:
>> > Hi,
>> >
>> > Currently, we use the TEXTFILE format in Hive 0.8 when creating the
>> > external tables for intermediate processing.
>> > I have read about ORC in 0.11, so I created the same table in 0.11
>> > with the ORC format.
>> > Without any compression, the ORC files (3 in total) took up twice as
>> > much space as the TEXTFILE (a single file).
>> > Also, when I query the data from ORC:
>> > Select count(*) from orc_table
>> >
>> > it took more time than the same query against the text file.
>> > But I see the cumulative CPU time is lower with ORC than with the text file.
>> >
>> > What sort of queries will benefit if we use ORC?
>> > In which cases would TEXTFILE be preferred over ORC?
>> >
>> > Thanks.
>> >
>
>
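On the question of cumulative CPU time versus total time taken: the cumulative CPU figure Hive reports is CPU time summed across all the map and reduce tasks of the job, while the "Time taken" on the last line is elapsed wall-clock time, which also includes task startup, scheduling and waiting for free slots. The sketch below is a toy, map-only model under assumed numbers: the task durations, slot count and fixed per-task startup cost are all hypothetical, and `simulate` is an illustrative helper, not a Hive or Hadoop API. It shows how splitting the same work into three smaller tasks can lower total CPU time while raising wall-clock time when the tasks cannot all run at once.

```python
# Toy model of why cumulative CPU time and wall-clock time can diverge.
# All durations, slot counts and startup costs are hypothetical.
import heapq
from typing import List, Tuple


def simulate(task_cpu_secs: List[float], slots: int,
             startup_secs: float) -> Tuple[float, float]:
    """Return (cumulative_cpu, wall_clock) for tasks run on `slots` parallel slots.

    Each task is modeled as pure CPU work preceded by a fixed startup delay
    (JVM launch, scheduling) that adds elapsed time but, in this toy model,
    is not counted as task CPU time.
    """
    cumulative_cpu = sum(task_cpu_secs)
    # Min-heap holding the time at which each slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for cpu in task_cpu_secs:
        start = heapq.heappop(free_at)       # earliest available slot
        end = start + startup_secs + cpu     # startup delay, then the work
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return cumulative_cpu, finish


if __name__ == "__main__":
    # Hypothetical text table: one 40 s task on one free slot.
    print(simulate([40.0], slots=1, startup_secs=5.0))              # (40.0, 45.0)
    # Hypothetical ORC table: three 12 s tasks, still one free slot.
    print(simulate([12.0, 12.0, 12.0], slots=1, startup_secs=5.0))  # (36.0, 51.0)
    # Same ORC tasks when three slots are free.
    print(simulate([12.0, 12.0, 12.0], slots=3, startup_secs=5.0))  # (36.0, 17.0)
```

In this model, CPU time reflects how much cluster work the query consumed, while wall-clock time reflects how long a user waits; which one to favour depends on whether the cluster is resource-bound or the query is latency-sensitive.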
