hadoop-hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Vertical partitioning
Date Thu, 17 Jun 2010 16:02:11 GMT
On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma <
jaydeep.vishwakarma@mkhoj.com> wrote:

> Just exploring whether it is an option and how feasible it is. One of my
> tables has more than 20 fields, but most of the time I need only 10 main
> fields. We rarely need the other fields for day-to-day analysis.
>
> Regards,
> Jaydeep
>
>
> Ning Zhang wrote:
>
> Hive supports columnar storage (RCFile) but not vertical partitioning. Is
> there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
>
>
> Hi,
>
> Does Hive support vertical partitioning?
>
> Regards,
> Jaydeep
>
>
>
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify us
> immediately by responding to this email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its
> receipt.
>

Vertical partitioning is just as practical in a traditional RDBMS as it
would be in Hive. Normally you would do it for one of a few reasons:
1) You have some rarely used columns and you want to reduce the table/row
size.
2) Your DBMS has terrible blob/clob/text support and the only way to get
large objects out of your way is to put them in other tables.

If you go with vertical partitioning in Hive, you may have to join the
tables to select the columns you need. I do not consider row serialization
and deserialization to be the majority of the cost of a Hive job, and in most
cases Hadoop handles one large file better than two smaller ones. Then again,
we have some tables with 140+ columns, so I can see vertical partitioning
helping with those tables, but it doubles the management.
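
To make the trade-off concrete, here is a minimal sketch of a hand-rolled vertical split. It uses Python's built-in sqlite3 module as a stand-in for Hive, purely so the example is runnable anywhere. The table names (events_main, events_rare) and their columns are hypothetical and not taken from anyone's schema in this thread. The same shape would apply to two Hive tables sharing a key.

```python
import sqlite3

# Hypothetical schema: one wide "events" table split by hand into two narrower
# tables that share the same key. SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_main (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT);
    CREATE TABLE events_rare (id INTEGER PRIMARY KEY, referrer TEXT, user_agent TEXT);
""")

rows = [
    (1, 10, "/home", "search", "agent-a"),
    (2, 11, "/cart", "email", "agent-b"),
]
for id_, user_id, url, referrer, user_agent in rows:
    # Every load now writes to two tables: this is the doubled management.
    conn.execute("INSERT INTO events_main VALUES (?, ?, ?)", (id_, user_id, url))
    conn.execute("INSERT INTO events_rare VALUES (?, ?, ?)", (id_, referrer, user_agent))

# Day-to-day query: only the narrow table is read.
common = conn.execute(
    "SELECT id, user_id, url FROM events_main ORDER BY id"
).fetchall()

# Occasional query: the rarely used columns cost a join on the shared key.
full = conn.execute("""
    SELECT m.id, m.user_id, m.url, r.referrer, r.user_agent
    FROM events_main m
    JOIN events_rare r ON r.id = m.id
    ORDER BY m.id
""").fetchall()

print(common)
print(full)
```

The everyday query touches only the narrow table, while anything that needs the rare columns pays for a join, and every load has to keep both tables in sync.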
