hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HFile vs Parquet for very wide table
Date Thu, 21 Jan 2016 23:25:32 GMT
I have very limited knowledge of Parquet, so I can only answer from the HBase
point of view.

Please see recent thread on number of columns in a row in HBase:

http://search-hadoop.com/m/YGbb3NN3v1jeL1f
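
For reading back only a handful of columns from a very wide HBase row, the
client API lets you name the qualifiers you want so only those cells come
back. A rough sketch (the table name "matrix", family "m", row key, and
qualifier names below are made up):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("matrix"))

// Fetch only the qualifiers you need from a very wide (e.g. 600K-column) row.
val get = new Get(Bytes.toBytes("row-42"))
Seq("col-17", "col-203", "col-991").foreach { q =>
  get.addColumn(Bytes.toBytes("m"), Bytes.toBytes(q))
}
val result = table.get(get)
val value = result.getValue(Bytes.toBytes("m"), Bytes.toBytes("col-17"))

table.close()
connection.close()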

There are a few Spark-HBase connectors.
See this thread:

http://search-hadoop.com/m/q3RTt4cp9Z4p37s
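
Even without a dedicated connector, Spark can scan an HBase table through the
stock TableInputFormat and restrict the scan to the columns of interest.
Roughly (table and column names are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-wide-scan"))

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "matrix")              // placeholder table name
hbaseConf.set(TableInputFormat.SCAN_COLUMNS, "m:col-17 m:col-203") // scan only these columns

// Each record is (row key, Result); extract cell values from the Result as needed.
val rdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println(rdd.count())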

Sorry, I cannot answer the performance comparison question.

Cheers

On Thu, Jan 21, 2016 at 2:43 PM, Krishna <research800@gmail.com> wrote:

> We are evaluating Parquet and HBase for storing a dense & very, very wide
> matrix (can have more than 600K columns).
>
> I have the following questions:
>
>    - Is there a limit on # of columns in Parquet or HFile? We expect to
>    query [10-100] columns at a time using Spark - what are the performance
>    implications in this scenario?
>    - HBase can support millions of columns - does anyone have prior
>    experience comparing Parquet vs. HFile performance for wide structured tables?
>    - We want a schema-less solution since the matrix can get wider over a
>    period of time
>    - Is there a way to generate wide structured schema-less Parquet files
>    using map-reduce (input files are in custom binary format)?
>
> What solutions other than Parquet & HBase are useful for this use case?
>
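
Regarding querying [10-100] columns at a time from Parquet with Spark:
Parquet is a columnar format, so a small projection should only read the
selected column chunks rather than all 600K columns. A rough, untested
sketch (the path and column names are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-column-prune"))
val sqlContext = new SQLContext(sc)

// Only the selected columns are materialized from the columnar files.
val df = sqlContext.read.parquet("/data/matrix.parquet")
val slice = df.select("col_17", "col_203", "col_991")
slice.show()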
