hadoop-hdfs-user mailing list archives

From Fengjiao Jiang <grapej...@gmail.com>
Subject Use Hadoop and other Apache products for SQL query manipulations
Date Wed, 18 Jun 2014 15:26:30 GMT

We have a large data set originally stored in MS SQL, and for intensive data
aggregation we’re currently using Vertica. The problem is that the
data is very large, and sometimes a very complex “select” or “insert” query
may need as long as 10 minutes to return the correct results. (The
database size is maybe 2GB.)

So we’re wondering whether we can use Hadoop together with some other Apache
products (built on Hadoop) to make the queries faster.
For example, could we use Hadoop, HBase and ZooKeeper and write MapReduce (MR)
functions for these “SELECT”, “INSERT” or similarly complex queries to
improve the query speed?

Also, I don’t know whether the combination I listed above is a good one. Should
I use Hadoop, HBase and ZooKeeper, or should I use Hadoop, Pig and Hive?

My question is mainly a “SQL-on-Hadoop” one. Would you please tell me whether it’s
possible and, if so, give me some suggestions? I would appreciate it
a lot!


