hadoop-common-user mailing list archives

From "Dai, Jason" <jason....@intel.com>
Subject Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase
Date Mon, 17 Sep 2012 13:55:42 GMT
Hi,

I'd like to announce Project Panthera, our open source effort to showcase better data analytics
capabilities on Hadoop/HBase (through both software and hardware improvements), available at
https://github.com/intel-hadoop/project-panthera.

Over the last several years, we have been working closely with many users and customers on their
next-generation data analytics platforms built on the Hadoop stack, and specifically on using
HBase for semi-real-time analytics. While the Hadoop/HBase stack has laid a solid foundation for
these systems, we have still had to implement many new capabilities to build a flexible and
efficient data analytics platform (e.g., better integration with existing infrastructure using
SQL, better query processing on HBase, and efficient use of new hardware platform technologies).

Project Panthera is our open source effort to contribute the new capabilities we have built
to the Apache Hadoop community. Under Project Panthera, we will make our implementations available
at the project repo to showcase these capabilities; in addition, we will collaborate with
the Hadoop community (by going through the standard Apache open source process) to have some
of these ideas reviewed and hopefully incorporated into related Apache projects.

Today's first release of Project Panthera makes two new capabilities available for better
support of analytical queries:

1) An analytical SQL engine for MapReduce (built on top of Hive)
   Under Project Panthera, we will gradually make our implementation of the SQL engine available
as an extension to Hive (https://github.com/intel-hadoop/hive-0.9-panthera). Specifically,
today's release provides support for many common SQL constructs used by our users and customers,
including some important features (e.g., subqueries in WHERE clauses and multiple-table SELECT
statements) that are not supported in Hive today. Going forward, we will also use HIVE-3472
(https://issues.apache.org/jira/browse/HIVE-3472) as the umbrella JIRA to track our efforts to
get the SQL engine idea reviewed and hopefully incorporated into Apache Hive.

2) A document store (built on top of HBase) for better query processing
   Under Project Panthera, we will gradually make our implementation of the document store
available as an extension to HBase (https://github.com/intel-hadoop/hbase-0.94-panthera).
Specifically, today's release provides document store support in HBase by utilizing coprocessors,
which brings up to a 3x reduction in storage usage and up to a 1.8x speedup in query processing.
Going forward, we will also use HBASE-6800 (https://issues.apache.org/jira/browse/HBASE-6800)
as the umbrella JIRA to track our efforts to get the document store idea reviewed and hopefully
incorporated into Apache HBase.

Please refer to our project GitHub repository (https://github.com/intel-hadoop/project-panthera)
for more details on Project Panthera.

