incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "HRdfStoreProposal" by udanax
Date Thu, 06 Mar 2008 02:49:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The following page has been changed by udanax:

The comment on the change is:
Add the HrdfStore proposal

New page:
== Abstract ==
HrdfStore will develop a Planet-Scale RDF Data Store based on Hadoop & Hbase.

== Proposal ==
The project will develop a Hadoop subsystem for RDF, called HrdfStore, which uses Hbase and MapReduce
to store RDF data and execute queries (e.g., SPARQL) on it.

== Background ==
We can store very sparse RDF data in a single Hbase table with as many columns as the data
needs. For example, we might make a row for each RDF subject and store all of its properties
and their values as columns in that row. This avoids costly self-joins when a query asks several
questions about the same subject, so such queries are processed efficiently. Self-joins are
still needed to answer RDF path queries.
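
The row-per-subject layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python dictionary stands in for the Hbase table, and the sample triples and the helper names `build_subject_table` and `lookup` are hypothetical, not part of the proposal.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, value); not from the proposal.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:mbox", "alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]


def build_subject_table(triples):
    """Group triples into one row per subject; each property becomes a column."""
    table = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        table[subject][prop].append(value)
    # Convert to plain dicts so that absent columns are simply missing keys.
    return {subject: dict(columns) for subject, columns in table.items()}


def lookup(table, subject, props):
    """Answer several questions about one subject with a single row read."""
    row = table.get(subject, {})
    return {prop: row.get(prop, []) for prop in props}


table = build_subject_table(TRIPLES)
result = lookup(table, "ex:alice", ["foaf:name", "foaf:mbox"])
```

Both properties of `ex:alice` come from the same row, so no self-join is involved. The row for `ex:bob` has no `foaf:mbox` key at all, which is how sparsity shows up in this stand-in.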

We can further accelerate query performance by using MapReduce for parallel, distributed query

== Rationale ==
=== HRDF Data Loader ===

HRDF Data Loader (HDL) reads RDF data from a file and organizes it into an Hbase table
in such a way that efficient query processing is possible. In Hbase, we can store everything
in a single table. The sparsity of RDF data is not a problem, because Hbase, which is a column-based
store and adopts various compression techniques, is very good at dealing with nulls in the

=== HRDF Query Processor ===

HRDF Query Processor (HQP) executes RDF queries on RDF data stored in an Hbase table. It translates
RDF queries into Hbase API calls or MapReduce jobs, then gathers the results and returns them
to the user.
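
As a rough illustration of the MapReduce side, the sketch below evaluates one property lookup as a map step over rows followed by a reduce step over grouped keys. It runs in a single process and does not use the Hadoop or Hbase APIs; the `ROWS` data and the function names `map_row`, `reduce_values`, and `run_job` are assumptions made for this example.

```python
from collections import defaultdict

# Stand-in for an Hbase table: row key = subject, columns = properties.
# The data is hypothetical.
ROWS = {
    "ex:alice": {"foaf:knows": ["ex:bob", "ex:carol"]},
    "ex:bob": {"foaf:knows": ["ex:carol"]},
    "ex:carol": {"foaf:name": ["Carol"]},
}


def map_row(subject, columns, prop):
    """Map step: emit (value, subject) for every value of `prop` in one row."""
    for value in columns.get(prop, []):
        yield value, subject


def reduce_values(key, subjects):
    """Reduce step: collect all subjects that point at the same value."""
    return key, sorted(subjects)


def run_job(rows, prop):
    """Map over every row, group the emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for subject, columns in rows.items():
        for key, value in map_row(subject, columns, prop):
            groups[key].append(value)
    return dict(reduce_values(key, values) for key, values in groups.items())


known_by = run_job(ROWS, "foaf:knows")
```

In a real deployment the map calls would run in parallel on the nodes that hold the rows; here the loop in `run_job` only imitates that flow.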

Query processing steps are as follows:

{{{SPARQL query -> Parse tree -> Logical operator tree 
-> Physical operator tree -> Execution}}}

Implementation of each step may proceed as an individual issue.
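
The processing steps listed above can be sketched as a chain of small functions. This is a toy under loud assumptions: the input is a single `subject property object` pattern rather than real SPARQL, the property is always bound, a dictionary stands in for the Hbase table, and every name (`parse`, `to_logical`, `to_physical`, `execute`, `run`) is hypothetical.

```python
# Stand-in table with hypothetical data: row key = subject, columns = properties.
TABLE = {
    "ex:alice": {"foaf:knows": ["ex:bob"], "foaf:name": ["Alice"]},
    "ex:bob": {"foaf:name": ["Bob"]},
}


def parse(text):
    """Query text -> parse tree (here just a 3-tuple of tokens)."""
    subject, prop, obj = text.split()
    return (subject, prop, obj)


def to_logical(tree):
    """Parse tree -> logical operator: what to match, not how to match it."""
    return {"op": "match", "pattern": tree}


def to_physical(logical):
    """Logical -> physical operator: row get if the subject is bound, else scan."""
    subject = logical["pattern"][0]
    kind = "scan" if subject.startswith("?") else "get"
    return {"op": kind, "pattern": logical["pattern"]}


def execute(physical, table):
    """Run the physical operator against the table and return matching triples."""
    subject, prop, obj = physical["pattern"]
    if physical["op"] == "get":
        candidates = {subject: table.get(subject, {})}
    else:
        candidates = table
    results = []
    for row_key, columns in candidates.items():
        for value in columns.get(prop, []):
            if obj.startswith("?") or obj == value:
                results.append((row_key, prop, value))
    return results


def run(text, table):
    """Chain the steps: parse, logical plan, physical plan, execution."""
    return execute(to_physical(to_logical(parse(text))), table)


by_get = run("ex:alice foaf:knows ?x", TABLE)
by_scan = run("?s foaf:name ?n", TABLE)
```

Keeping the logical and physical steps separate means the choice between a single-row read and a full scan can change without touching the parser, which fits the idea of handling each step as its own issue.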
=== HRDF Data Materializer ===

HRDF Data Materializer (HDM) pre-computes RDF path queries and stores the results in an Hbase
table. Later, HQP uses the materialized data for efficient processing of RDF path queries.
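
A minimal sketch of the materialization idea follows, assuming a two-step path (for example, the names of the people a subject knows) and dictionaries standing in for both the source table and the materialized table. The data and the names `materialize_path` and `names_of_acquaintances` are hypothetical.

```python
# Stand-in source table with hypothetical data.
ROWS = {
    "ex:alice": {"foaf:knows": ["ex:bob"]},
    "ex:bob": {"foaf:knows": ["ex:carol"], "foaf:name": ["Bob"]},
    "ex:carol": {"foaf:name": ["Carol"]},
}


def materialize_path(rows, first, second):
    """Pre-compute the two-step path first/second for every subject.

    Following the path normally needs a self-join: one read to get the
    intermediate node, another to read that node's row.
    """
    path_table = {}
    for subject, columns in rows.items():
        ends = []
        for middle in columns.get(first, []):
            ends.extend(rows.get(middle, {}).get(second, []))
        if ends:
            path_table[subject] = ends
    return path_table


# Built once, ahead of query time, and stored as its own table.
KNOWS_NAME = materialize_path(ROWS, "foaf:knows", "foaf:name")


def names_of_acquaintances(subject):
    """Answer the path query with a single read of the materialized table."""
    return KNOWS_NAME.get(subject, [])
```

The trade-off is the usual one for materialized results: reads become a single lookup, but the stored table has to be refreshed when the underlying rows change.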

== Current Status ==

This is a new project.

== Meritocracy ==

The initial developers are very familiar with meritocratic open source development, both at
Apache and elsewhere. Apache was chosen specifically because the initial developers want to
encourage this style of development for the project.

=== Community ===

HrdfStore seeks to build developer and user communities during incubation.

== Core Developers ==

The initial set of committers includes folks from the Hadoop & Hbase communities. We have
varying degrees of experience with Apache-style open source development, ranging from none
to ASF Members.

 * Edward Yoon, Master of mathematics, Service Development Center, NHN
 * Inchul Song, Ph.D. Candidate, Database Lab Division of Computer Science, KAIST

== Alignment ==

The developers of HrdfStore want to work with the Apache Software Foundation specifically
because Apache has proven to provide a strong foundation and set of practices for developing
standards-based infrastructure and server components. 

== Known Risks ==
=== Orphaned products ===
Because the project has a small number of committers, there is a risk of it being orphaned.
=== Inexperience with Open Source ===
We already have good experience with the Apache open source development process.

=== Homogenous Developers ===
Although there are only two core developers, they are not homogenous: Edward and Inchul know
each other only through their common interest in HrdfStore.
=== Reliance on Salaried Developers ===
Edward is a full-time open source developer at NHN, and Inchul is a Ph.D student in computer
=== Relationships with Other Apache Products ===
HrdfStore has a strong relationship with Apache Hadoop & Hbase. Being part of Apache could
foster closer collaboration among the three projects.

=== An Excessive Fascination with the Apache Brand ===

We believe in the processes, systems, and framework Apache has put in place. The brand is
nice, but is not why we wish to come to Apache.

== Documentation ==


== Initial Source ==
The initial source will consist of the current HQL and a Java-based RDF query language.

== External Dependencies ==
 * Hadoop (HDFS, Map/Reduce) License: Apache License, 2.0
 * Hbase (Sparse Matrix Table) License: Apache License, 2.0

== Required Resources ==

 * Developer and user mailing lists
 * A subversion repository
 * A JIRA issue tracker 

== Initial Committers ==
 * Edward Yoon (edward AT udanax DOT org)
 * Inchul Song (icsong AT gmail DOT com)

== Sponsors ==
=== Nominated Mentors ===
In need of mentors to volunteer.
=== Sponsoring Entity ===
The Apache Incubator. 
