Mailing-List: contact hadoop-commits-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Apache Wiki <wikidiffs@apache.org>
To: hadoop-commits@lucene.apache.org
Date: Sat, 18 Aug 2007 07:15:35 -0000
Message-ID: <20070818071535.25661.26747@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "Hbase/RDF" by InchulSong

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/RDF

------------------------------------------------------------------------------
  We propose an Hbase subsystem for RDF called HbaseRDF, which uses Hbase + MapReduce to store RDF data and execute queries (e.g., SPARQL) on them.
  We can store very sparse RDF data in a single table in Hbase, with as many columns as 
  the data needs. For example, we might make a row for each RDF subject in a table and store all the properties and their values as columns in the table. 
- This reduces costly self-joins, which results in efficient processing of queries, although we still need self-joins for RDF path queries.
+ This reduces costly self-joins when answering queries about a single subject, which makes such queries efficient to process, although we still need self-joins to answer RDF path queries.
  
  We can further accelerate query performance by using MapReduce for 
  parallel, distributed query processing. 
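
The row-per-subject layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses in-memory dictionaries rather than any real Hbase API, and the names build_subject_table, same_subject_query, path_query, and the sample triples are all hypothetical. It shows why a query about one subject needs no join, while a path query has to look up rows again at every hop.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, property, value) triples.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:name", "Carol"),
]


def build_subject_table(triples):
    """Group triples into one row per subject, one column per property."""
    table = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        table[subject][prop].append(value)
    return table


def same_subject_query(table, subject, props):
    """Read several properties of one subject from a single row (no join)."""
    row = table.get(subject, {})
    return {p: list(row.get(p, [])) for p in props}


def path_query(table, start, props):
    """Follow a chain of properties from a start subject.

    Each hop uses the values found so far as row keys for the next
    lookup, which is the self-join that the single-table layout
    cannot avoid.
    """
    frontier = {start}
    for prop in props:
        next_frontier = set()
        for subject in frontier:
            next_frontier.update(table.get(subject, {}).get(prop, []))
        frontier = next_frontier
    return frontier


if __name__ == "__main__":
    table = build_subject_table(TRIPLES)
    print(same_subject_query(table, "ex:alice", ["foaf:name", "foaf:knows"]))
    print(path_query(table, "ex:alice", ["foaf:knows", "foaf:knows", "foaf:name"]))
```

In this sketch the per-hop loop over the frontier is the part that could plausibly be distributed, since each subject lookup is independent of the others.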
@@ -26, +26 @@

   * [:InchulSong: Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab., KAIST) 
  
  == Considerations ==
- When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to reduce costly self-joins needed to process RDF path queries. 
+ When we store RDF data in a single Hbase table and process queries on it, an important issue to consider is how to efficiently perform the costly self-joins needed to process RDF path queries. 
  
  To speed up these costly self-joins, it is natural to think about using 
  the MapReduce framework we already have. However, in the Sawzall paper from Google, the authors say that the MapReduce framework is