hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/RDF" by InchulSong
Date Sat, 18 Aug 2007 07:15:35 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/RDF

------------------------------------------------------------------------------
  We propose an Hbase subsystem for RDF called HbaseRDF, which uses Hbase + MapReduce to store RDF data and execute queries (e.g., SPARQL) on them.
  We can store very sparse RDF data in a single table in Hbase, with as many columns as 
  they need. For example, we might make a row for each RDF subject in a table and store all the properties and their values as columns in the table. 
- This reduces costly self-joins, which results in efficient processing of queries, although we still need self-joins for RDF path queries.
+ This reduces costly self-joins in answering queries asking questions on the same subject, which results in efficient processing of queries, although we still need self-joins to answer RDF path queries.
  
  We can further accelerate query performance by using MapReduce for 
  parallel, distributed query processing. 
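
As an illustration of the layout described above, here is a minimal in-memory sketch in Python. It is not Hbase code and not the HbaseRDF design itself: the triples, the property names, and the helper functions (`build_subject_table`, `same_subject_query`, `path_query`) are all hypothetical, and a plain dictionary stands in for the table. It only shows why questions about one subject need no join, while a path query looks the table up again at every hop.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, value); illustrative only.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "kaist"),
    ("kaist", "name", "KAIST"),
]


def build_subject_table(triples):
    """Build one row per subject; each property becomes a column of that row."""
    table = defaultdict(dict)
    for subject, prop, value in triples:
        # A property may have several values, so each column holds a list.
        table[subject].setdefault(prop, []).append(value)
    return dict(table)


def same_subject_query(table, subject, props):
    """Fetch several properties of one subject: a single row read, no join."""
    row = table.get(subject, {})
    return {prop: row.get(prop, []) for prop in props}


def path_query(table, start, path):
    """Follow a chain of properties starting from one subject.

    Each hop uses the values found so far as row keys for another lookup
    in the same table, which is the self-join the text refers to.
    """
    frontier = [start]
    for prop in path:
        next_frontier = []
        for subject in frontier:
            next_frontier.extend(table.get(subject, {}).get(prop, []))
        frontier = next_frontier
    return frontier


table = build_subject_table(TRIPLES)
print(same_subject_query(table, "alice", ["name", "knows"]))
print(path_query(table, "alice", ["knows", "worksAt", "name"]))
```

In this sketch the rows are sparse in the same sense as in the text: each subject carries only the columns it actually has, and the number of lookups in `path_query` grows with the path length.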
@@ -26, +26 @@

   * [:InchulSong: Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab., KAIST) 
  
  == Considerations ==
- When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to reduce costly self-joins needed to process RDF path queries.
+ When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to efficiently perform costly self-joins needed to process RDF path queries. 
  
  To speed up these costly self-joins, it is natural to think about using 
  the MapReduce framework we already have. However, in the Sawzall paper from Google, the authors say that the MapReduce framework is 
