From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Sat, 18 Aug 2007 07:15:35 -0000
Message-ID: <20070818071535.25661.26747@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "Hbase/RDF" by InchulSong

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by InchulSong:
http://wiki.apache.org/lucene-hadoop/Hbase/RDF

------------------------------------------------------------------------------
  We propose an Hbase subsystem for RDF called HbaseRDF, which uses Hbase + MapReduce to store RDF data and execute queries (e.g., SPARQL) on them.
  
  We can store very sparse RDF data in a single table in Hbase, with as many columns as they need. For example, we might make a row for each RDF subject in a table and store all the properties and their values as columns in the table.
- This reduces costly self-joins, which results in efficient processing of queries, although we still need self-joins for RDF path queries.
+ This reduces costly self-joins in answering queries asking questions on the same subject, which results in efficient processing of queries, although we still need self-joins to answer RDF path queries.
  
  We can further accelerate query performance by using MapReduce for parallel, distributed query processing.
  
@@ -26, +26 @@

  * [:InchulSong: Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab. , KAIST)
  
  == Considerations ==
- When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to reduce costly self-joins needed to process RDF path queries.
+ When we store RDF data in a single Hbase table and process queries on them, an important issue we have to consider is how to efficiently perform costly self-joins needed to process RDF path queries.
  
  To speed up these costly self-joins, it is natural to think about using the MapReduce framework we already have. However, in the Sawzall paper from Google, the authors say that the MapReduce framework is