Date: Sat, 15 Sep 2012 20:30:41 +0800
Subject: Storage file format
From: Azuryy Yu
To: drill-dev@incubator.apache.org

Hi all,

I am interested in working on the storage format. (Is there somewhere to sign up?)

I have written an HDFS file format similar to SequenceFile (row storage, block management, compression), with its own InputFormat and OutputFormat. Sometimes it performs very well and sometimes not, depending on the data.

For Drill, we should implement column storage, which lets a query skip the columns it does not need and skip some rows within a single column file. This column storage should be built on a distributed file system such as HDFS or MapR DFS; I prefer MapR DFS because of its HA.

We could implement the column storage file format described in the following paper; I think it is sufficient for our needs.

http://arxiv.org/pdf/1105.4252.pdf
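
To make the column and row skipping mentioned above concrete, here is a minimal in-memory sketch in Python. It is only an illustration under my own assumptions: it is not the format from the linked paper, it does not touch HDFS, and every name in it (Block, to_columns, scan, BLOCK_SIZE) is hypothetical. Each column is stored as a list of blocks with min/max statistics, so a scan reads only the columns it needs and skips blocks that cannot match the filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BLOCK_SIZE = 2  # hypothetical rows per block, kept tiny for illustration


@dataclass
class Block:
    values: List[int]
    min_value: int  # per-block statistics used to skip rows
    max_value: int


def to_columns(rows: List[Dict[str, int]],
               block_size: int = BLOCK_SIZE) -> Dict[str, List[Block]]:
    """Split row-oriented records into one list of blocks per column."""
    columns: Dict[str, List[Block]] = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        chunks = [values[i:i + block_size]
                  for i in range(0, len(values), block_size)]
        columns[name] = [Block(c, min(c), max(c)) for c in chunks]
    return columns


def scan(columns: Dict[str, List[Block]], select: str,
         filter_column: str, lower_bound: int) -> Tuple[List[int], int]:
    """Return `select` values for rows where filter_column >= lower_bound.

    Only the two named columns are touched (column skipping), and blocks
    whose maximum is below the bound are never read (row skipping).
    """
    results: List[int] = []
    blocks_read = 0
    for index, block in enumerate(columns[filter_column]):
        if block.max_value < lower_bound:
            continue  # no row in this block can match
        blocks_read += 1
        selected = columns[select][index].values
        for offset, value in enumerate(block.values):
            if value >= lower_bound:
                results.append(selected[offset])
    return results, blocks_read


rows = [
    {"id": 1, "age": 20, "score": 5},
    {"id": 2, "age": 25, "score": 7},
    {"id": 3, "age": 40, "score": 9},
    {"id": 4, "age": 45, "score": 2},
]
columns = to_columns(rows)
# The "score" column is never read; the first "age" block is skipped.
ids, blocks_read = scan(columns, "id", "age", 30)
```

In a real file format the blocks would live in separate files or file regions on the distributed file system, so skipping a column or a block would save actual I/O rather than just loop iterations.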