Date: Sat, 1 Feb 2014 17:33:33 -0800
From: Min Zhou
To: dev
Subject: Re: Tajo storage layer

Thanks for your explanation, Jihoon. It seems to work like a thread pool that balances the different overhead of each disk, and the queues let the physical operators fall into step with the disk scans. That is clearer than before.

I'm still confused about the following aspects:

1. See the code in StorageManagerFactory.getStorageManager(TajoConf conf, Path warehouseDir, boolean v2). A different file name gets a different StorageManagerV2 thread pool, so as the number of files a worker scans grows, the number of thread pools grows as well.
2. Is it designed for local disk scans? It doesn't seem to be limited to local files: I observed remote HDFS scans too, and this design doesn't seem suitable for remote HDFS scanning.
3. In the diagram you showed me, each local hard disk has its own queue. How do you find the queue with the minimum number of read requests, given that the queue is determined by the disk on which the target file resides? For example, if file f is on disk /dev/sda, the request to scan f should be put into the queue of /dev/sda. Why is it necessary to find the queue with the minimum number of read requests?

Please correct me if I got anything wrong.

Regards,
Min

On Sat, Feb 1, 2014 at 7:29 AM, Jihoon Son wrote:
> Hi, Min
>
> The operation of StorageManagerV2 is as follows. The ScanScheduler
> coordinates read requests for each disk.
> That is, when it receives a number of read requests, it first finds the
> DiskFileScanScheduler that has been assigned the minimum number of read
> requests. After that, it assigns a read request to the found
> DiskFileScanScheduler. This process is repeated for the remaining read
> requests. DiskFileScanScheduler creates FileScanRunners for every assigned
> request. A FileScanRunner simply reads data using a fixed-size buffer.
> You can see the related issue at
> https://issues.apache.org/jira/browse/TAJO-178, and this figure
> (https://issues.apache.org/jira/secure/attachment/12602567/tajo_storage_manager.png)
> will help you understand.
>
> Although StorageManagerV2 is designed to accelerate read performance by
> scheduling disk scans, its performance did not live up to our expectations.
> As you said, its thread model is too complex, and it might degrade
> performance. So StorageManager is mainly used instead of StorageManagerV2
> (StorageManager is the default).
>
> Thanks,
> Jihoon
>
>
> 2014-02-01 Min Zhou:
>
> > Hi all,
> >
> > It seems the thread model of the Tajo storage layer is quite complex.
> > Each call of StorageManagerFactory.getStorageManager(TajoConf) creates
> > one instance of StorageManagerV2, which creates a scan scheduler thread
> > and several disk file scan scheduler threads. Why are those threads
> > needed? What is their function? How do those threads work with the file
> > scanners?
> >
> >
> > Regards,
> > Min
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
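The assignment policy Jihoon describes (find the DiskFileScanScheduler with the fewest assigned read requests, hand it one request, repeat for the rest) is a greedy least-loaded placement. Below is a minimal Java sketch of only that policy, assuming each per-disk scheduler can be modeled as a plain list of pending requests. The class LeastLoadedAssigner, its DiskQueue helper and all method names are hypothetical illustrations, not Tajo's ScanScheduler or DiskFileScanScheduler API, and no threads, FileScanRunners or real I/O are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of greedy least-loaded placement: every read request
 * goes to the per-disk queue that currently holds the fewest requests.
 * This is NOT Tajo's ScanScheduler; threads and actual disk I/O are omitted.
 */
final class LeastLoadedAssigner {

  /** Hypothetical stand-in for one per-disk scheduler: a named list of pending requests. */
  private static final class DiskQueue {
    final String disk;
    final List<String> pending = new ArrayList<>();

    DiskQueue(String disk) {
      this.disk = disk;
    }

    int load() {
      return pending.size();
    }
  }

  private final List<DiskQueue> queues = new ArrayList<>();

  LeastLoadedAssigner(List<String> disks) {
    if (disks.isEmpty()) {
      throw new IllegalArgumentException("at least one disk is required");
    }
    for (String disk : disks) {
      queues.add(new DiskQueue(disk));
    }
  }

  /** Places one request on the least-loaded queue (ties go to the earliest disk) and returns that disk. */
  String assign(String request) {
    DiskQueue target = queues.get(0);
    for (DiskQueue q : queues) {
      if (q.load() < target.load()) {
        target = q;
      }
    }
    target.pending.add(request);
    return target.disk;
  }

  /** Repeats the single-request step for each remaining request, in order. */
  List<String> assignAll(List<String> requests) {
    List<String> placements = new ArrayList<>();
    for (String request : requests) {
      placements.add(assign(request));
    }
    return placements;
  }

  /** Number of requests currently queued on the given disk. */
  int loadOf(String disk) {
    for (DiskQueue q : queues) {
      if (q.disk.equals(disk)) {
        return q.load();
      }
    }
    throw new IllegalArgumentException("unknown disk: " + disk);
  }

  public static void main(String[] args) {
    LeastLoadedAssigner assigner =
        new LeastLoadedAssigner(Arrays.asList("/dev/sda", "/dev/sdb", "/dev/sdc"));
    List<String> placements = assigner.assignAll(Arrays.asList("f1", "f2", "f3", "f4", "f5"));
    System.out.println(placements);
  }
}
```

Run as is, the five requests land on /dev/sda, /dev/sdb, /dev/sdc, /dev/sda, /dev/sdb. Note that this policy never looks at which disk a file actually lives on, which is exactly the tension raised in the third question above: a disk-affinity scheme would pick the queue by the file's disk rather than by current load.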