tajo-dev mailing list archives

From Min Zhou <coderp...@gmail.com>
Subject Re: Tajo storage layer
Date Sun, 02 Feb 2014 01:33:33 GMT
Thanks for your explanation, Jihoon. It seems the thread pool balances the
different overhead of each disk, and the queues let the physical operators
keep in step with the disk scans. It is clearer than before. I am still
confused about these aspects:

1. See the code in StorageManagerFactory.getStorageManager(TajoConf conf,
Path warehouseDir, boolean v2). Each different file name gets a different
StorageManagerV2 thread pool. As the number of files a worker scans grows,
the number of thread pools grows as well.

2. Is this designed for local disk scans? It seems not only for local disks;
I observed remote HDFS scanning too. This design doesn't seem suitable for
remote HDFS scanning.

3. From the diagram you showed me, each local hard disk has its own queue.
How do you find the queue with the minimum number of read requests, given
that the queue is determined by the disk where the target file resides? For
example, if file f is on disk /dev/sda, the request to scan f should be put
into the queue of /dev/sda. Why is it necessary to find the queue with the
minimum number of read requests?
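To make question 3 concrete, here is a minimal sketch, not Tajo code, that routes the same requests under two policies: disk affinity (each request joins the queue of the disk holding its file) and least-loaded assignment (each request joins the shortest queue, as described in the reply below). The class DiskRouting and the representation of a request as nothing more than its disk id are hypothetical simplifications.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Tajo code) contrasting two policies for assigning
 * read requests to per-disk queues. Each request is represented only by the
 * id of the disk that holds its file.
 */
public class DiskRouting {

    /** Disk affinity: a request always joins the queue of the disk holding its file. */
    public static int[] routeByDisk(int[] diskOfRequest, int numDisks) {
        int[] queueLength = new int[numDisks];
        for (int disk : diskOfRequest) {
            queueLength[disk]++;
        }
        return queueLength;
    }

    /**
     * Least-loaded: a request joins whichever queue currently has the fewest
     * pending requests, ignoring which disk its file is stored on.
     */
    public static int[] routeLeastLoaded(int[] diskOfRequest, int numDisks) {
        int[] queueLength = new int[numDisks];
        for (int r = 0; r < diskOfRequest.length; r++) {
            int min = 0;
            for (int i = 1; i < numDisks; i++) {
                if (queueLength[i] < queueLength[min]) {
                    min = i;
                }
            }
            queueLength[min]++;
        }
        return queueLength;
    }

    public static void main(String[] args) {
        // Three requests for files on disk 0 (say /dev/sda), one for a file on disk 1.
        int[] requests = {0, 0, 0, 1};
        System.out.println(Arrays.toString(routeByDisk(requests, 2)));      // [3, 1]
        System.out.println(Arrays.toString(routeLeastLoaded(requests, 2))); // [2, 2]
    }
}
```

With disk affinity the queue lengths are [3, 1]; least-loaded assignment yields [2, 2] by placing one of the disk-0 requests in disk 1's queue. The queues balance, but they no longer correspond one-to-one to physical disks, which is the tension the question is about.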


Please correct me if I got anything wrong.


Regards,
Min



On Sat, Feb 1, 2014 at 7:29 AM, Jihoon Son <jihoonson@apache.org> wrote:

> Hi, Min
>
> The operation of StorageManagerV2 is as follows. The ScanScheduler
> coordinates read requests for each disk. That is, when it receives a number
> of read requests, it first finds the DiskFileScanScheduler that is assigned
> the minimum number of read requests. After that, it assigns a read request
> to that DiskFileScanScheduler. This process is repeated for the remaining
> read requests. A DiskFileScanScheduler creates a FileScanRunner for every
> assigned request, and each FileScanRunner simply reads data using a
> fixed-size buffer. You can see the related issue at
> https://issues.apache.org/jira/browse/TAJO-178, and this figure
> <https://issues.apache.org/jira/secure/attachment/12602567/tajo_storage_manager.png>
> will help you understand it.
>
> Although StorageManagerV2 is designed to accelerate read performance by
> scheduling disk scans, its performance did not meet our expectations. As
> you said, its thread model is too complex, and that complexity might degrade
> performance. So StorageManager, which is the default, is mainly used
> instead of StorageManagerV2.
>
> Thanks,
> Jihoon
>
>
> 2014-02-01 Min Zhou <coderplay@gmail.com>:
>
> > Hi all,
> >
> > It seems the thread model of the Tajo storage layer is quite complex.
> > Each call of StorageManagerFactory.getStorageManager(TajoConf) creates
> > one instance of StorageManagerV2, which creates a scan scheduler thread
> > and several disk file scan scheduler threads. Why are those threads
> > needed? What is their function? How do those threads work with the file
> > scanners?
> >
> >
> > Regards,
> > Min
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
> >
>
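The read path described in the quoted reply (one FileScanRunner per request, each reading with a fixed-size buffer) can be sketched as a producer/consumer pair. This is a minimal sketch under stated assumptions, not StorageManagerV2's code: one reader thread per request, a bounded queue between the reader and a consumer standing in for a physical operator, and a hypothetical class name, buffer size, and queue capacity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch (not Tajo code): a scan runner reads its input in
 * fixed-size chunks and hands them to a consumer through a bounded queue,
 * so a slow consumer throttles the reader instead of letting data pile up.
 */
public class BufferedScanSketch {

    /** Sentinel, compared by identity, that marks the end of the stream. */
    static final byte[] END = new byte[0];

    /** Reads `in` in chunks of at most bufferSize bytes and puts each chunk on the queue. */
    static void scan(InputStream in, int bufferSize, BlockingQueue<byte[]> queue)
            throws IOException, InterruptedException {
        byte[] buf = new byte[bufferSize];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
            }
        } finally {
            queue.put(END); // always signal the end, even if reading failed
        }
    }

    /** Runs one reader thread and counts bytes on the calling (consumer) thread. */
    public static long scanAndCount(byte[] data, int bufferSize, int queueCapacity)
            throws Exception {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueCapacity);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> reader = pool.submit(() -> {
                scan(new ByteArrayInputStream(data), bufferSize, queue);
                return null;
            });
            long total = 0;
            for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                total += chunk.length; // stand-in for a physical operator's work
            }
            reader.get(); // rethrows any exception raised by the reader
            return total;
        } finally {
            pool.shutdownNow(); // interrupts the reader if it is still blocked
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[10_000];
        System.out.println(scanAndCount(data, 4096, 2)); // 10000
    }
}
```

Because ArrayBlockingQueue.put blocks when the queue is full, a slow consumer throttles the reader, which is one way to keep operators in step with disk scans. The cost is an extra thread per request, which is part of the complexity discussed in this thread.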



