tajo-dev mailing list archives

From Jihoon Son <ghoon...@gmail.com>
Subject Re: Tajo storage layer
Date Sun, 02 Feb 2014 07:08:29 GMT
Hi, Min.
Here are my answers.

> 1. See the code in StorageManagerFactory.getStorageManager(TajoConf conf,
> Path warehouseDir, boolean v2). Each different file name gets a different
> StorageManagerV2 thread pool. As the number of files a worker scans grows,
> the number of thread pools grows as well.
>

You are right. The thread overhead grows significantly with the number of
distinct file names.
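
To make the growth concrete, here is a minimal sketch. It is not the actual
StorageManagerFactory code: the class PerKeyPoolSketch, the method names, and
the pool size of 4 are all hypothetical. It only assumes what is described
above, namely that one thread pool is cached per distinct file name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a factory that caches one thread pool per
// distinct key. This is NOT Tajo's StorageManagerFactory.
public class PerKeyPoolSketch {
    static final int THREADS_PER_POOL = 4; // assumed pool size

    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    // Returns the pool cached for this file name, creating one on first use.
    public ExecutorService poolFor(String fileName) {
        return pools.computeIfAbsent(fileName,
            k -> Executors.newFixedThreadPool(THREADS_PER_POOL));
    }

    // Upper bound on the worker threads the cached pools may start.
    public int maxThreads() {
        return pools.size() * THREADS_PER_POOL;
    }

    public void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) {
        PerKeyPoolSketch factory = new PerKeyPoolSketch();
        // 100 distinct file names lead to 100 cached pools.
        for (int i = 0; i < 100; i++) {
            factory.poolFor("/warehouse/t1/part-" + i);
        }
        System.out.println(factory.maxThreads()); // prints 400
        factory.shutdown();
    }
}
```

Keying the cache by something bounded (for example one shared pool per
worker) would keep the thread count independent of the number of files.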

> 2. Is that designed for local disk scans? It seems not to be only for local
> disks, since I observed remote HDFS scanning too. This design does not seem
> suitable for remote HDFS scanning.
>

It is designed for both HDFS and the local file system. As you said, its
primary goal is to balance the access requests across disks. If requests
are balanced well, disk access scheduling should improve access
performance regardless of the underlying file system.


> 3. From the diagram you showed me, each local hard disk has its own
> queue. How do you find the queue with the minimum number of read requests,
> since the queue is determined by the disk where the target file resides?
> For example, if file f is on disk /dev/sda, the request to scan f
> should be put into the queue of /dev/sda. Why do we need to find a queue
> with the minimum number of read requests?
>

You are right. Since disk locality has to be considered first, the disk
that holds the file already determines the queue, so disk balancing does
not work. I think that its architecture has a fundamental problem.
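
The conflict between the two policies can be sketched as follows. This is
only an illustration: the class DiskQueueSketch and both method names are
hypothetical, and the queues are modeled as plain counters of pending
requests rather than as the real DiskFileScanScheduler objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of per-disk request queues; names are illustrative
// and do not come from the Tajo code base.
public class DiskQueueSketch {
    // Least-loaded policy: pick the queue with the fewest pending requests.
    public static int leastLoaded(int[] pending) {
        int best = 0;
        for (int i = 1; i < pending.length; i++) {
            if (pending[i] < pending[best]) {
                best = i;
            }
        }
        return best;
    }

    // Locality policy: the disk that holds the file fixes the queue.
    public static int byLocality(List<String> disks, String diskOfFile) {
        int idx = disks.indexOf(diskOfFile);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown disk: " + diskOfFile);
        }
        return idx;
    }

    public static void main(String[] args) {
        List<String> disks = Arrays.asList("/dev/sda", "/dev/sdb");
        int[] pending = {5, 1}; // pending requests per disk queue

        // A file on /dev/sda must be read from /dev/sda, even though the
        // queue of /dev/sdb is shorter.
        System.out.println(leastLoaded(pending));          // prints 1
        System.out.println(byLocality(disks, "/dev/sda")); // prints 0
    }
}
```

The two policies disagree whenever the disk holding the file is not the
least loaded one, and in that case only the locality answer is usable.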

Its architecture is hard to understand, and it also has fundamental
problems, including the ones described above. We should improve it.

Thanks,
Jihoon

On Sat, Feb 1, 2014 at 7:29 AM, Jihoon Son <jihoonson@apache.org> wrote:
>
> > Hi, Min
> >
> > The operation of StorageManagerV2 is as follows. The ScanScheduler
> > coordinates read requests for each disk. That is, when it receives a
> > number of read requests, it first finds the DiskFileScanScheduler that is
> > assigned the minimum number of read requests. After that, it assigns a
> > read request to the found DiskFileScanScheduler. This process is repeated
> > for the remaining read requests. DiskFileScanScheduler creates
> > FileScanRunners for every assigned request. FileScanRunner just reads data
> > using a fixed-size buffer.
> > You can see the related issue at
> > https://issues.apache.org/jira/browse/TAJO-178 and this figure
> > <https://issues.apache.org/jira/secure/attachment/12602567/tajo_storage_manager.png>
> > will help you understand.
> >
> > Although StorageManagerV2 is designed to accelerate read performance by
> > scheduling disk scans, its performance was not up to our expectations. As
> > you said, its thread model is too complex, and it might degrade
> > performance. So, StorageManager is mainly used instead of StorageManagerV2.
> > (StorageManager is used by default.)
> >
> > Thanks,
> > Jihoon
> >
> >
> > 2014-02-01 Min Zhou <coderplay@gmail.com>:
> >
> > > Hi all,
> > >
> > > Seems the thread model of the Tajo storage layer is quite complex.
> > > Each call of StorageManagerFactory.getStorageManager(TajoConf) creates
> > > one instance of StorageManagerV2, which creates a scan scheduler thread
> > > and several disk file scan scheduler threads. Why are those threads
> > > needed? What's their function? How do those threads work with file
> > > scanners?
> > >
> > >
> > > Regards,
> > > Min
> > > --
> > > My research interests are distributed systems, parallel computing and
> > > bytecode based virtual machine.
> > >
> > > My profile:
> > > http://www.linkedin.com/in/coderplay
> > > My blog:
> > > http://coderplay.javaeye.com
> > >
> >
>



-- 
Jihoon Son

Database & Information Systems Group,
Prof. Yon Dohn Chung Lab.
Dept. of Computer Science & Engineering,
Korea University
1, 5-ga, Anam-dong, Seongbuk-gu,
Seoul, 136-713, Republic of Korea

Tel : +82-2-3290-3580
E-mail : jihoonson@korea.ac.kr
