incubator-cassandra-dev mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Minimizing the impact of compaction on latency and throughput
Date Tue, 13 Jul 2010 11:18:41 GMT
> (2) posix_fadvise() feels more obscure and less portable than
> O_DIRECT, the latter being well-understood and used by e.g. databases
> for a long time.
>

Because the application has to do the data alignment itself (you are
bypassing all the OS magic here), there is really nothing portable about
O_DIRECT. Just have a look at open(2) on Linux:
----
  O_DIRECT
       The O_DIRECT flag may impose alignment restrictions on the length and
       address of userspace buffers and the file offset of I/Os.  In Linux
       alignment restrictions vary by file system and kernel version and
       might be absent entirely.  However there is currently no file
       system-independent interface for an application to discover these
       restrictions for a given file or file system.  Some file systems
       provide their own interfaces for doing so, for example the
       XFS_IOC_DIOINFO operation in xfsctl(3).
----
So, even within Linux you get different mechanisms for this depending on
the kernel and file system in use, and you have to work out the
requirements yourself because the OS will not tell you. Don't expect the
alignment rules to be any more standardized across OSes than they are
inside Linux. Still find this portable?
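
To make the alignment burden concrete, here is a rough sketch of what a
single O_DIRECT read could look like. It is written in Python only to keep
it short; the helper names (aligned_span, read_direct) and the 4096-byte
alignment are my own assumptions, not something the kernel or any library
reports. Offset, length and buffer address all have to be rounded by the
application, and the guess may simply be wrong for a given file system.

```python
import mmap
import os

# Assumption: 4096 is a guess. As the man page says, there is no portable
# way to ask the OS what alignment a given file or file system requires.
ASSUMED_ALIGNMENT = 4096


def aligned_span(offset, length, alignment=ASSUMED_ALIGNMENT):
    """Return (start, span) covering [offset, offset + length), with both
    the start offset and the span rounded to multiples of alignment."""
    start = offset - offset % alignment
    end = offset + length
    span = (end - start + alignment - 1) // alignment * alignment
    return start, span


def read_direct(path, offset, length, alignment=ASSUMED_ALIGNMENT):
    """Hypothetical helper: read `length` bytes at `offset` with O_DIRECT.

    Linux-only (os.O_DIRECT, os.preadv). May fail with EINVAL if the file
    system rejects O_DIRECT or wants a different alignment.
    """
    if length <= 0:
        return b""
    start, span = aligned_span(offset, length, alignment)
    # An anonymous mmap is page-aligned, which covers the buffer address
    # only as long as the required alignment does not exceed the page size.
    with mmap.mmap(-1, span) as buf:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            got = os.preadv(fd, [buf], start)
        finally:
            os.close(fd)
        skip = offset - start
        # Trim the padding that was read only to satisfy alignment.
        return buf[skip:min(got, skip + length)]
```

Note that a 100-byte read at offset 5000 turns into a 4096-byte read at
offset 4096, and all of that bookkeeping lives in the application.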

O_DIRECT also bypasses the cache completely, so you lose a lot of the I/O
scheduling and caching across multiple readers/writers in threaded apps and
separate processes which the OS may offer. This can be an especially big
loss on servers with loads of memory for a large filesystem cache, where
it can be hard to put that memory to use as a cache inside the
application itself.

O_DIRECT was made to work around hardware performance limitations on
servers 10+ years ago. It is far from an ideal solution today (but until
things like fadvise are implemented properly, it is somewhat unavoidable).
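
For comparison, the posix_fadvise() route might look roughly like the
sketch below: a bulk reader (think compaction) goes through the page cache
as usual, with no alignment work, and hints that each chunk will not be
needed again. The function name read_and_drop and the 1 MiB chunk size are
made up for illustration, and Python is used only for brevity. The calls
are advice only; how much the kernel actually does with them depends on
the implementation, which is exactly the caveat above.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def read_and_drop(path, chunk=CHUNK):
    """Hypothetical helper: stream a file through the page cache, advising
    the kernel that each chunk will not be reused. Returns bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint that access is sequential (len 0 means "to end of file").
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            # A real reader would process `data` here. Afterwards, suggest
            # that these pages can be dropped so the bulk read does not
            # push hotter data out of the cache. The kernel may ignore it.
            os.posix_fadvise(fd, total, len(data), os.POSIX_FADV_DONTNEED)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

Unlike the O_DIRECT case, other readers of the same file still benefit
from the OS cache and read-ahead while this runs.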

Best regards,
Terje
