incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "CrailProposal" by PatrickStuedi
Date Thu, 05 Oct 2017 19:11:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "CrailProposal" page has been changed by PatrickStuedi:
https://wiki.apache.org/incubator/CrailProposal?action=diff&rev1=21&rev2=22

  
  == Rationale ==
  
+ During the last decade, I/O hardware has undergone rapid performance improvements, typically
by orders of magnitude. Modern networking and storage hardware can deliver 100+ Gbps
(10+ GBps) of bandwidth with access latencies of a few microseconds. However, despite such progress
in raw I/O performance, effectively leveraging modern hardware in data processing frameworks
remains challenging. In most cases, upgrading to high-end networking or storage hardware
has very little effect on the performance of analytics workloads. The problem comes from heavily
layered software imposing overheads such as deep call stacks, unnecessary data copies, thread
contention, etc. These problems have already been addressed at the operating system level
with new I/O APIs such as RDMA verbs, NVMe, etc., allowing applications to bypass software
layers during I/O operations. Distributed data processing frameworks, on the other hand, are
typically implemented on legacy I/O interfaces such as sockets or block storage. These
interfaces have been shown to be insufficient to deliver the full hardware performance. Yet,
to the best of our knowledge, there are no active and systematic efforts to integrate these
new user-level I/O APIs into Apache software frameworks. This problem affects all end users
and organizations that use Apache software. We expect them to see unsatisfactorily small performance
gains when upgrading their networking and storage hardware. 
- During the last decade, I/O hardware has undergone rapid performance improvements, typically
by orders of magnitude. Modern networking and storage hardware can deliver 100+ Gbps
(10+ GBps) of bandwidth with access latencies of a few microseconds. However, despite such progress
in raw I/O performance, effectively leveraging modern hardware in data processing frameworks
remains challenging for two reasons: first, often hardware integration takes place too low
in the stack (e.g., emulating socket I/O on RDMA networks), and as a result, performance gains
are overshadowed by software overheads. These overheads come from heavy layering, multiple
data copies, JVM overheads, thread contention, etc. Second, I/O hardware improvements
have also brought up the need for new I/O APIs such as RDMA verbs, NVMe, etc., since traditional
abstractions such as sockets or block I/O have been shown to be insufficient to deliver the
full hardware performance. Yet, to the best of our knowledge, there are no active and systematic
efforts to integrate these new user-level I/O APIs into Apache software frameworks.
- This problem affects all end-users and organizations that use Apache software. We expect
them to see unsatisfactorily small performance gains when upgrading their networking and storage
hardware. 
  
  Crail solves this problem by providing an efficient storage platform built upon user-level
I/O, thus bypassing layers such as the JVM and OS during I/O operations. Moreover, Crail directly
leverages the specific hardware features of RDMA and NVMe to provide a better integration
with high-level data operations in Apache compute frameworks. As a consequence, Crail enables
users to run larger, more complex queries against ever-increasing amounts of data at a speed
largely determined by the deployed hardware. Crail is a generic solution that integrates well
with the Apache ecosystem including frameworks like Spark, Hadoop, Hive, etc. 
  

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

