incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "CrailProposal" by PatrickStuedi
Date Wed, 04 Oct 2017 19:29:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "CrailProposal" page has been changed by PatrickStuedi:
https://wiki.apache.org/incubator/CrailProposal?action=diff&rev1=20&rev2=21

  
  == Rationale ==
  
+ During the last decade, I/O hardware performance has improved rapidly, typically by orders
of magnitude. Modern networking and storage hardware can deliver 100+ Gbps (10+ GB/s) of bandwidth
with access latencies of a few microseconds. However, despite this progress in raw I/O performance,
effectively leveraging modern hardware in data processing frameworks remains challenging, for
two reasons. First, hardware integration often takes place too low in the stack (e.g., emulating
socket I/O on RDMA networks), and as a result the performance gains are overshadowed by software
overheads. These overheads stem from heavy layering, multiple data copies, JVM overheads, thread
contention, etc. Second, I/O hardware improvements have also created the need for new I/O APIs
such as RDMA verbs and NVMe, since traditional abstractions such as sockets or block I/O have
proven insufficient to deliver the full hardware performance. Yet, to the best of our knowledge,
there are no active and systematic efforts to integrate these new user-level I/O APIs into Apache
software frameworks.
- During the last decade, I/O hardware has undergone rapid performance improvements, typically
in the order of magnitudes. Modern day networking and storage hardware can deliver 100+ Gbps
(10+ GBps) bandwidth with a few microseconds of access latencies. However, despite such progress
in raw I/O performance, effectively leveraging modern hardware in data processing frameworks
remains challenging.
-  	
- Delivering the performance of modern I/O hardware at application level is a problem due
to two key reasons. First, often hardware integration takes place too low in the stack (e.g.,
just emulating socket I/O on RDMA), and as a result, performance gains are overshadowed by
software overheads [1]. These overheads come from heavy layering, multiple data copies, JVM
overheads, thread contentions, etc. And secondly, I/O hardware improvements have also brought
up the need for new I/O APIs such as RDMA verbs, NVMe, etc., since traditional abstractions
such as sockets or block I/O have been shown to be insufficient to deliver the full hardware
performance. Yet, to the best of our knowledge, there are no active and systematic efforts
to integrate these new user level I/O APIs into Apache software frameworks.
  This problem affects all end users and organizations that use Apache software. We expect
them to see disappointingly small performance gains when they upgrade their networking and storage
hardware.
  
  Crail solves this problem by providing an efficient storage platform built upon user-level
I/O, thus bypassing layers such as the JVM and the OS during I/O operations. Moreover, Crail
directly leverages the specific hardware features of RDMA and NVMe to integrate more closely
with high-level data operations in Apache compute frameworks. As a consequence, Crail enables
users to run larger, more complex queries against ever-increasing amounts of data at a speed
largely determined by the deployed hardware. Crail is a generic solution that integrates well
with the Apache ecosystem, including frameworks such as Spark, Hadoop, and Hive.
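  As a brief illustration of this kind of integration, the sketch below shows how existing
Hadoop FileSystem code could read data from Crail, assuming an HDFS-compatible Crail adapter
is on the classpath and registered for the crail:// scheme; the scheme, host name, port, and
file path are placeholders for this example, not a definitive configuration.

{{{
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CrailReadExample {
    public static void main(String[] args) throws Exception {
        // Standard Hadoop configuration; the crail:// scheme, host, and port
        // are placeholders and would come from the actual Crail deployment.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("crail://namenode-host:9060/"), conf);

        // Unmodified Hadoop FileSystem calls; the assumption is that the
        // adapter serves the data from Crail's user-level storage tiers.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/tmp/words.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
}}}

  In such a setup, existing HDFS-based applications would only need to change the file system
URI, while the rest of their code remains the same.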
