hadoop-hdfs-user mailing list archives

From Chris Embree <cemb...@gmail.com>
Subject Re: Low latency data access Vs High throughput of data
Date Mon, 20 May 2013 17:51:42 GMT
I'll take a swing at this one.

Low latency data access:  I hit the enter key (or submit button) and I
expect results within seconds at most.  My database query time should be
High throughput of data:  I want to scan millions of rows of data and count
or sum some subset of them.  I expect this to take a few minutes (or much
longer, depending on complexity) to complete.  Think batch-style jobs.

Caveat: this is really also a MapReduce issue.  Setting up and running
M/R jobs carries a fair bit of fixed overhead.  There are a couple of
projects working now to move toward lower-latency data access.
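A rough way to see why that setup overhead matters is a simple cost model:
job time = fixed setup cost + data size / scan rate.  The numbers below
(20 s of setup, 500 MB/s aggregate scan rate) are assumptions picked only
for illustration, not measurements from any real cluster.

```python
def job_time(data_mb, overhead_s, throughput_mb_s):
    """Estimated wall-clock time: fixed startup cost plus time to stream the data."""
    return overhead_s + data_mb / throughput_mb_s

# Assumed, illustrative numbers only
OVERHEAD_S = 20.0        # job setup / task scheduling
THROUGHPUT_MB_S = 500.0  # aggregate scan rate across the cluster

for data_mb in (1, 1_000, 1_000_000):
    t = job_time(data_mb, OVERHEAD_S, THROUGHPUT_MB_S)
    setup_share = OVERHEAD_S / t
    print(f"{data_mb:>9,} MB -> {t:8.1f} s (setup is {setup_share:.0%} of it)")
```

For a tiny query the setup cost is nearly the whole wait, which is why a
batch system feels slow for interactive use; for a huge scan the setup cost
is noise and the scan rate is what counts.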

Also, HDFS stores data in blocks and distributes them across many nodes.
This means there will (almost) always be some network data transfer
required to get the final answer, and that "slows" things down a bit,
depending on network throughput and various other factors.
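Here is a toy sketch of why a single client reading a whole file usually
pulls most of it over the network.  Assumptions: a 128 MB block size (a
common default, configurable per cluster), a hypothetical four-node
cluster, one copy per block, and simple round-robin placement.  Real HDFS
replicates each block and uses a rack-aware placement policy, so treat this
only as an illustration.

```python
import math

BLOCK_MB = 128  # assumed block size; configurable in a real cluster
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

def split_into_blocks(file_mb, block_mb=BLOCK_MB):
    """Number of blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_mb / block_mb)

def place_blocks(n_blocks, nodes=NODES):
    """Toy round-robin placement with a single copy of each block.
    NOT the real HDFS policy, which replicates blocks and is rack-aware."""
    return {b: nodes[b % len(nodes)] for b in range(n_blocks)}

def remote_blocks(placement, reader):
    """Blocks that must cross the network for a client running on `reader`."""
    return [b for b, node in placement.items() if node != reader]

n = split_into_blocks(1_000)  # a 1,000 MB file
placement = place_blocks(n)
remote = remote_blocks(placement, "node1")
print(f"{n} blocks; {len(remote)} must be fetched over the network by node1")
```

The more nodes the data is spread over, the larger the share that has to
travel to wherever the final answer is assembled.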

Hope that helps. :)

On Mon, May 20, 2013 at 10:48 AM, Raj Hadoop <hadoopraj@yahoo.com> wrote:

> Hi,
> I have a basic question on HDFS. I was reading that HDFS doesn't work well
> with low latency data access. Rather it is designed for the high throughput
> of data. Can you please explain in simple words the difference between "Low
> latency data access Vs High throughput of data".
> Thanks,
> Raj
