cassandra-commits mailing list archives

From "Cyril Scetbon (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4131) Integrate Hive support to be in core cassandra
Date Wed, 14 Aug 2013 09:39:55 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738387#comment-13738387 ]

Cyril Scetbon edited comment on CASSANDRA-4131 at 8/14/13 9:39 AM:
-------------------------------------------------------------------

I've run into a performance issue when there is very little data. In my example, the table has
only a few rows:

{code}cqlsh>select count(*) from light_column;

 count
-------
     4
{code} 

The query takes less than a second with cqlsh, whereas it takes nearly 600 seconds with Hive.
Please see the logs at http://pastebin.com/ippy96GY
There are 257 mappers (to scan data from 256 vnodes), and they used a lot of CPU even though
the job reports at the end:
*Total MapReduce CPU Time Spent: 0 msec*
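
As a rough back-of-envelope reading of these figures, the sketch below divides the reported wall-clock time by the number of mappers. It assumes the mappers ran strictly one after another, which is not confirmed by anything above; the function name `per_mapper_overhead` is made up for illustration.

```python
def per_mapper_overhead(total_seconds: float, mappers: int) -> float:
    """Average wall-clock seconds per mapper.

    Assumption (not taken from the logs): the mappers ran sequentially,
    so the total time is simply the sum of the per-mapper times.
    """
    if mappers <= 0:
        raise ValueError("mappers must be positive")
    return total_seconds / mappers


# Figures quoted in the comment: about 600 seconds, 257 mappers.
overhead = per_mapper_overhead(600, 257)
print(f"{overhead:.2f} s per mapper")
```

If that assumption holds, each mapper costs a couple of seconds even though the whole table holds only 4 rows, which would point at per-task startup cost rather than at the data itself.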

Another issue is that the count is wrong: Hive returns 5 instead of 4, because a deleted row
is counted as alive!
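
The miscount can be illustrated with a toy model. This is only a sketch: it assumes the deleted row's key is still returned by the scan with no live columns, and it uses a hypothetical in-memory dictionary rather than the actual Hive or Cassandra code paths.

```python
from typing import Dict

# Hypothetical data: row key -> live columns.
# An empty dict stands for a deleted row whose key the scan still returns.
rows: Dict[str, Dict[str, str]] = {
    "k1": {"c": "a"},
    "k2": {"c": "b"},
    "k3": {"c": "c"},
    "k4": {"c": "d"},
    "k5": {},  # deleted row
}


def naive_count(scanned: Dict[str, Dict[str, str]]) -> int:
    """Count every key returned by the scan, deleted or not."""
    return len(scanned)


def live_count(scanned: Dict[str, Dict[str, str]]) -> int:
    """Count only rows that still have at least one live column."""
    return sum(1 for columns in scanned.values() if columns)


print(naive_count(rows), live_count(rows))
```

With this data the naive count is 5 and the filtered count is 4, matching the discrepancy described above.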
                
      was (Author: cscetbon):
    I've run into a performance issue when there is very little data. In my example, the table
has only a few rows:

{code}cqlsh>select count(*) from light_column;

 count
-------
     4
{code} 

The query takes less than a second with cqlsh, whereas it takes nearly 600 seconds with Hive.
Please see the logs at http://pastebin.com/ippy96GY
There are 257 mappers (to scan data from 256 vnodes), and they used a lot of CPU even though
the job reports at the end:
*Total MapReduce CPU Time Spent: 0 msec*

Another issue is that the count is wrong: Hive returns 5 instead of 4.
                  
> Integrate Hive support to be in core cassandra
> ----------------------------------------------
>
>                 Key: CASSANDRA-4131
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4131
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremy Hanna
>            Assignee: Edward Capriolo
>              Labels: hadoop, hive
>
> The standalone Hive support (at https://github.com/riptano/hive) would be great to have
> in-tree so that people don't have to go out to GitHub to download it and wonder if it's a
> left-for-dead external shim.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
