cassandra-commits mailing list archives

From "Mck SembWever (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
Date Wed, 07 Sep 2011 19:19:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099234#comment-13099234 ]

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:17 PM:
------------------------------------------------------------------

Here keyRange runs from startToken to split.getEndToken().
startToken is updated on each iteration to the last row read (each iteration reads batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
As I read it, startToken will hop over split.getEndToken() and get_range_slices(..)
will start returning wrapping ranges. These will still return rows, so the iteration will
continue, now forever.

The only ways out of this loop today are a) startToken equals split.getEndToken(), or b) get_range_slices(..)
is called with startToken equal to split.getEndToken(), or with a gap so small that no rows exist
in between.
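
To make the wrapping-range concern concrete, here is a minimal sketch. It is a toy model, not the record reader or the Thrift API: tokens are plain integers, and in_range and range_slice are hypothetical helpers standing in for a (start, end] range query. It only shows that once the start token lies past the end token, the range wraps around the ring and still returns rows. Whether startToken can actually hop over the end token is the hypothesis above, not something this sketch demonstrates.

```python
def in_range(token, start, end):
    """Return True if token lies in the ring range (start, end]."""
    if start < end:
        return start < token <= end
    # start >= end: the range wraps around the ring
    return token > start or token <= end


def range_slice(row_tokens, start, end, count):
    """Toy range query: up to `count` row tokens in (start, end], in ring order."""
    matching = [t for t in row_tokens if in_range(t, start, end)]
    # Ring order: tokens after start come first, then the ones that wrapped.
    matching.sort(key=lambda t: (t <= start, t))
    return matching[:count]


row_tokens = [10, 20, 30, 40, 50]  # hypothetical row tokens
end_token = 35                     # split end token that matches no row

# Start before the end token: an ordinary, bounded range.
normal = range_slice(row_tokens, 20, end_token, 2)

# Start past the end token: the range wraps and still returns rows.
wrapped = range_slice(row_tokens, 40, end_token, 2)

# Start at the last row before the end token: nothing left, so a loop
# that stops on an empty batch would terminate here.
empty = range_slice(row_tokens, 30, end_token, 2)
```

In this model a reader that stops only on an empty batch, or on startToken equal to the end token, gets no stop signal from the wrapped query, because it looks like any other non-empty batch.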

      was (Author: michaelsembwever):
    Here keyRange is startToken to split.getEndToken()
startToken is updated each iterate to the last row read (each iterate is batchRowCount rows).

What happens is split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and get_rage_slices(..) will
start returning wrapping ranges. This will still return rows and so the iteration will continue,
now forever.

The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..)
is called with startToken equals split.getEndToken() OR a gap so small there exists no rows
in between.
  
> ColumnFormatRecordReader loops forever
> --------------------------------------
>
>                 Key: CASSANDRA-3150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.4
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Critical
>         Attachments: CASSANDRA-3150.patch
>
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
> {quote}
> bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
> bq. CFIF's inputSplitSize=196608
> bq. 3 map tasks (of 4013) are still running after reading 25 million rows.
> bq. Can this be a bug in StorageService.getSplits(..) ?
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
> I guess the main alternative would be a bug in the recordreader paging.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
