cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7464) Retire/replace sstable2json and json2sstable
Date Fri, 27 Jun 2014 17:34:24 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046175#comment-14046175 ]

Sylvain Lebresne commented on CASSANDRA-7464:
---------------------------------------------

bq. I'm curious what a better tool's output would look like?

That's a good question, thanks for asking it :)

Honestly, I haven't thought about it yet. But imo what we'd want is something that:
# is reasonably human readable (it can still be JSON so that it's also easy to handle with tools), even if that means something verbose (we're not targeting performance).
# contains all the information the SSTable stores, so we can reverse it (but it doesn't have to be the JSON representation that is closest to the actual underlying sstable format).
# can be generated without needing to load the entire sstable in memory.
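A minimal Python sketch of the third point (and, through the round trip, the second), assuming a hypothetical iterator that yields one decoded atom at a time as a dict; no actual Cassandra reader API is implied. Each atom is serialized on its own, so the whole sstable never has to be materialized.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def write_atoms_as_json_array(atoms: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Stream atoms out as the elements of a single JSON array.

    Only the current atom is serialized at any moment, so memory use does
    not grow with the size of the sstable. Returns the number of atoms written.
    """
    out.write("[\n")
    count = 0
    for atom in atoms:
        if count:
            out.write(",\n")  # separator between array elements
        out.write(json.dumps(atom, indent=2))  # verbose but human readable
        count += 1
    out.write("\n]\n")
    return count


if __name__ == "__main__":
    # Hypothetical stand-in for an sstable reader yielding one atom at a time.
    def fake_atoms():
        yield {"type": "partition",
               "partition_key": [{"name": "pk1", "value": 3},
                                 {"name": "pk2", "value": "foo"}]}
        yield {"type": "row", "columns": [{"name": "ck", "value": 42}]}

    buf = io.StringIO()
    write_atoms_as_json_array(fake_atoms(), buf)
    # Round trip: the streamed text parses back into the same atoms.
    assert json.loads(buf.getvalue()) == list(fake_atoms())
    print(buf.getvalue())
```

Writing the array brackets and separators by hand (instead of calling `json.dump` on a list) is what keeps this streaming; the output is still plain, valid JSON.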

For instance (and that's just meant to illustrate what I have in mind, I haven't thought it through), I could imagine something along the lines of:
{noformat}
[
  {
    "type" : "partition",
    "partition_key" : [
        { "name" : "pk1", "value" : 3 },
        { "name" : "pk2", "value" : "foo" }
    ],
    "deletion_info" : { "deletion_time" : 32423423, "tstamp" : 324234234 }
  },
  {
    "type" : "static_block",
    "columns" : [
        { "name" : "static_col", "value" : "foo", "tstamp" : 32423423 }
    ]
  },
  {
    "type" : "range_tombstone",
    "start" : [
        { "name" : "ck", "value" : 10 }
    ],
    "end" : [
        { "name" : "ck", "value" : 50 }
    ],
    "deletion_info" : { "deletion_time" : 32423423, "tstamp" : 324234234 }
  },
  {
    "type" : "row",
    "columns" : [
        { "name" : "ck", "value" : 42 },
        { "name" : "v1", "value" : [ "foo", "bar" ], "tstamp" : 213893242 },
        { "name" : "v2", "value" : { "field1" : 3, "field2" : "foo" }, "tstamp" : 213893242, "ttl" : 2133 }
    ],
    "tombstones" : [
        { "name" : "v3", "deletion_time" : 214124124, "tstamp" : 322342342 }
    ]
  },
  ...
]
{noformat}
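To show that a flat stream like this stays easy to handle with tools, here is a Python sketch that regroups the atoms by partition. It assumes (my reading of the example above, not a settled rule) that every atom following a `partition` atom belongs to that partition until the next `partition` atom; `group_by_partition` is a hypothetical name. Only one partition's atoms are held in memory at a time.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

# One decoded entry of the proposed format, e.g. {"type": "row", ...}.
Atom = Dict[str, Any]


def group_by_partition(atoms: Iterable[Atom]) -> Iterator[Tuple[Atom, List[Atom]]]:
    """Yield (partition_atom, following_atoms) pairs from a flat atom stream.

    Assumption: each "partition" atom opens a new partition, and every
    non-partition atom after it belongs to that partition until the next one.
    """
    header: Optional[Atom] = None
    contents: List[Atom] = []
    for atom in atoms:
        if atom.get("type") == "partition":
            if header is not None:
                yield header, contents  # previous partition is complete
            header, contents = atom, []
        elif header is None:
            raise ValueError(f"{atom.get('type')!r} atom appears before any partition")
        else:
            contents.append(atom)  # static_block, range_tombstone, row, ...
    if header is not None:
        yield header, contents  # flush the last partition
```

Because the grouping is a generator, it composes with a streaming parser and never needs the whole dump in memory either.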


Also, while we're at it, it would be nice if such a new tool were able to do things like "show me partition X of this sstable" (which would, obviously, be done without scanning the whole sstable).
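A rough Python sketch of how such a lookup avoids a full scan: binary-search a sorted index of (key, offset, length) entries, then seek straight to that partition's bytes. `PartitionIndex` and `read_partition` are hypothetical names, and ordering by raw key is a simplification: Cassandra actually orders partitions by token and relies on its own on-disk index components, which this sketch does not model.

```python
import bisect
import io
from typing import BinaryIO, List, Optional, Tuple


class PartitionIndex:
    """Hypothetical index of (partition_key, offset, length), sorted by key."""

    def __init__(self, entries: List[Tuple[str, int, int]]):
        ordered = sorted(entries)
        self._keys = [key for key, _, _ in ordered]
        self._spans = [(offset, length) for _, offset, length in ordered]

    def lookup(self, key: str) -> Optional[Tuple[int, int]]:
        # Binary search over the sorted keys: O(log n), no scan of the data.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._spans[i]
        return None


def read_partition(data: BinaryIO, index: PartitionIndex, key: str) -> Optional[bytes]:
    """Return the raw bytes of one partition, or None if the key is absent."""
    span = index.lookup(key)
    if span is None:
        return None
    offset, length = span
    data.seek(offset)  # jump directly to the partition
    return data.read(length)


if __name__ == "__main__":
    # Toy "data file": three partitions laid out back to back.
    data = io.BytesIO(b"AAAABBBCC")
    index = PartitionIndex([("b", 4, 3), ("a", 0, 4), ("c", 7, 2)])
    print(read_partition(data, index, "b"))  # b'BBB'
```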


> Retire/replace sstable2json and json2sstable
> --------------------------------------------
>
>                 Key: CASSANDRA-7464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> Both tools are pretty awful. They are primarily meant for debugging (there are much more efficient and convenient ways to import/export data), but their output manages to be hard to handle both for humans and for tools (especially as soon as you have modern stuff like composites).
> There is value in having tools that export sstable contents into a format that is easy for humans and tools to manipulate, for debugging, small hacks and general tinkering, but sstable2json and json2sstable are not that.
> So I propose that we deprecate those tools and consider writing better replacements. It shouldn't be too hard to come up with an output format that is more aware of modern concepts like composites, UDTs, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)
