cassandra-commits mailing list archives

From "Russell Bradberry (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable
Date Tue, 29 Dec 2015 16:27:49 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074017#comment-15074017
] 

Russell Bradberry edited comment on CASSANDRA-7464 at 12/29/15 4:27 PM:
------------------------------------------------------------------------

I would like to see an option for an output format that is more digestible by scripts.
Both the old sstable2json and this replacement output the entire SSTable as a single
pretty-printed array.  That is great for visual inspection, but it requires loading the
entire SSTable dump into memory before parsing the JSON.  There are tools that read a large
JSON stream and emit objects as they complete, but they are rather cumbersome to use and
tend to differ from language to language.

What I propose is a command-line option that outputs one partition per line (escaping any
newlines encountered), without any leading or trailing brackets or commas.  This would let
an application read one partition at a time and process it in a streaming fashion.
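
To make the idea concrete, here is a minimal Python sketch of what a consumer of such
output could look like.  It is only an illustration, not part of the tool: the partition
layout and field names ("key", "rows", "name", "value") and the helper names are made up,
and the real dump format may well differ.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def write_partitions(partitions: Iterable[dict], out: TextIO) -> None:
    """Write one compact JSON object per line, with no enclosing array."""
    for partition in partitions:
        # json.dumps escapes newlines inside strings as \n, so each
        # partition occupies exactly one physical line of output.
        out.write(json.dumps(partition, separators=(",", ":")))
        out.write("\n")


def read_partitions(stream: TextIO) -> Iterator[dict]:
    """Yield partitions one at a time; only one line is parsed at once."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


# Hypothetical partitions, assumed purely for illustration.
sample = [
    {"key": "user1", "rows": [{"name": "bio", "value": "line one\nline two"}]},
    {"key": "user2", "rows": [{"name": "bio", "value": "single line"}]},
]

buffer = io.StringIO()
write_partitions(sample, buffer)
dumped = buffer.getvalue()

# A consumer never holds more than one partition in memory.
buffer.seek(0)
keys = [partition["key"] for partition in read_partitions(buffer)]
print(keys)
```

Because every line is a complete JSON document, any language with a line reader and a
JSON parser can consume the output without a special streaming JSON library.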

I also wrote up my thoughts in this GitHub issue: https://github.com/tolbertam/sstable-tools/issues/19


> Replace sstable2json and json2sstable
> -------------------------------------
>
>                 Key: CASSANDRA-7464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Chris Lohfink
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: sstable-only.patch
>
>
> Both tools are pretty awful. They are primarily meant for debugging (there are much more
> efficient and convenient ways to import/export data), but their output manages to be hard
> to handle both for humans and for tools (especially as soon as you have modern stuff like
> composites).
> There is value in having tools that export sstable contents into a format that is easy
> for humans and tools to manipulate for debugging, small hacks, and general tinkering, but
> sstable2json and json2sstable are not that.
> So I propose that we deprecate those tools and consider writing better replacements.
> It shouldn't be too hard to come up with an output format that is more aware of modern
> concepts like composites, UDTs, ....



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
