cassandra-commits mailing list archives

From "Chris Lohfink (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7464) Replace sstable2json and json2sstable
Date Tue, 26 Jan 2016 06:45:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116784#comment-15116784 ]

Chris Lohfink edited comment on CASSANDRA-7464 at 1/26/16 6:44 AM:
-------------------------------------------------------------------

A debugging format may be nice for the "one per line" output, and we can do that pretty
easily using the UnfilteredRow.toString, so instead of
{code}
[
  {
    "partition" : {
      "key" : [ "127.0.0.1-getWriteLatencyHisto" ],
      "position" : 19385620
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 19385664,
        "clustering" : [ "694621867" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861829846001, "ttl" : 604800, "expires_at" : 1453466629, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385686,
        "clustering" : [ "694621927" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861769124000, "ttl" : 604800, "expires_at" : 1453466569, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385708,
        "clustering" : [ "694621987" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861709303002, "ttl" : 604800, "expires_at" : 1453466509, "expired" : true }
        ]
      },
      {
        "type" : "row",
        "position" : 19385730,
        "clustering" : [ "694622047" ],
        "cells" : [
          { "name" : "value", "value" : "00", "tstamp" : 1452861649548002, "ttl" : 604800, "expires_at" : 1453466449, "expired" : true }
        ]
      },
...
{code}
it can be
{code}
[127.0.0.1-getWriteLatencyHisto]@19385620 Row[info=[ts=-9223372036854775808] ]: 694621867 | [value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]
[127.0.0.1-getWriteLatencyHisto]@19385686 Row[info=[ts=-9223372036854775808] ]: 694621927 | [value=00 ts=1452861769124000 ttl=604800 ldt=1453466569]
[127.0.0.1-getWriteLatencyHisto]@19385708 Row[info=[ts=-9223372036854775808] ]: 694621987 | [value=00 ts=1452861709303002 ttl=604800 ldt=1453466509]
[127.0.0.1-getWriteLatencyHisto]@19385730 Row[info=[ts=-9223372036854775808] ]: 694622047 | [value=00 ts=1452861649548002 ttl=604800 ldt=1453466449]
...
{code}
This would also make it easy to split files for Hadoop jobs etc., since it would have one
CQL row per line (easing the wide-partition issues with the compact output from Russell's
discussion in the other ticket). It would also tie the rendering to something already maintained
for debug logging etc., so refactoring or storage changes would need little additional work.
I am kinda a fan of both formats, so I implemented a {{-d}} option (could use a better name)
for the one-row-per-line "debuggy" compact output (worth noting that this is very hard to read
if there are a lot of cells).
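
To illustrate why one row per line helps with splitting, here is a minimal sketch of a per-line parser. It is not part of the patch: the function name {{parse_debug_line}} and both regular expressions are hypothetical, inferred only from the sample lines in this comment, and they assume each row really occupies a single physical line. The actual toString output may differ.

```python
import re

# Assumed shape of one debug line, inferred only from the samples in this
# comment; the real toString output may differ.
ROW_RE = re.compile(
    r"^\[(?P<key>[^\]]*)\]@(?P<position>\d+)\s+"
    r"Row\[info=\[ts=(?P<row_ts>-?\d+)\]\s*\]:\s*"
    r"(?P<clustering>.*?)\s*\|\s*(?P<cells>.*)$"
)
# Assumed shape of one cell: [name=value ts=... ttl=... ldt=...]
CELL_RE = re.compile(
    r"\[(?P<name>\w+)=(?P<value>\S*) ts=(?P<ts>-?\d+)"
    r" ttl=(?P<ttl>\d+) ldt=(?P<ldt>\d+)\]"
)


def parse_debug_line(line):
    """Parse one row line into a dict, or return None if it does not match."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    cells = [
        {
            "name": c["name"],
            "value": c["value"],
            "ts": int(c["ts"]),
            "ttl": int(c["ttl"]),
            "ldt": int(c["ldt"]),
        }
        for c in CELL_RE.finditer(m["cells"])
    ]
    return {
        "key": m["key"],
        "position": int(m["position"]),
        "clustering": m["clustering"],
        "cells": cells,
    }


# First sample row from this comment, written as a single line.
SAMPLE = (
    "[127.0.0.1-getWriteLatencyHisto]@19385620 "
    "Row[info=[ts=-9223372036854775808] ]: 694621867 "
    "| [value=00 ts=1452861829846001 ttl=604800 ldt=1453466629]"
)

if __name__ == "__main__":
    print(parse_debug_line(SAMPLE))
```

Because each line carries its own partition key and position, a job can hand any line to a function like this without first reading the enclosing partition, which is not possible with the nested JSON.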

Also added the current position from the scanner in the results (see above examples).
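
As a sanity check on the sample cells above, the fields appear to be related as follows: {{tstamp}} is in microseconds, {{ttl}} is in seconds, and {{expires_at}} (shown as {{ldt}} in the compact form) equals the write time in seconds plus the TTL. This is only an observation about these four samples. It assumes the write timestamp tracks wall-clock time, and the helper names below are hypothetical.

```python
from datetime import datetime, timezone

MICROS_PER_SECOND = 1_000_000


def expiry_seconds(tstamp_micros, ttl_seconds):
    """Write time truncated to seconds, plus the TTL (assumed relation)."""
    return tstamp_micros // MICROS_PER_SECOND + ttl_seconds


def is_expired(expires_at, now_seconds):
    """Treat a cell as expired once 'now' reaches its expiry (assumption)."""
    return now_seconds >= expires_at


# (tstamp, ttl, expires_at) copied from the four sample cells above.
SAMPLES = [
    (1452861829846001, 604800, 1453466629),
    (1452861769124000, 604800, 1453466569),
    (1452861709303002, 604800, 1453466509),
    (1452861649548002, 604800, 1453466449),
]

# Date header of this message: Tue, 26 Jan 2016 06:45:40 GMT.
MESSAGE_TIME = int(
    datetime(2016, 1, 26, 6, 45, 40, tzinfo=timezone.utc).timestamp()
)

if __name__ == "__main__":
    for tstamp, ttl, expires_at in SAMPLES:
        print(
            expiry_seconds(tstamp, ttl) == expires_at,
            is_expired(expires_at, MESSAGE_TIME),
        )
```

A TTL of 604800 seconds is seven days, so cells written in mid-January 2016 had already passed their expiry by the date of this message, which is consistent with {{"expired" : true}} in the JSON.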

Until CASSANDRA-9587 is resolved, I had to add an alternative that does not print out clustering
key names in the toString, since they are not available anywhere. This is a little hacky, but
it can be removed once we have the names.


> Replace sstable2json and json2sstable
> -------------------------------------
>
>                 Key: CASSANDRA-7464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7464
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Chris Lohfink
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: sstable-only.patch, sstabledump.patch
>
>
> Both tools are pretty awful. They are primarily meant for debugging (there are much more
> efficient and convenient ways to import/export data), but their output manages to be hard
> to handle both for humans and for tools (especially as soon as you have modern stuff like
> composites).
> There is value in having tools to export sstable contents into a format that is easy
> to manipulate by humans and tools for debugging, small hacks and general tinkering, but sstable2json
> and json2sstable are not that.
> So I propose that we deprecate those tools and consider writing better replacements.
> It shouldn't be too hard to come up with an output format that is more aware of modern concepts
> like composites, UDTs, ....



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
