cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrés de la Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12245) initial view build can be parallel
Date Wed, 23 Aug 2017 11:13:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138233#comment-16138233
] 

Andrés de la Peña commented on CASSANDRA-12245:
-----------------------------------------------

[Here|https://github.com/apache/cassandra/compare/trunk...adelapena:12245-trunk] is a new
version of the patch addressing the review comments. The updated dtests are [here|https://github.com/apache/cassandra-dtest/compare/master...adelapena:12245].

bq. It would be nice to maybe try to reuse the {{Splitter}} methods if possible, so we can
reuse tests, or if that's not straightforward maybe put the methods on splitter and add some
tests to make sure it's working correctly.

I have moved the [methods to split the ranges|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L137-L187]
to the {{Splitter}}, reusing its [{{valueForToken}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/dht/Splitter.java#L47]
method. Tests [here|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/test/unit/org/apache/cassandra/dht/SplitterTest.java#L162-L224].

bq. Can probably remove the generation field from the builds in progress table and remove
this comment

Removed.

bq. {{views_builds_in_progress_v2}} sounds a bit hacky, so perhaps we should call it {{system.view_builds_in_progress}}
(remove the s) and also add a NOTICE entry informing the previous table was replaced and data
files can be removed.

Renamed to {{system.view_builds_in_progress}}. Added an [NEWS.txt entry|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/NEWS.txt#L183-L185]
informing about the replacement.

bq. I'm a bit concerned about starving the compaction executor for a long period during view
build of large base tables, so we should probably have another option like {{concurret_view_builders}}
with a conservative default and perhaps control the concurrency at the {{ViewBuilderController}}.
WDYT?

Agree. I have added a new dedicated [executor|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L126]
in the {{CompactionManager}}, similar to the executors used for validation and cache cleanup.
The concurrency of this executor is determined by the new config property [{{concurrent_materialized_view_builders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/conf/cassandra.yaml#L756],
which defaults to a perhaps too much conservative value of {{1}}. This property can be modified
through both JMX and the new [{{setconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/SetConcurrentViewBuilders.java]
and [{{getconcurrentviewbuilders}}|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/tools/nodetool/GetConcurrentViewBuilders.java]
nodetool commands. These commands are tested [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/nodetool_test.py#L162-L190].

I'm not sure about if it still makes sense for the builder task to extend {{CompactionInfo.Holder}}.
If so, I'm neither sure about how to use {{prevToken.size(range.right)}} (that returns a {{double}})
to create {{CompationInfo}} objects. WDYT?

bq. ViewBuilder seems to be reimplementing some of the logic of PartitionRangeReadCommand,
so I wonder if we shoud take this chance to simplify and use that instead of manually constructing
the commands via ReducingKeyIterator and multiple SinglePartitionReadCommands? We can totally
do this in other ticket if you prefer.

I would prefer to do this in another ticket.

bq. Perform view marking on ViewBuilderController instead of ViewBuilder

I have moved the marking of system tables (and the retries in case of failure) from the {{ViewBuilderTask}}
to the {{ViewBuilder}}, using a callback to do the marking. I think the code is clearer this
way.

bq. Updating the view built status at every key is perhaps a bit inefficient and unnecessary,
so perhaps we should update it every 1000 keys or so.

Done, it is updated [every 1000 keys|https://github.com/adelapena/cassandra/blob/af6b2547b58b8215c65be7546173f3c48b398072/src/java/org/apache/cassandra/db/view/ViewBuilderTask.java#L62].
It doesn't seem to make a great difference in some small benchmarks that I have run.

bq. Would be nice to update the {{interrupt_build_process_test}} to stop halfway through the
build (instead of the start of the build) and verify it it's being resumed correctly with
the new changes.

Updated [here|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/materialized_views_test.py#L935-L969].
It also uses [a byteman script|https://github.com/adelapena/cassandra-dtest/blob/cf893982c361fd7b6018b2570d3a5a33badd5424/byteman/skip_view_build_finalization.btm]
to make sure that the MV build isn't finished before stopping the cluster, which is more likely
to happen.

CI results seem ok, there are no failures in unit tests and the failing dtests seem unrelated.

> initial view build can be parallel
> ----------------------------------
>
>                 Key: CASSANDRA-12245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12245
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Materialized Views
>            Reporter: Tom van der Woerdt
>            Assignee: Andrés de la Peña
>             Fix For: 4.x
>
>
> On a node with lots of data (~3TB) building a materialized view takes several weeks,
which is not ideal. It's doing this in a single thread.
> There are several potential ways this can be optimized :
>  * do vnodes in parallel, instead of going through the entire range in one thread
>  * just iterate through sstables, not worrying about duplicates, and include the timestamp
of the original write in the MV mutation. since this doesn't exclude duplicates it does increase
the amount of work and could temporarily surface ghost rows (yikes) but I guess that's why
they call it eventual consistency. doing it this way can avoid holding references to all tables
on disk, allows parallelization, and removes the need to check other sstables for existing
data. this is essentially the 'do a full repair' path



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message