cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Dynamic schema modification an anti-pattern?
Date Tue, 07 Oct 2014 08:01:15 GMT
Furthermore, dynamically altering the schema will prevent adding a new node
to the cluster. I faced a similar issue recently. While the new node is
joining the cluster, data is streamed from the old nodes to the new node. If
the application alters the schema on the fly (DROP TABLE, DROP COLUMN, ...),
the data stream arriving at the new node cannot be processed because the
schema has changed (table dropped, column dropped). The streaming then stalls
and the new node remains in the JOINING state forever.

This can be a serious blocker for scaling the cluster.

On Tue, Oct 7, 2014 at 9:41 AM, Colin <colin@clark.ws> wrote:

> Anti-pattern.  Dynamically altering the schema won't scale and is bad ju
> ju.
>
> --
> *Colin Clark*
> +1-320-221-9531
>
>
> On Oct 6, 2014, at 10:56 PM, Todd Fast <todd@toddfast.com> wrote:
>
> There is a team at my work building an entity-attribute-value (EAV) store
> using Cassandra. There is a column family, called Entity, where the
> partition key is the UUID of the entity, and the columns are the attribute
> names with their values. Each entity will contain hundreds to thousands of
> attributes, out of a list of up to potentially ten thousand known attribute
> names.
>
> However, instead of using wide rows with dynamic columns (and serializing
> type info with the value), they are trying to use a static column family
> and modifying the schema dynamically as new named attributes are created.
>
> (I believe one of the main drivers of this approach is to use collection
> columns for certain attributes, and perhaps to preserve type metadata for a
> given attribute.)
>
> This approach goes against everything I've seen and done in Cassandra, and
> is generally an anti-pattern for most persistence stores, but I want to
> gather feedback before taking the next step with the team.
>
> Do others consider this approach an anti-pattern, and if so, what are the
> practical downsides?
>
> For one, this means that the Entity schema would contain the superset of
> all columns for all rows. What is the impact of having thousands of column
> names in the schema? And what are the implications of modifying the schema
> dynamically on a decent-sized cluster (5 nodes now, growing to 10s later)
> under load?
>
> Thanks,
> Todd
>
>
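
For reference, the alternative Todd mentions (wide rows with dynamic columns,
serializing type info with the value) could look roughly like the sketch
below. It is only an illustration in plain Python, not code from the thread:
the tag names, the JSON envelope and the helpers encode_value/decode_value
are all assumptions, and the dict stands in for one entity's wide row, where
each attribute would be a cell keyed by attribute name. The point is that a
new attribute is just new data, so no schema change is needed.

```python
import json

# Hypothetical type tags chosen for this sketch; they are not part of
# Cassandra or of any driver API.
_TAGS = {str: "text", int: "int", float: "double", bool: "boolean", list: "list"}


def encode_value(value):
    """Pack a value and its type tag into one string cell."""
    # Exact type lookup, so True/False are tagged "boolean" and not "int".
    tag = _TAGS.get(type(value))
    if tag is None:
        raise TypeError(f"unsupported attribute type: {type(value).__name__}")
    return json.dumps({"t": tag, "v": value})


def decode_value(cell):
    """Return (type_tag, value) from a cell written by encode_value."""
    payload = json.loads(cell)
    return payload["t"], payload["v"]


# Stand-in for one entity's wide row: attribute name -> encoded cell.
entity_row = {
    "name": encode_value("widget"),
    "weight_kg": encode_value(1.5),
    "tags": encode_value(["a", "b"]),
}

# Adding a never-seen attribute is a plain write, with no ALTER TABLE.
entity_row["in_stock"] = encode_value(True)
```

The trade-off is that type checking moves from the schema into the
application, which is presumably what the team wants to avoid by keeping
typed and collection columns.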
