Subject [jira] [Commented] (CASSANDRA-10367) Aggregate with Initial Condition fails with C* 3.0
Robert Stupp commented on CASSANDRA-10367:

Sure. It's all about maintaining the state of the aggregate.

The current flow for UDAs is (roughly) like this:
# create initial state variable instance, serialized (the collection in this case)
# for each row
## deserialize state variable instance (the collection in this case)
## call UDA state function with deserialized state variable and row's column value
## UDA state function modifies state variable (the collection in this case)
## store serialized state variable instance as returned by UDA state function
# for final function
## deserialize state variable instance
## call UDA final function with deserialized state variable
# return UDA final value
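
To make the per-row round trip concrete, here's a minimal Java sketch of that flow. It is not the Cassandra code: the class and method names are hypothetical, and plain Java object serialization stands in for the real CQL (de)serialization.

```java
import java.io.*;
import java.util.*;

// Hypothetical illustration of the per-row (de)serialization flow; not Cassandra code.
class SerializedStateFlow {
    // Stand-in for the real codec: plain Java serialization (an assumption).
    static byte[] serialize(ArrayList<String> state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static ArrayList<String> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // State function: modifies the (deserialized) collection and returns it.
    static ArrayList<String> stateFunction(ArrayList<String> state, String value) {
        state.add(value);
        return state;
    }

    // Final function: here simply the number of collected values.
    static int finalFunction(ArrayList<String> state) {
        return state.size();
    }

    static int aggregate(List<String> rows) {
        byte[] stored = serialize(new ArrayList<String>());   // initial state, serialized
        for (String value : rows) {
            ArrayList<String> state = deserialize(stored);    // deserialize for every row
            stored = serialize(stateFunction(state, value));  // re-serialize for every row
        }
        return finalFunction(deserialize(stored));            // deserialize once more for the final function
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList("a", "b", "c")));
    }
}
```

Each row costs one deserialize and one serialize of the whole collection, whatever the state function actually does.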

Superfluous re-serialization is addressed in CASSANDRA-9613. So the flow would then be:
# Create state variable instance (non-serialized, a "real" object)
# for each row
## Call state function with state variable object and row's column value
## store state variable object returned from state function
# for final function
## Call final function with state variable object
# serialize state variable or final function's return value
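
Roughly, and again only as a hypothetical sketch (the names are mine, not what CASSANDRA-9613 will actually look like), the object-based flow boils down to this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the object-based flow; not Cassandra code.
class ObjectStateFlow {
    // State function: works directly on the "real" object, no (de)serialization.
    static List<String> stateFunction(List<String> state, String value) {
        state.add(value);
        return state;
    }

    // Final function: here simply the number of collected values.
    static int finalFunction(List<String> state) {
        return state.size();
    }

    static int aggregate(List<String> rows) {
        List<String> state = new ArrayList<>();   // non-serialized initial state
        for (String value : rows) {
            state = stateFunction(state, value);  // store whatever the state function returns
        }
        return finalFunction(state);              // only this result would be serialized
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList("a", "b", "c")));
    }
}
```

This only works without copying if the state object handed to the state function is modifiable.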

But for unmodifiable collections, the UDA's state function has to do something like this:
public List myStateFunction(List state, String value)
{
  state = new ArrayList(state); // <-- THIS ONE (copy, because the incoming list is unmodifiable)
  state.add(value);             // modify the copy
  return state;
}
This can become quite expensive (CPU and garbage) if the UDA is used on a partition with
several hundred or thousand rows, especially if people use bigger maps, store more
intermediate results, and so on.
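
A rough way to see the cost, as a hypothetical sketch: assume the state always arrives as an unmodifiable list (modelled here with `Collections.unmodifiableList`, which may not be what the driver really uses) and count how many elements the defensive copies touch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the copy-per-row cost; not Cassandra or driver code.
class DefensiveCopyCost {
    // State function that must copy because the incoming list is unmodifiable.
    static List<String> stateFunction(List<String> state, String value) {
        List<String> copy = new ArrayList<>(state);   // the expensive line
        copy.add(value);
        return Collections.unmodifiableList(copy);    // assumption: state is handed back unmodifiable
    }

    // Total number of elements copied while aggregating the given number of rows.
    static long copiedElements(int rows) {
        long copied = 0;
        List<String> state = Collections.unmodifiableList(new ArrayList<String>());
        for (int i = 0; i < rows; i++) {
            copied += state.size();                   // size of the list the copy constructor walks
            state = stateFunction(state, "v" + i);
        }
        return copied;
    }

    // Shows why the copy is needed: in-place modification is rejected.
    static boolean rejectsInPlaceAdd() {
        List<String> state = Collections.unmodifiableList(new ArrayList<String>());
        try {
            state.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(copiedElements(1000));
        System.out.println(rejectsInPlaceAdd());
    }
}
```

With one element added per row, n rows copy n(n-1)/2 elements in total, so 1,000 rows already mean 499,500 element copies plus 1,000 short-lived lists.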

OTOH this will also be true for tuples and UDTs as these always (de)serialize for every get/set,
which is also imperfect IMO (but not something to be addressed in the driver).

So, long wall of text...
*TL;DR* - Since tuples and UDTs also (de)serialize on every get/set, I'd like to address that
in CASSANDRA-9613 as well. So my plan would be: resolve this as "duplicate" of 9613 and fix it
there for UDAs. But I'm still unsure whether returning an unmodifiable collection from the
driver is a good idea.

