sxjscience commented on issue #16955: [Dataset] add flatten API to dataset
URL: https://github.com/apache/incubator-mxnet/pull/16955#issuecomment-560853809
Explicitly constructing the dataset may be a better choice than adding a `flatten` method.
```python
new_dataset = preprocess_function(dataset)
```
From my perspective, the major design choice of `gluon.dataset` is to support `__getitem__`
plus lazy evaluation in `transform()`. Lazy evaluation lets us generate the data on the fly,
so the overall data processing pipeline uses less memory. The `flatten` method, however, is
equivalent to the Python one-liner `SimpleDataset(list(itertools.chain.from_iterable(self)))`,
which materializes every element eagerly, so it offers no speed or memory benefit.
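To illustrate the difference between the lazy `transform()` and the eager one-liner, here is a minimal sketch in plain Python. The `SimpleDataset` and `_LazyTransformDataset` classes below are simplified stand-ins written for this example, not the actual gluon implementations, and `tokenize` is a hypothetical preprocessing function.

```python
import itertools


class SimpleDataset:
    """Simplified stand-in for a list-backed dataset (illustration only)."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]

    def transform(self, fn):
        # Lazy: nothing is computed until an item is requested.
        return _LazyTransformDataset(self, fn)


class _LazyTransformDataset(SimpleDataset):
    """Applies fn to a sample only when that sample is indexed."""

    def __init__(self, data, fn):
        super().__init__(data)
        self._fn = fn

    def __getitem__(self, idx):
        return self._fn(self._data[idx])


calls = []


def tokenize(line):
    # Hypothetical preprocessing step; records each call so we can count them.
    calls.append(line)
    return line.split()


dataset = SimpleDataset(["a b", "c d e"]).transform(tokenize)
calls_before_flatten = len(calls)  # transform() alone has not run tokenize

# The flatten one-liner walks the whole dataset and stores every element.
flat = SimpleDataset(list(itertools.chain.from_iterable(dataset)))
calls_after_flatten = len(calls)  # every sample has now been processed
```

In this sketch, `transform()` does no work up front, while the flatten one-liner has to process and hold every sample in memory before the new dataset exists.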
Moreover, consider the case where each sample is a (data, label) pair. Calling `flatten()`
would make the dataset look like `[data0, label0, data1, label1, ...]`, which is not very meaningful.
I suggest we just use `SimpleDataset(list(itertools.chain.from_iterable(self)))`
to implement this functionality.
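The (data, label) problem can be reproduced with `itertools` alone. This is a small sketch with placeholder strings standing in for real samples:

```python
import itertools

# Each sample is a (data, label) pair; the strings are placeholders.
samples = [("data0", "label0"), ("data1", "label1")]

# Chaining the samples unpacks each pair, so data and labels are interleaved.
flattened = list(itertools.chain.from_iterable(samples))
```

Here `flattened` is `["data0", "label0", "data1", "label1"]`, so the pairing between each data item and its label is lost.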