mxnet-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-mxnet] sxjscience commented on issue #16955: [Dataset] add flatten API to dataset
Date Mon, 02 Dec 2019 23:10:42 GMT
sxjscience commented on issue #16955: [Dataset] add flatten API to dataset
URL: https://github.com/apache/incubator-mxnet/pull/16955#issuecomment-560853809
 
 
   Explicitly constructing the dataset may be a better choice than adding a `flatten` method.
   ```python
   new_dataset = preprocess_function(dataset)
   ```
   
   From my perspective, the major design choice of `gluon.dataset` is to support `__getitem__` plus lazy evaluation in `transform()`. With lazy evaluation, we can generate the data on the fly, so the overall data processing pipeline uses less memory. However, the `flatten` method is equivalent to the Python one-liner `SimpleDataset(list(itertools.chain.from_iterable(self)))`, which materializes every element up front, so there is no speed or memory benefit.
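
A minimal sketch of what that one-liner does, assuming nothing beyond the standard library. The `SimpleDataset` class below is a hypothetical stand-in written for this illustration (it is not the Gluon class), and the `flatten` helper is likewise only illustrative:

```python
import itertools


class SimpleDataset:
    """Hypothetical minimal stand-in for Gluon's SimpleDataset (illustration only)."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return self._data[idx]


def flatten(dataset):
    # list(...) walks the whole nested dataset immediately, so every
    # element is held in memory before the new dataset is returned.
    return SimpleDataset(list(itertools.chain.from_iterable(dataset)))


nested = SimpleDataset([[1, 2], [3], [4, 5, 6]])
flat = flatten(nested)
flat_items = [flat[i] for i in range(len(flat))]
print(flat_items)  # [1, 2, 3, 4, 5, 6]
```

Because the result is built eagerly with `list(...)`, wrapping it in a dataset method would not add laziness over calling the one-liner directly.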
   
   Moreover, consider the case where each sample is a (data, label) pair. Calling `flatten()` will make the dataset look like `[data0, label0, data1, label1, ...]`, which is not very meaningful.
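
The interleaving can be reproduced with plain Python tuples. This is only a sketch; the sample values are placeholders chosen for illustration:

```python
import itertools

# Each sample is a (data, label) pair, as in a typical supervised dataset.
samples = [("data0", "label0"), ("data1", "label1")]

# Flattening one level unpacks each pair, so data and labels end up
# interleaved in a single sequence and the pairing is lost.
flattened = list(itertools.chain.from_iterable(samples))
print(flattened)  # ['data0', 'label0', 'data1', 'label1']
```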
   
   I suggest we just use `SimpleDataset(list(itertools.chain.from_iterable(self)))` to implement this functionality.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
