singa-dev mailing list archives

From Wang Wei <wang...@comp.nus.edu.sg>
Subject Re: activity recognition using apache singa
Date Sat, 08 Oct 2016 13:43:14 GMT
Actually, the char-rnn example is of type (4), where each RNN unit
generates a prediction and has a ground-truth label.

For your model (type 2), you only need to use y128 (of shape (256, 28))
from rnn::forward() as the input to the dense layer. All other yi
should be ignored.
Consequently, you would have an output (denoted as o) of shape (256, 6)
from the dense layer, which is the prediction for the whole sequence (of
length 128).
By feeding the prediction o and the label into the loss layer, you can
compute the loss value and the gradient for o (denoted as o').
Back-propagating o' through the dense layer, you would get the
gradient for y128, denoted as y'128.

*The input of rnn::backward() would be <y'1, y'2, ..., y'128, hy', cy'>,
where only y'128 is a valid tensor. y'1, y'2, ..., y'127 should be tensors
with value 0.*
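
The steps above can be sketched in plain NumPy. This is only an illustration
of the data flow under stated assumptions, not SINGA code: the RNN outputs are
random stand-ins, and the dense weights W and b, the integer labels, and all
variable names are hypothetical.

```python
import numpy as np

seq_len, batch, hidden, classes = 128, 256, 28, 6
rng = np.random.default_rng(0)

# Stand-ins for the RNN outputs <y1, ..., y128>, each of shape (256, 28).
ys = [rng.standard_normal((batch, hidden)).astype(np.float32)
      for _ in range(seq_len)]
# One integer class label per sequence (hypothetical data).
labels = rng.integers(0, classes, size=batch)

# Hypothetical dense layer parameters.
W = (0.1 * rng.standard_normal((hidden, classes))).astype(np.float32)
b = np.zeros(classes, dtype=np.float32)

# Forward: only the last RNN output feeds the dense layer.
y_last = ys[-1]
o = y_last @ W + b                                  # shape (256, 6)

# Softmax cross-entropy loss, averaged over the batch.
shifted = o - o.max(axis=1, keepdims=True)          # for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(batch), labels]).mean()

# Gradient of the loss w.r.t. o (this is o').
grad_o = probs.copy()
grad_o[np.arange(batch), labels] -= 1.0
grad_o /= batch

# Back-propagate o' through the dense layer to get y'128.
grad_y_last = grad_o @ W.T                          # shape (256, 28)

# Gradients handed to the RNN backward pass: zeros except for the last step.
grad_ys = [np.zeros_like(y) for y in ys[:-1]] + [grad_y_last]
```

In SINGA the same roles would be played by the dense layer and the loss layer;
the point here is only that y'1 ... y'127 are zero tensors of the same shape
as the corresponding yi.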

Best,
Wei


On Sat, Oct 8, 2016 at 9:33 PM Arash Shafiei <arash.shafiei@gmail.com>
wrote:

> Thanks. It worked.
>
> I am now at the phase of evaluating the loss.
>
> singa.loss.SoftmaxCrossEntropy has a forward function where it takes
> prediction tensors and ground truth.
>
> My problem now is that the prediction is a sequence and my label is not a
> sequence.
>
> Your char-rnn example is an application of type (1) in the figure below,
> but activity recognition is an application of type (2).
>
>
> [image: Inline image 1]
> Therefore, for each sequence in a batch I have only 1 label (although this
> label can be one-dimensional, from the set {1,2,3,4,5,6}, or 6-dimensional,
> from the set { [1,0,0,0,0,0], [0,1,0,0,0,0], etc. }).
>
> So now I need predictions and ground truth. The prediction for me is of
> shape
> (128, 256, 28)
> where 128 is the length of the sequence, 256 is the batch size and 28 is
> the hidden layer size.
>
> And my ground truth is of shape
> (256, 1) or (256, 6) -- depending on how you model it..
>
> But as I understood from the example of char-rnn my ground truth must be
> of shape:
> (128, 256)
>
> Would you have any insight about this?
> Thanks..
>
>
> On Sat, Oct 8, 2016 at 6:42 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>
> Currently, only numpy arrays of dtype np.float32 or np.int can be converted
> into a singa tensor.
> Please convert the numpy array to np.float32 and then call
> tensor.from_numpy(t) (without dtype=np.float32).
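
A minimal sketch of the conversion step described above, using NumPy only.
The array x and its shape are hypothetical stand-ins for the sensor data; the
call to tensor.from_numpy is left as a comment because it is the step named
in the message, not something this snippet runs.

```python
import numpy as np

# Hypothetical batch of readings in [-1, 1]; NumPy creates float64 by default,
# which is what triggers "Not implemented yet for float64".
x = np.random.uniform(-1.0, 1.0, size=(256, 128, 9))

# Cast on the NumPy side before handing the array over.
x32 = x.astype(np.float32)

# x32 would then be passed as tensor.from_numpy(x32), with no dtype argument.
```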
>
> On Sat, Oct 8, 2016 at 6:36 PM Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> The values that I have are floating-point numbers in [-1, 1].
>
> While using tensor.from_numpy(...), I was getting this error:
>
> Not implemented yet for  float64
>
> I understood from the tutorial that we could pass the data type:
>
> y = tensor.from_numpy(..., dtype=np.float32)
>
> But using dtype, I am getting another error:
>
> TypeError: from_numpy() got an unexpected keyword argument 'dtype'
>
>
>
> On Sat, Oct 8, 2016 at 3:45 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>
> Hi
>
> According to the API of forward function:
> http://singa.apache.org/en/docs/layer.html#singa.layer.RNN.forward
> The input should be a vector of Tensors, <x1, x2, ... x128, hx, cx>, xi is
> of shape (1500, 9), hx and cx are optional whose shape should be (1500, 28).
> The output would be a vector of Tensors, <y1, y2, ..., y128, hy, cy>, yi
> is of shape (1500, 28), hy and cy are optional depending on the existence
> of hx and cx.
> If you want to put the dense layer on top of the last rnn unit (i.e. the
> 128-th), then you feed y128 to the dense layer.
>
> function convert just reshapes the raw data into a sequence of tensors
> <x1, x2, ..>.
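
As a rough illustration of that reshaping, the NumPy sketch below splits a
batch of shape (1500, 128, 9) into 128 per-time-step arrays of shape
(1500, 9). The helper name to_sequence is hypothetical and is not the convert
function from the char-rnn example; each array would still need to be turned
into a SINGA tensor afterwards.

```python
import numpy as np

def to_sequence(batch):
    """Split (batch_size, seq_len, features) into a list of seq_len arrays,
    each of shape (batch_size, features)."""
    return [np.ascontiguousarray(batch[:, t, :]) for t in range(batch.shape[1])]

# Hypothetical input: 1500 windows, 128 time steps, 9 values per step.
data = np.random.uniform(-1.0, 1.0, size=(1500, 128, 9)).astype(np.float32)
xs = to_sequence(data)   # <x1, x2, ..., x128>
```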
>
> BTW, typically, people would use a smaller batchsize e.g. less than 256.
>
> May I forward our discussion to the incubator email list in case others
> have similar problems?
> Thanks.
>
> Best,
> Wei
>
> So here is what I have:
>
> input batch of dimension (1500, 128, 9)
> This means a batch of 1500 windows, each having 128 vectors of 9 dimensions.
>
> input label of dimension (1500, 6)
> This means a label batch of 1500 vectors of 6 dimensions. This is to label
> whether the person is sitting ([1,0,0,0,0,0]) or standing ([0,1,0,0,0,0]), etc.
>
> I am creating an lstm layer with hidden_size=28 and
> input_sample_shape=(9,) and num_stacks=1
>
> Then I create a dense layer with num_output=6 and input_sample_shape=(28,)
>
> Now I would like to feed the data to the 'forward' function of the lstm and
> dense layers. But I could not make it work and I could not quite understand
> from the example what 'convert' and 'numpy2tensors' are supposed to do...
>
> I would appreciate your comments..
>
> On Sun, Sep 25, 2016 at 12:23 PM, Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> Yes, I was thinking of batch size to be 32.
>
> Thanks. I am understanding better how it works and I am wondering how RNN would be
> helpful. Because we do not want to predict a sequence. We just have a
> sequence (in raw data) and a set of features (in processed data) and we
> want to know the classification.
>
> So I was thinking of using other approaches with SINGA. I understood that
> there is also MLP. We could use MLP from SINGA to see the result first.
>
> In this case input would be a set of 561 values with a label.
> Then the MLP, given a set of test data with 561 features would predict the
> label.
>
> Thanks for the advice.
>
>
>
> On Sun, Sep 25, 2016 at 12:03 PM, Wang Wei <wangwei@comp.nus.edu.sg>
> wrote:
>
>
>
> On Sun, Sep 25, 2016 at 9:37 AM, Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> Hi Wang Wei,
>
> I am trying to understand the char-rnn example, but there is still
> something that I am missing and cannot figure out by myself.
>
> The convert function creates two numpy array x and y. As I understood the
> array x is the data and array y are labels.
>
> I checked the dimensions of these arrays.
> x.shape is (32, 100, 101)
> y.shape is (32, 100)
>
> 32 is the batch size
> 100 is the sequence size
> 101 is the vocabulary size, i.e. there are 101 unique chars in the
> linux_input.txt.  each input from one sample and at one time step is a
> one-hot vector with all positions being 0 except the position of the
> character (set to 1).
>
>
> given a sequence of chars,   a,b,c,d,e,f
> if the input (x) is  a, b, c, d, e
> then the label is  b, c, d, e, f
>
>
>
> In my understanding you are taking a batch of 100 characters and the next
> character must be the label. So according to my understanding
> x.shape must be (32, 100)
> y.shape must be (32, 1)
>
> I mean that you have a batch of 32 samples to train and each sample is a
> series of 100 characters. For each sample, there must be a label, which says
> what character must follow this series. And that is only 1 character.
>
> Is there anything that I do not quite understand?
>
> I would need this information in order to modify your sample program for
> the activity recognition.
> So ultimately in my use case:
> x.shape probably is (32, 561)
> y.shape probably is (32, 1)
>
>
> For your case, if you use 561 features, then how about the sequence length?
> Is 32 the batchsize?
>
> 561 is the number of floating-point features, each in [-1, 1].
> 1 is the label which is in [1,2,3,4,5,6]
>
> I would appreciate your help.
> Thanks.
>
> On Sat, Sep 24, 2016 at 1:59 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>
> No. Don't average them.
> xij is a vector of 6 values. You can normalize them using standard
> normalization methods.
>
> On Sat, Sep 24, 2016 at 1:54 PM, Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> Thanks for the analysis. I appreciate it.
>
> There is only one thing:
> The activities do not seem to be continuous for a person. It is like
> people are told to walk for a fixed period and 128 samples in R^6 are
> collected. Then people are told to sit, etc.
>
> So the person is not the focus and the focus is one activity.
>
> We are currently working on the first approach you proposed and will see
> result.
>
> Later, we would like to try the second approach. My only concern was that
> xi0, xi1, ... are in R^6 and you propose to concatenate them. Since they
> are floating points I do not know how concatenation would work. Even if we
> average, we would lose lots of information. We will think about it.
>
> Thanks for your advice.
>
>
> On Sat, Sep 24, 2016 at 1:27 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>
> Let's denote xij \in R^6 for the j-th time point of the i-th activity of a
> person,
> let yi \in R^561 for the i-th activity of a person.
>
> If the activities of a person are continuous, then you have two approaches:
> 1. use y0, y1, y2, .... (all activities of a person) as input, and use the
> labels l0, l1, l2... as the corresponding output of the RNN. The RNN needs
> to output a label for each activity.
> 2. use the raw data, xi0, xi1, xi2.... (all information from an activity)
> as the input, and use the label li as the output of the RNN. The RNN needs
> to output one label for all time points of one activity.
>
>
>
> On Sat, Sep 24, 2016 at 12:33 PM, Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> Yes, in the raw data, for each labeled sample (activity) there are 128
> time points, each with 6 channels of floating point data. (acc-x, acc-y,
> acc-z, gyro-x, gyro-y, gyro-z)
>
> For each sample (activity) of 128 points of 6 channels, 561 features are
> generated.
>
> Each person performs almost 200 activities.
>
>
>
>
>
> On Sat, Sep 24, 2016 at 12:20 PM, Wang Wei <wangwei@comp.nus.edu.sg>
> wrote:
>
> Do you mean that in the dataset, each sample(person) has 128 time points,
> each one with 6 channels?
> If so, I think you can concatenate all 6 channels into a single channel.
>
> On Sat, Sep 24, 2016 at 12:03 PM, Arash Shafiei <arash.shafiei@gmail.com>
> wrote:
>
> Hi Wang Wei,
>
> We were wondering if the input of RNN can have multiple channels.
>
> In the example that you have for text prediction, the only channel is the
> characters entering the network.
>
> Now if there are multiple time series, then the network needs multiple
> channels.
>
> For example, the raw data coming from accelerometers and gyroscopes
> comprise 6 time series. It means that the data can have 6 dimensions and
> therefore the input of the network can have 6 channels.
>
> I verified the data set and it turns out that 561 features are generated
> from 128*6 raw data. So a sequence of samples has 128 values for acc-x,
> acc-y, acc-z, gyro-x, gyro-y, and gyro-z.
>
> As a result the 561 features are not time series anymore.
>
> We are thinking of:
> 1) Use a decision tree on the 561 processed features.
> 2) Use RNN for raw data.
>
> To use RNN for raw data, we would need channels for the input. Would this
> be possible with SINGA?
>
> Thanks.
>
>
>
>
