singa-dev mailing list archives

From Wei Wang <wangwei...@gmail.com>
Subject Re: activity recognition using apache singa
Date Sun, 09 Oct 2016 02:55:09 GMT
Could you please paste the relevant code leading to this error?



> On 9 Oct 2016, at 10:32 AM, Arash Shafiei <arash.shafiei@gmail.com> wrote:
> 
> Thanks, it worked.
> 
> So far, I managed to do rnn::forward(...) but now I am stuck somewhere else.
> 
> rnn::forward(...) returns a tensor (denoted as lvalue). I have to obtain the L1 norm
using lvalue.l1().
> 
> But I get this error:
> [F d1009 t10:30:14 p23056:-56 /home/wuwf/work/incubator-singa/src/core/tensor/./tensor_math_cuda.h:344]
Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
> Aborted (core dumped)
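> Roughly, the call looks like this (simplified sketch, placeholder names, not my exact code):
> 
>     outputs = rnn.forward(model_pb2.kTrain, inputs)   # list of per-step outputs (+ hy, cy)
>     lvalue = outputs[-3]                              # one of the returned tensors
>     print(lvalue.l1())                                # aborts with the error above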
> 
>> On Sat, Oct 8, 2016 at 9:43 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>> Actually, the char-rnn example is of type (4), where each rnn unit generates a prediction and has a ground-truth label.
>> 
>> For your model (type 2), you only need to use y128 (of shape (256, 28)) from rnn::forward() as the input to the dense layer. All other yi should be ignored.
>> Consequently, you would have an output (denoted as o) of shape (256, 6) from the
dense layer, which is the prediction for the whole sequence (of length 128).
>> By feeding the prediction o and the label into the loss layer, you can compute the
loss value and compute the gradient for o (denoted as o').
>> Back-propagating o through the dense layer, you would get the gradient for y128, denoted y'128.
>> 
>> The input of rnn::backward() would be <y'1, y'2, ..., y'128, hy', cy'>, where only y'128 is a valid tensor; y'1, y'2, ... should be tensors with value 0.
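>> In code that would be roughly the following (an untested sketch; `rnn`, `dense`, `lossfun`, `inputs` and `label` stand for your own layer/loss objects, per-step input tensors and ground-truth tensor, and I assume hx/cx were passed so the last two outputs are hy and cy):
>> 
>>     from singa import tensor
>>     from singa.proto import model_pb2
>> 
>>     outputs = rnn.forward(model_pb2.kTrain, inputs)        # [y1, ..., y128, hy, cy]
>>     y128 = outputs[-3]                                     # last per-step output, shape (256, 28)
>>     o = dense.forward(model_pb2.kTrain, y128)              # shape (256, 6)
>>     lvalue = lossfun.forward(model_pb2.kTrain, o, label)   # one label per sequence
>>     do = lossfun.backward()                                # gradient w.r.t. o
>>     dy128, _ = dense.backward(model_pb2.kTrain, do)        # gradient for y128
>>     zeros = []                                             # zero gradients for y1..y127
>>     for y in outputs[:-3]:
>>         z = tensor.Tensor(y.shape, y.device)
>>         z.set_value(0.0)
>>         zeros.append(z)
>>     grads = zeros + [dy128, tensor.Tensor(), tensor.Tensor()]   # hy'/cy' placeholders
>>     dx, dparams = rnn.backward(model_pb2.kTrain, grads)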
>> 
>> Best,
>> Wei
>> 
>> 
>>> On Sat, Oct 8, 2016 at 9:33 PM Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Thanks. It worked.
>>> 
>>> I am now at the phase of evaluating the loss.
>>> 
>>> singa.loss.SoftmaxCrossEntropy has a forward function which takes prediction tensors and the ground truth.
>>> 
>>> My problem now is that the prediction is a sequence and my label is not a sequence.
>>> 
>>> Your char-rnn example is an application of type (1) in the figure below, but activity recognition is an application of type (2).
>>> 
>>> [inline figure not preserved in the archive]
>>> Therefore, for each sequence in a batch I have only one label (although this label can either be a single value from the set {1,2,3,4,5,6} or a 6-dimensional vector from the set { [1,0,0,0,0,0], [0,1,0,0,0,0], etc. }).
>>> 
>>> So now I need predictions and ground truth. The prediction for me is of shape
>>> (128, 256, 28)
>>> where 128 is the length of the sequence, 256 is the batch size and 28 is the
hidden layer size.
>>> 
>>> And my ground truth is of shape
>>> (256, 1) or (256, 6) -- depending on how you model it..
>>> 
>>> But as I understood from the example of char-rnn my ground truth must be of shape:
>>> (128, 256)
>>> 
>>> Would you have any insight about this?
>>> Thanks..
>>> 
>>> 
>>> On Sat, Oct 8, 2016 at 6:42 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> Currently, only numpy arrays of dtype=np.float32 or np.int can be converted into singa tensors.
>>> Please convert the numpy array to np.float32 and then call tensor.from_numpy(t) (without dtype=np.float32).
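>>> For example (a minimal sketch):
>>> 
>>>     import numpy as np
>>>     from singa import tensor
>>> 
>>>     x = np.random.uniform(-1, 1, (1500, 128, 9)).astype(np.float32)  # your data in [-1, 1]
>>>     t = tensor.from_numpy(x)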
>>> 
>>> On Sat, Oct 8, 2016 at 6:36 PM Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> The values that I have are floating-point numbers in [-1, 1].
>>> 
>>> While using tensor.from_numpy(...), I was getting this error:
>>> 
>>> Not implemented yet for  float64
>>> 
>>> I understood from the tutorial that we could pass the data type:
>>> y = tensor.from_numpy(..., dtype=np.float32)
>>> But using dtype, I am getting another error:
>>> 
>>> TypeError: from_numpy() got an unexpected keyword argument 'dtype'
>>> 
>>> 
>>> On Sat, Oct 8, 2016 at 3:45 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> Hi 
>>> 
>>> According to the API of forward function: http://singa.apache.org/en/docs/layer.html#singa.layer.RNN.forward
>>> The input should be a vector of Tensors, <x1, x2, ..., x128, hx, cx>, where each xi is of shape (1500, 9); hx and cx are optional and should be of shape (1500, 28).
>>> The output would be a vector of Tensors, <y1, y2, ..., y128, hy, cy>, where each yi is of shape (1500, 28); hy and cy are returned only if hx and cx are given.
>>> If you want to put the dense layer on top of the last rnn unit (i.e. the 128-th),
then you feed y128 to the dense layer.
>>> 
>>> The convert function just reshapes the raw data into a sequence of tensors <x1, x2, ...>.
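>>> A rough sketch of that call (untested; `lstm` and `dense` are the layers you created, and `batch` is your float32 numpy array of shape (1500, 128, 9)):
>>> 
>>>     from singa import tensor
>>>     from singa.proto import model_pb2
>>> 
>>>     # note: move the layers and tensors to the same device (e.g. a GPU for the cudnn LSTM) beforehand
>>>     inputs = [tensor.from_numpy(batch[:, t, :]) for t in range(128)]  # 128 tensors of shape (1500, 9)
>>>     outputs = lstm.forward(model_pb2.kTrain, inputs)                  # y1..y128 (+ hy, cy if hx, cx were given)
>>>     y_last = outputs[127]                                             # (1500, 28), output of the 128-th rnn unit
>>>     o = dense.forward(model_pb2.kTrain, y_last)                       # (1500, 6)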
>>> 
>>> BTW, typically people use a smaller batch size, e.g. less than 256.
>>> 
>>> May I forward our discussion to the incubator email list in case others have
similar problems? 
>>> Thanks.
>>> 
>>> Best,
>>> Wei
>>> 
>>> So here is what I have:
>>> 
>>> input batch of dimension (1500, 128, 9)
>>> This means a batch of 1500 windows, each having 128 vectors of 9 dimensions.
>>> 
>>> input label of dimension (1500, 6)
>>> This means a label batch of 1500 vectors of 6 dimensions. This labels whether the person is sitting ([1,0,0,0,0,0]) or standing ([0,1,0,0,0,0]), etc.
>>> 
>>> I am creating an lstm layer with hidden_size=28 and input_sample_shape=(9,) and
num_stacks=1
>>> 
>>> Then I create a dense layer with num_output=6 and input_sample_shape=(28,)
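>>> In code that is roughly (a sketch; the layer names are arbitrary):
>>> 
>>>     from singa import layer
>>> 
>>>     lstm = layer.LSTM(name='lstm', hidden_size=28, num_stacks=1, input_sample_shape=(9,))
>>>     dense = layer.Dense('dense', num_output=6, input_sample_shape=(28,))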
>>> 
>>> Now I would like to feed the data to the 'forward' function of the lstm and dense layers. But I could not make it work, and I could not quite understand from the example what 'convert' and 'numpy2tensors' are supposed to do...
>>> 
>>> I would appreciate your comments..
>>> 
>>> On Sun, Sep 25, 2016 at 12:23 PM, Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Yes, I was thinking of a batch size of 32.
>>> 
>>> Thanks. I am getting a better sense of how it works, and I am thinking about how an RNN would be helpful, because we do not want to predict a sequence. We just have a sequence (in the raw data) and a set of features (in the processed data), and we want to know the classification.
>>> 
>>> So I was thinking of using other approaches with SINGA. I understood that there is also an MLP. We could use the MLP from SINGA to see the results first.
>>> 
>>> In this case the input would be a set of 561 values with a label.
>>> Then the MLP, given test data with 561 features, would predict the label.
>>> 
>>> Thanks for the advice..
>>> 
>>> 
>>> 
>>> On Sun, Sep 25, 2016 at 12:03 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> 
>>> 
>>> On Sun, Sep 25, 2016 at 9:37 AM, Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Hi Wang Wei,
>>> 
>>> I am trying to understand the char-rnn example, but there is still something that I am missing and cannot figure out by myself.
>>> 
>>> The convert function creates two numpy arrays, x and y. As I understood, the array x is the data and the array y holds the labels.
>>> 
>>> I checked the dimensions of these arrays.
>>> x.shape is (32, 100, 101)
>>> y.shape is (32, 100)
>>> 
>>> 32 is the batch size
>>> 100 is the sequence size
>>> 101 is the vocabulary size, i.e. there are 101 unique chars in linux_input.txt.
>>> Each input from one sample at one time step is a one-hot vector with all positions being 0 except the position of the character (set to 1).
>>> 
>>> 
>>> given a sequence of chars,   a,b,c,d,e,f
>>> if the input (x) is  a, b, c, d, e
>>> then the label is  b, c, d, e, f
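>>> As a small plain-numpy illustration of that encoding (not the actual convert code):
>>> 
>>>     import numpy as np
>>> 
>>>     text = 'abcdef'
>>>     vocab = sorted(set(text))
>>>     char2idx = {c: i for i, c in enumerate(vocab)}
>>>     ids = [char2idx[c] for c in text]
>>>     x_ids, y_ids = ids[:-1], ids[1:]              # input a..e, label b..f
>>>     x = np.zeros((len(x_ids), len(vocab)), dtype=np.float32)
>>>     x[np.arange(len(x_ids)), x_ids] = 1           # one-hot vector per time step
>>>     y = np.array(y_ids, dtype=np.int32)           # label = next character's index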
>>> 
>>>  
>>> In my understanding you are taking a batch of 100 characters and the next character must be the label. So according to my understanding
>>> x.shape must be (32, 100)
>>> y.shape must be (32, 1)
>>> 
>>> I mean that you have a batch of 32 samples to train, and each sample is a series of 100 characters. For each sample, there must be a label, which says what character must follow this series. And there is only one such character.
>>> 
>>> Is there anything that I do not quite understand?
>>> 
>>> I would need this information in order to modify your sample program for activity recognition.
>>> So ultimately in my use case:
>>> x.shape probably is (32, 561)
>>> y.shape probably is (32, 1) 
>>> 
>>> 
>>> For your case, if you use 561 features, then what about the sequence length? Is 32 the batch size?
>>> The 561 values are floating-point features in [-1, 1].
>>> The label is a single value in {1,2,3,4,5,6}.
>>> 
>>> I would appreciate your help.
>>> Thanks.
>>> 
>>> On Sat, Sep 24, 2016 at 1:59 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> No. Don't average them.
>>> xij is a vector of 6 values. You can normalize them using standard normalization methods.
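>>> For instance, per-channel standardization would look roughly like this (a sketch with placeholder data):
>>> 
>>>     import numpy as np
>>> 
>>>     x = np.random.randn(1500, 128, 6).astype(np.float32)  # placeholder for the raw windows
>>>     mean = x.mean(axis=(0, 1), keepdims=True)              # per-channel mean
>>>     std = x.std(axis=(0, 1), keepdims=True) + 1e-8         # per-channel std
>>>     x_norm = (x - mean) / std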
>>> 
>>> On Sat, Sep 24, 2016 at 1:54 PM, Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Thanks for the analysis. I appreciate it.
>>> 
>>> There is only one thing:
>>> The activities do not seem to be continuous for a person. It is like people are told to walk for a fixed period and 128 samples in R^6 are collected. Then people are told to sit, etc.
>>> 
>>> So the focus is not the person but a single activity.
>>> 
>>> We are currently working on the first approach you proposed and will see the results.
>>> 
>>> Later, we would like to try the second approach. My only concern was that xi0, xi1, ... are in R^6 and you propose to concatenate them. Since they are floating-point values, I do not know how concatenation would work. Even if we average, we would lose a lot of information. We will think about it.
>>> 
>>> Thanks for your advice.
>>> 
>>> 
>>> On Sat, Sep 24, 2016 at 1:27 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> Let xij \in R^6 denote the j-th time point of the i-th activity of a person,
>>> and let yi \in R^561 denote the i-th activity of a person.
>>> 
>>> If the activities of a person are continuous, then you have two approaches:
>>> 1. Use y0, y1, y2, ... (all activities of a person) as input, and use the labels l0, l1, l2, ... as the corresponding output of the RNN. The RNN needs to output a label for each activity.
>>> 2. Use the raw data xi0, xi1, xi2, ... (all information from one activity) as the input, and use the label li as the output of the RNN. The RNN needs to output a single label for all time points of one activity.
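>>> As a shape illustration of the two options (numpy placeholders only):
>>> 
>>>     import numpy as np
>>> 
>>>     # approach 1: the sequence is the activities of one person, one label per activity
>>>     y_seq = np.zeros((200, 561), dtype=np.float32)   # ~200 activities, 561 features each
>>>     l_seq = np.zeros((200,), dtype=np.int32)         # one label per activity
>>> 
>>>     # approach 2: the sequence is the raw time points of one activity, a single label
>>>     x_seq = np.zeros((128, 6), dtype=np.float32)     # 128 time points, 6 channels
>>>     l_one = 3                                        # one label in {1,...,6} for the whole sequence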
>>> 
>>>  
>>> 
>>> On Sat, Sep 24, 2016 at 12:33 PM, Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Yes, in the raw data, for each labeled sample (activity) there are 128 time points, each with 6 channels of floating-point data (acc-x, acc-y, acc-z, gyro-x, gyro-y, gyro-z).
>>> 
>>> For each sample (activity) of 128 points of 6 channels, 561 features are generated.
>>> 
>>> Each person performs almost 200 activities.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Sat, Sep 24, 2016 at 12:20 PM, Wang Wei <wangwei@comp.nus.edu.sg> wrote:
>>> Do you mean that in the dataset, each sample (person) has 128 time points, each one with 6 channels?
>>> If so, I think you can concatenate all 6 channels into a single channel.
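>>> In numpy terms, one way to read that (placeholder data):
>>> 
>>>     import numpy as np
>>> 
>>>     x = np.random.randn(1500, 128, 6).astype(np.float32)   # windows x time points x channels
>>>     steps = [x[:, t, :] for t in range(128)]                # per time step: (1500, 6), the 6 channel values stacked into one input vector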
>>> 
>>> On Sat, Sep 24, 2016 at 12:03 PM, Arash Shafiei <arash.shafiei@gmail.com>
wrote:
>>> Hi Wang Wei,
>>> 
>>> We were wondering if the input of an RNN can have multiple channels.
>>> 
>>> In the example that you have for text prediction, the only channel is the characters
entering the network.
>>> 
>>> Now if there are multiple time series, then the network needs multiple channels.
>>> 
>>> For example, the raw data coming from accelerometers and gyroscopes compose 6 time series. It means that the data can have 6 dimensions, and therefore the input of the network can have 6 channels.
>>> 
>>> I verified the data set, and it turns out that the 561 features are generated from 128*6 raw values. So a sequence of samples has 128 values for each of acc-x, acc-y, acc-z, gyro-x, gyro-y, and gyro-z.
>>> 
>>> As a result the 561 features are not time series anymore. 
>>> 
>>> We are thinking of:
>>> 1) Using a decision tree on the 561 processed features.
>>> 2) Using an RNN on the raw data.
>>> 
>>> To use an RNN on the raw data, we would need multiple channels for the input. Would this be possible with SINGA?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
> 
