Thank you Matthias! Your answer works perfectly.
And I realized the issue with my code later.
Thanks again!
Arijit
________________________________
From: Matthias Boehm <mboehm7@googlemail.com>
Sent: Monday, May 1, 2017 1:45:51 AM
To: dev@systemml.incubator.apache.org
Subject: Re: Randomly Selecting rows from a dataframe
well, you can pull a sample with replacement by constructing the
permutation matrix slightly differently (optionally you could also sort the
sample if required):
P = table(seq(1,N), sample(nrow(X),N,TRUE), N, nrow(X));
Xsample = P %*% X;
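For readers who want to see what those two lines compute, here is a rough plain-Python sketch (not DML). The function name `sample_with_replacement` and the `seed` parameter are made up for illustration, and indices are 0-based here whereas DML is 1-based. It builds an N x nrow(X) selection matrix with exactly one 1 per row and multiplies it with X, so a source row drawn twice appears twice in the result:

```python
import random

def sample_with_replacement(X, n, seed=None):
    """Return (P, P @ X), where P is an n x len(X) selection matrix."""
    rng = random.Random(seed)
    m = len(X)
    # Draw n source-row indices; repeats are allowed (sampling with replacement).
    idx = [rng.randrange(m) for _ in range(n)]
    # P[i][j] = 1 iff output row i takes source row j,
    # which mirrors table(seq(1,N), idx, N, nrow(X)).
    P = [[1 if j == idx[i] else 0 for j in range(m)] for i in range(n)]
    # Plain matrix product, the analogue of P %*% X.
    cols = len(X[0])
    Xs = [[sum(P[i][k] * X[k][c] for k in range(m)) for c in range(cols)]
          for i in range(n)]
    return P, Xs
```

Calling it with a 10-row X and n = 20 necessarily yields repeated rows, which is the behavior asked for.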
Btw, your script didn't work because removeEmpty with a selection vector
expects a nonzero indicator by position, not by value (e.g., a nonzero in
the 7th cell indicates that you want to select the 7th row, regardless of
the actual value you feed in).
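A small plain-Python model of that behavior may help. It is only a sketch of the select semantics described above, not the SystemML implementation, and `remove_empty_rows` is a hypothetical helper name:

```python
def remove_empty_rows(X, select):
    # Keep row i iff select[i] is nonzero; the magnitude of select[i] is ignored.
    return [row for row, s in zip(X, select) if s != 0]

X = [[10], [20], [30], [40], [50]]
# Sampled positions 2, 2, 5 padded with zeros, as in the original attempt.
select = [2, 2, 5, 0, 0]
print(remove_empty_rows(X, select))  # [[10], [20], [30]]
```

The sampled values 2, 2, 5 are read only as "nonzero in cells 1 to 3", so the first three rows come back once each. A 0/1 mask by position can never return the same row twice.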
Regards,
Matthias
On Sun, Apr 30, 2017 at 1:47 AM, arijit chakraborty <akc14@hotmail.com>
wrote:
> Hi,
>
>
> The solution Matthias gave works perfectly when we are doing a random sample
> of the dataframe without replacement. But it's not working with
> replacement. E.g., if I have an original dataframe of the form
> matrix(seq(1,100), 100, 1) and want to randomly select 20 rows, then with
> Matthias' example we can sample it and the new matrix might look like this:
>
>
> matrix("1 2 3 21 29 36 37 40 45 53 55 56 71 72 79 82 90 96 97 99", 20,1).
>
>
> But if I want a matrix of this form, (which can be possible with random
> sampling with replacement)
>
>
> matrix("1 2 3 21 21 21 37 40 45 53 53 56 71 79 79 82 90 96 97 99", 20,1).
>
>
> I'm not getting it.
>
>
> I tried the following code:
>
>
> data_ind = matrix(seq(1,nrow(actual_data), 1), nrow(bdframe_bt_subset_1),
> 1)
>
> data_sample = sample(nrow(data_ind), 100, TRUE)
>
> data_sample_matrix= matrix(data_sample, 100, 1)
>
> a = matrix(0, (nrow(data_ind) - nrow(data_sample_matrix)), 1)
>
> data_sample1 = rbind(data_sample, a)
>
> b = removeEmpty(target=actual_data, margin="rows", select = data_sample1);
>
> But this is not giving me the repeated rows, even though I can see that
> "data_sample_matrix" contains repeated positions.
>
> I also tried following "sample.dlm" in the "utils" folder, but that is also
> not giving me the answer I'm looking for.
>
> We could use a for loop over the "data_sample_matrix" matrix in this case,
> but I want to avoid looping.
>
> Can anyone please help?
>
> Thank you!
> Arijit
>
>
>
>
> ________________________________
> From: arijit chakraborty <akc14@hotmail.com>
> Sent: Saturday, April 22, 2017 12:45 PM
> To: dev@systemml.incubator.apache.org
> Subject: Re: Randomly Selecting rows from a dataframe
>
> Thank you Matthias! You are most helpful!
>
>
> Thanks again!
>
> Arijit
>
> ________________________________
> From: Matthias Boehm <mboehm7@googlemail.com>
> Sent: Saturday, April 22, 2017 2:20:48 AM
> To: dev@systemml.incubator.apache.org
> Subject: Re: Randomly Selecting rows from a dataframe
>
> you can take for example a 1% sample of rows via a permutation matrix
> (specifically selection matrix) as follows
>
> I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
> P = removeEmpty(target=diag(I), margin="rows");
> Xsample = P %*% X;
>
> or via removeEmpty and selection vector
>
> I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
> Xsample = removeEmpty(target=X, margin="rows", select=I);
>
> Both should be compiled internally to very similar plans.
>
> Regards,
> Matthias
>
> On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty <akc14@hotmail.com>
> wrote:
>
> > Hi,
> >
> >
> > Suppose I have a dataframe with 10 variables (X1-X10) and 1000 rows. Now
> > I want to randomly select rows so that I have a subset of the dataset.
> >
> >
> > Can anyone please help me to solve this problem?
> >
> >
> > I tried the following code:
> >
> >
> > randSample = sample(nrow(dataframe), 200);
> >
> >
> > This gives me a column matrix with the positions of the randomly selected
> > rows. But I could not figure out how to use this matrix to subset the
> > original dataframe.
> >
> >
> > Thank you!
> >
> >
> > Arijit
> >
>
