Thank you Matthias! You are most helpful!
Thanks again!
Arijit
________________________________
From: Matthias Boehm <mboehm7@googlemail.com>
Sent: Saturday, April 22, 2017 2:20:48 AM
To: dev@systemml.incubator.apache.org
Subject: Re: Randomly Selecting rows from a dataframe
You can take, for example, an approximately 1% sample of rows via a permutation matrix
(specifically, a selection matrix) as follows:
I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
P = removeEmpty(target=diag(I), margin="rows");
Xsample = P %*% X;
Or via removeEmpty with a selection vector:
I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
Xsample = removeEmpty(target=X, margin="rows", select=I);
Both should be compiled internally to very similar plans.
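For the fixed-size case in the original question (exactly 200 rows chosen with sample()), one possible sketch is to turn the index vector into a selection matrix with table(). This assumes the data is available as a numeric matrix X; the rand() call below only creates stand-in data, and the names idx and P are made up for illustration.

```dml
# Stand-in data (assumption): 1000 rows, 10 columns.
X = rand(rows=1000, cols=10);

# 200 distinct row indices drawn from 1..nrow(X), as a 200 x 1 column vector.
idx = sample(nrow(X), 200);

# Selection matrix: row i has a single 1 in column idx[i], so P is 200 x 1000.
P = table(seq(1, nrow(idx)), idx, nrow(idx), nrow(X));

# Each row of Xsample is the row of X picked by the corresponding index.
Xsample = P %*% X;
print("sampled rows: " + nrow(Xsample));
```

Unlike the rand-based threshold, which selects each row independently and so returns a varying number of rows, this variant always returns exactly as many rows as were requested from sample().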
Regards,
Matthias
On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty <akc14@hotmail.com>
wrote:
> Hi,
>
>
> Suppose I have a dataframe with 10 variables (X1-X10) and 1000 rows. Now
> I want to randomly select rows so that I have a subset of the dataset.
>
>
> Can anyone please help me to solve this problem?
>
>
> I tried the following code:
>
>
> randSample = sample(nrow(dataframe), 200);
>
>
> This gives me a column matrix with the positions of the randomly selected rows.
> But I could not work out how to use this matrix to subset the
> original dataframe.
>
>
> Thank you!
>
>
> Arijit
>
