systemml-dev mailing list archives

From arijit chakraborty <ak...@hotmail.com>
Subject Re: Randomly Selecting rows from a dataframe
Date Sat, 22 Apr 2017 07:15:21 GMT
Thank you Matthias! You are most helpful!


Thanks again!

Arijit

________________________________
From: Matthias Boehm <mboehm7@googlemail.com>
Sent: Saturday, April 22, 2017 2:20:48 AM
To: dev@systemml.incubator.apache.org
Subject: Re: Randomly Selecting rows from a dataframe

You can take, for example, a 1% sample of rows via a permutation matrix
(specifically, a selection matrix) as follows:

I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
P = removeEmpty(target=diag(I), margin="rows");
Xsample = P %*% X;

or via removeEmpty with a selection vector:

I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
Xsample = removeEmpty(target=X, margin="rows", select=I);

Both should be compiled internally to very similar plans.
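
As an illustration outside of DML, the equivalence of the two approaches can be sketched in plain Python. This is only a sketch under stated assumptions: the helper names (bernoulli_mask, selection_matrix, matmul, filter_rows), the toy data, and the 25% rate are made up for the example and are not SystemML APIs. Note that a per-row random test like this yields a sample whose size varies around the target rate; it does not return an exact row count.

```python
import random

def bernoulli_mask(n_rows, rate, rng):
    # One 0/1 indicator per row, mirroring: rand(...) <= rate
    return [1 if rng.random() <= rate else 0 for _ in range(n_rows)]

def selection_matrix(mask):
    # Rows of diag(mask) that are non-empty, mirroring:
    # removeEmpty(target=diag(I), margin="rows")
    n = len(mask)
    rows = []
    for i, keep in enumerate(mask):
        if keep:
            row = [0] * n
            row[i] = 1
            rows.append(row)
    return rows

def matmul(A, B):
    # Naive dense matrix product, enough for a small demonstration
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def filter_rows(X, mask):
    # Direct row filtering, mirroring:
    # removeEmpty(target=X, margin="rows", select=I)
    return [list(r) for r, keep in zip(X, mask) if keep]

rng = random.Random(42)  # fixed seed (an assumption) for repeatability
X = [[float(10 * i + j) for j in range(3)] for i in range(20)]  # toy 20x3 data
mask = bernoulli_mask(len(X), 0.25, rng)

via_matrix = matmul(selection_matrix(mask), X)
via_filter = filter_rows(X, mask)
assert via_matrix == via_filter  # both routes pick the same rows, in order
```

Each row of the selection matrix has a single 1 in the column of a kept row, so multiplying it with X copies exactly those rows, which is why both routes return the same result.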

Regards,
Matthias

On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty <akc14@hotmail.com>
wrote:

> Hi,
>
>
> Suppose I have a dataframe with 10 variables (X1-X10) and 1000 rows. Now
> I want to randomly select rows so that I have a subset of the dataset.
>
>
> Can anyone please help me to solve this problem?
>
>
> I tried the following code:
>
>
> randSample = sample(nrow(dataframe), 200);
>
>
> This gives me a column matrix with the positions of the randomly selected
> rows. But I could not work out how to use this matrix to subset the
> original dataframe.
>
>
> Thank you!
>
>
> Arijit
>
