madlib-dev mailing list archives

From fmcquillan99 <...@git.apache.org>
Subject [GitHub] madlib issue #223: Balance datasets : re-sampling technique
Date Tue, 16 Jan 2018 19:22:29 GMT
Github user fmcquillan99 commented on the issue:

    https://github.com/apache/madlib/pull/223
  
    Regarding (2) and (3) above, it looks like it does not fail with `'red:7, blue:7'`, but the MADlib convention is `'red=7, blue=7'`, so this needs to change to use `=`.
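    For illustration, this is the same call used in (4) below, written in the `=` form. It is only a sketch that assumes `balance_sample` is changed to parse `'class=size'` pairs; the `:` form is what is currently being exercised.

    ```sql
    -- Sketch: assumes balance_sample() accepts MADlib's 'class=size' convention
    DROP TABLE IF EXISTS output_table;
    SELECT madlib.balance_sample(
                                  'flags',             -- Source table
                                  'output_table',      -- Output table
                                  'mainhue',           -- Class column
                                  'red=7, blue=7');    -- Want 7 reds and 7 blues
    ```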
    
    (4)
    It seems to honor only the first class size given in this query:
    ```
    DROP TABLE IF EXISTS output_table;
    SELECT madlib.balance_sample(
                                  'flags',             -- Source table
                                  'output_table',      -- Output table
                                  'mainhue',           -- Class column
                                  'red:7, blue:7');    -- Want 7 reds and 7 blues
    SELECT * FROM output_table ORDER BY mainhue, name;
    ```
    This produces 7 red rows but leaves only 5 blue rows (there should be 7):
    ```
      id |    name     | landmass | zone | area | population | language | colours | mainhue
    ----+-------------+----------+------+------+------------+----------+---------+---------
      1 | Argentina   |        2 |    3 | 2777 |         28 |        2 |       2 | blue
      2 | Australia   |        6 |    2 | 7690 |         15 |        1 |       3 | blue
      8 | Greece      |        3 |    1 |  132 |         10 |        6 |       2 | blue
      9 | Guatemala   |        1 |    4 |  109 |          8 |        2 |       2 | blue
     17 | Sweden      |        3 |    1 |  450 |          8 |        6 |       2 | blue
      4 | Brazil      |        2 |    3 | 8512 |        119 |        6 |       4 | green
     11 | Jamaica     |        1 |    4 |   11 |          2 |        1 |       3 | green
     13 | Mexico      |        1 |    4 | 1973 |         77 |        2 |       4 | green
      3 | Austria     |        3 |    1 |   84 |          8 |        4 |       2 | red
      5 | Canada      |        1 |    4 | 9976 |         24 |        1 |       2 | red
      7 | Denmark     |        3 |    1 |   43 |          5 |        6 |       2 | red
     12 | Luxembourg  |        3 |    1 |    3 |          0 |        4 |       3 | red
     15 | Portugal    |        3 |    4 |   92 |         10 |        6 |       5 | red
     18 | Switzerland |        3 |    1 |   41 |          6 |        4 |       2 | red
     19 | UK          |        3 |    4 |  245 |         56 |        1 |       3 | red
     10 | Ireland     |        3 |    4 |   70 |          3 |        1 |       3 | white
     20 | USA         |        1 |    4 | 9363 |        231 |        1 |       3 | white
    (17 rows)
    ```
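    A quicker way to check the per-class counts than reading the whole table is to group the output by the class column. This is plain SQL against the `output_table` created above; `num_rows` is just an illustrative alias.

    ```sql
    -- Count sampled rows per class; with 'red:7, blue:7' both red and blue
    -- are expected to come back as 7
    SELECT mainhue, count(*) AS num_rows
    FROM output_table
    GROUP BY mainhue
    ORDER BY mainhue;
    ```

    For the output listed above, this would report blue 5, green 3, red 7 and white 2, which makes the missing blue rows easy to spot.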

