mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Random Forest possible error
Date Sat, 14 Dec 2013 21:38:56 GMT
Can you file a JIRA at https://issues.apache.org/jira/browse/MAHOUT ?

It sounds like you have a test case in mind along with your fix.  If you
could package that work up as a patch file, then it would be much
appreciated.


On Sat, Dec 14, 2013 at 9:24 AM, sam wu <swu5530@gmail.com> wrote:

> Hi,
>
> I am using Mahout's random forest. It works well when the feature
> descriptor contains no ignored features (no I flag).
>
> With the Ignore flag, the returned feature value is -1
> (in the code, dataset.valueOf(aId, token) returns -1).
>
> I did some investigation and found some problems in
> DataConverter.java.
>
> source code
> ------
>
>     for (int attr = 0; attr < nball; attr++) {  // line 51
>       if (ArrayUtils.contains(dataset.getIgnored(), attr)) {
>         continue; // IGNORED
>       }
>
>       String token = tokens[attr].trim();
>
>       if ("?".equals(token)) {
>         // missing value
>         return null;
>       }
>
>       if (dataset.isNumerical(aId)) {  // line 63
>         vector.set(aId++, Double.parseDouble(token));
>       } else { // CATEGORICAL
>         vector.set(aId, dataset.valueOf(aId, token));  // line 66
>         aId++;
>       }
> -------
> Let the feature descriptor be 9 I N L (Breiman example):
> 11 features, where 1-9 are ignored, the 10th is numeric, and the 11th is
> the label. (Does the Breiman example really work when following the web
> instructions?)
>
> Line 51: attr is the raw feature index, 0-10.
> aId is the filtered feature index, 0-1 (two non-ignored features).
> The problem is in line 66. When attr=10 (the label feature):
>   aId=1
>   token=true
> dataset.valueOf(aId, token) returns -1. In the current code, valueOf() for
> CATEGORICAL features seems to mix up the aId and attr concepts.
>
> Just changing line 66 from
>   vector.set(aId, dataset.valueOf(aId, token));
> to
>   vector.set(aId, dataset.valueOf(attr, token));
> does not work, because some validation then fails (another attr/aId mix-up).
>
>
>
> I may have overlooked something; these are just some thoughts.
>
>
> Sam
>
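
The two index spaces described in the quoted message (attr, the raw column
index, and aId, the index after ignored columns are dropped) can be sketched
in isolation. This is only an illustration under stated assumptions, not
Mahout code: the class IgnoredIndexSketch and the methods expandDescriptor and
rawToFiltered are hypothetical names, and the descriptor syntax (a number
repeats the token that follows it) is taken from the "9 I N L" example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: shows how raw column indices (attr)
// relate to filtered indices (aId) once ignored columns are skipped.
public class IgnoredIndexSketch {

    /** Expands a descriptor such as "9 I N L" into one type code per raw column. */
    public static List<Character> expandDescriptor(String descriptor) {
        List<Character> types = new ArrayList<>();
        int repeat = 1;
        for (String tok : descriptor.trim().split("\\s+")) {
            if (Character.isDigit(tok.charAt(0))) {
                // Assumption: a number applies to the next type token only.
                repeat = Integer.parseInt(tok);
            } else {
                for (int i = 0; i < repeat; i++) {
                    types.add(tok.charAt(0));
                }
                repeat = 1;
            }
        }
        return types;
    }

    /** Maps each raw index (attr) to its filtered index (aId), or -1 if ignored. */
    public static int[] rawToFiltered(List<Character> types) {
        int[] map = new int[types.size()];
        int aId = 0;
        for (int attr = 0; attr < types.size(); attr++) {
            map[attr] = (types.get(attr) == 'I') ? -1 : aId++;
        }
        return map;
    }

    public static void main(String[] args) {
        List<Character> types = expandDescriptor("9 I N L");
        int[] map = rawToFiltered(types);
        System.out.println("columns = " + types.size());
        System.out.println("attr 9  -> aId " + map[9]);
        System.out.println("attr 10 -> aId " + map[10]);
    }
}
```

With this descriptor, the label column has attr=10 but aId=1, matching the
values in the report. Any lookup keyed by one index but called with the other
would address the wrong column, which is the kind of mix-up the report
suspects.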
