mahout-dev mailing list archives

From "Robin Anil" <robin.a...@gmail.com>
Subject Re: In case you haven't noticed
Date Mon, 20 Oct 2008 10:17:31 GMT
Hi, when I was working on the Bayes classifier, I did feel that float would
overflow or lose precision in some extreme cases. But the reason for using
float was a limitation of Hadoop: there was no DoubleWritable
(equivalent to FloatWritable) that could be used in M/R mappers and
reducers. I would prefer sed s/float/double/g .

Robin

On Mon, Oct 20, 2008 at 3:36 PM, Sean Owen <srowen@gmail.com> wrote:

> On Sun, Oct 19, 2008 at 11:57 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >> I see some more complex cases (particularly the Float fields in many
> >> classes) that probably could be improved too but am being
> >> conservative.
> >
> > Can you point to an example?  You have made me very curious.
>
> Try BayesThetaNormalizerMapper -- I suspect, though am not totally
> sure, that the Float fields can be primitives. I don't see the need
> for it to be an object.
>
> >> Incidentally why aren't we using doubles? In cases where storage isn't
> >> a concern.
> >
> >
> > I have no idea.  There may be some hold-over in traditions from Lucene, but
> > there are not many places any more where floats are truly better.  Most
> > importantly, there are many cases where the extremely limited precision of
> > floats causes complete loss of all data.
>
> I agree. The above is another example where a double would be more
> appropriate I think.
>
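
To illustrate the precision concern discussed in this thread, here is a minimal Java sketch. It is not taken from the Mahout code base: the class and method names (FloatVsDoubleSum, sumAsFloat, sumAsDouble) are hypothetical and exist only for this example. It accumulates a running count in a float and in a double, which is roughly what a reducer does when it sums many small contributions.

```java
// Hypothetical illustration only; not part of Mahout or Hadoop.
public class FloatVsDoubleSum {

    // Adds 1.0f to a float accumulator n times.
    // A float has a 24-bit significand, so once the total reaches
    // 2^24 = 16777216, adding 1.0f no longer changes it.
    static float sumAsFloat(int n) {
        float total = 0.0f;
        for (int i = 0; i < n; i++) {
            total += 1.0f;
        }
        return total;
    }

    // Same loop with a double accumulator (53-bit significand),
    // which represents every integer in this range exactly.
    static double sumAsDouble(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += 1.0;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 20000000;
        System.out.println("float sum:  " + sumAsFloat(n));
        System.out.println("double sum: " + sumAsDouble(n));
    }
}
```

With n = 20000000 the float accumulator stalls at 16777216 while the double reaches 20000000, so every increment past 2^24 is silently dropped. Whether a given Mahout job actually hits this depends on its data; the sketch only shows the mechanism.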
