From Bruce Barkstrom <brbarkst...@gmail.com>
Subject Re: My Hadoop Summit Talk: NASA+BigData
Date Wed, 20 Mar 2013 12:46:25 GMT
That may be a bit better.

However, it still isn't clear to me how the physics of the instruments
and of the data processing gets into what users understand they
can do with the data.

As I understand Big Data and analytics, it usually appears to involve
using a lot of statistics to find unexpected correlations in the data,
but the techniques aren't looking for causation.  If you're dealing with
scientific data, you're usually trying to get to physical causation.
That means, I think, that users need to understand how the
physics and math constrain what they can do.

Let me see if I can identify a more concrete example of a
concern.  Usually, when we want to deal with physically
connected phenomena, we want disparate data to be
observing the same chunk of space at the same time.
If the Big Data user picks up one piece of data from region
X_1 and t_1 and then develops a correlation with observations
from X_2 and t_2, where X_1 /= X_2 and t_1 /= t_2,
it isn't clear why that correlation has anything to do with
physical causation.  Or, to put it another way, Big Data
may just give more examples of the "cherry picking"
climate deniers do when they select data without
paying attention to the statistical and physical significance
of their "results".

So, even though the data rates are large by today's
standards, I'm not sure that, by itself, is impressive.
Maybe the relevant example would be all those statistics
on dams built or tons of steel produced by the Soviet
Union.  The hype would be more interesting if it could
talk about what new phenomena or understanding
these techniques will produce - not just the data rate
or the total amount of data being produced.

Maybe it's just a glorified popularity contest; if so,
it would seem to be at about the level of interest
of the new season of "Dancing with the Stars".
I suppose the hype is necessary to generate the
funding (which has its uses), but I'm not sure it
will do as much as a few million sent to appropriate
super PACs to move the politics of climate change.

Bruce B.

On Wed, Mar 20, 2013 at 1:16 AM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Bruce,
> Hah!
> Unfortunately, all you get through the website is the short summary,
> which does make it hard to judge scientifically. Then again, this isn't
> science; it's a glorified popularity contest.
> I have a little bit more detailed abstract that I wrote up,
> pasted below (of course the part that they don't use to solicit votes):
> ---longer abstract
> The NASA Jet Propulsion Laboratory, California Institute of Technology
> contributes to many Big Data projects for Earth science, such as the
> U.S. National Climate Assessment (NCA), and for astronomy, such as
> next-generation astronomical instruments like the Square Kilometre
> Array (SKA) that will generate unprecedented volumes of data
> (700TB/sec!).
> Through these projects, we are addressing four key challenges critical
> for the Hadoop community and broader open source Big Data community to
> consider: (1) unobtrusively integrating science algorithms into large
> scale processing systems; (2) selecting and deploying high powered data
> movement technologies for data staging and remote data acquisition,
> processing, and delivery to our customers and users; (3) better
> leveraging of cloud computing (storage and processing) technologies in
> NASA missions; and (4) technologies for automatically and rapidly
> extracting text and metadata from the file formats, by some estimates
> ranging from a few thousand to over fifty thousand in total.
> This talk will focus on those Big Data challenges, how NASA JPL is
> addressing them both technologically (Hadoop, OODT, Tika, Nutch, Solr)
> and from a community standpoint (Apache, interacting with open source,
> etc.).
> I'll also discuss the future of Big Data at JPL and NASA and how others
> can get involved.
> -----
> You can think of that as the longer version of what I submitted. *grin*
> Cheers,
> Chris
> On 3/19/13 7:20 PM, "Bruce Barkstrom" <brbarkstrom@gmail.com> wrote:
> >OK, so you've got a three-word summary of some
> >hyperbole with Dumbo, the Flying Elephant.
> >How are you going to deal with the real
> >scientific constraints on the physics of combining real
> >measurement technologies and "mashing stuff together"?
> >
> >You need to remember that imaging instruments integrate
> >radiances with spectral responses and Point Spread Function
> >weighted averages over the FOV of whatever the instrument
> >was looking at - and that's just the instantaneous (L1) measurement.
> >If you do orthorectification, you've got variations in the
> >uncertainties across the image: in the parts of the image where
> >you've increased the resolving power (by putting interpolated points
> >closer together), you have also increased the noise, because the
> >orthorectification process acts as a noise multiplier.
> >
> >Next, you've got stuff like cloud identification (and rejection or
> >acceptance) - which depends on spectral response, solar illumination
> >(during the day) and temperature and cloud property stuff during
> >the night - and finally, you've got temporal interpolation (not just
> >creating an average through emission driven by solar illumination
> >during the day and IR cooling at night).  Where (the hell) is
> >the physics that deals with this stuff?  If you do get some
> >statistical stuff, why should anyone believe it contributes to
> >our understanding of climate change?
> >
> >I won't vote, but you can think of this as my input to your
> >scientific conscience.
> >
> >Bruce B.
> >
> >On Tue, Mar 19, 2013 at 7:51 PM, Mattmann, Chris A (388J) <
> >chris.a.mattmann@jpl.nasa.gov> wrote:
> >
> >> Hey Guys,
> >>
> >> I proposed a talk for NASA and Big Data at the Hadoop Summit:
> >>
> >>
> >>
> >> http://hadoopsummit2013.uservoice.com/forums/196822-future-of-apache-hadoop/suggestions/3733470-nasa-science-and-technology-for-big-data-junkies-
> >>
> >>
> >> If you still have votes, and would like to support my talk, I'd
> >> certainly appreciate it!
> >>
> >> Thank you for considering.
> >>
> >> Cheers,
> >> Chris Mattmann
> >> Vote Herder
> >>
> >>

