oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: My Hadoop Summit Talk: NASA+BigData
Date Thu, 21 Mar 2013 02:09:09 GMT
Hey Bruce,

On 3/20/13 7:56 AM, "Bruce Barkstrom" <brbarkstrom@gmail.com> wrote:

>I'll subside after one minor note on the "sky is the archive."

Don't ever subside! I appreciate your feedback and commentary and
wholly look up to you for advice and help.

Your cynicism about the conference is totally understood given, as you
mention, your ability to download the conference (or something similar ^_^)
off of your Gmail web page :)

>
>I once had a course from W. W. Morgan, the U. Chicago prof who
>developed the atlas of stellar types (A, O, B, etc.).  He had
>the spectrum of a "standard type R".  As I recall, two weeks
>after he published his atlas with the spectra, the star defining
>the type became a variable.

Precisely.

>
>Also, I note that on this very Google Mail page, I can get
>a "Free Guide to Big Data", as well as the "IBM Big Data
>Free eBook".  I suppose I don't need to go to a conference
>to become informed.

Nah, but it would be less fun without you there! Who else will represent
the society of troublemakers, and scientific reality, that is,
the people actually doing the work?!!

Take care my friend.

Cheers,
Chris


>
>Bruce B.
>
>On Wed, Mar 20, 2013 at 10:21 AM, Mattmann, Chris A (388J) <
>chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Bruce,
>>
>> A couple points:
>>
>> On 3/20/13 5:46 AM, "Bruce Barkstrom" <brbarkstrom@gmail.com> wrote:
>>
>> >That may be a bit better.
>> >
>> >However, it still isn't clear to me how the physics of the instruments
>> >and of the data processing gets into what users understand they
>> >can do with the data.
>>
>> Yeah agreed. At the same time, this is kind of difficult to throw into
>> a 45 min with 15 mins "techie talk" that I haven't even prepared yet,
>> and even harder to throw in to a 100 word (what you see on the website)
>> and 200 word (longer, what I sent you) abstract that they requested.
>>
>> >
>> >As I understand Big Data and analytics, it usually appears to be using
>> >a lot of statistics to find unexpected correlations in the data, but
>> >the techniques aren't looking for causation.  If you're dealing with
>> >scientific data, you're usually trying to get to physical causation.
>> >That means, I think, that users need to understand how the
>> >physics and math constrain what they can do.
>>
>> ++50 agreed.
>>
>> >
>> >Let me see if I can identify a more concrete example of a
>> >concern.  Usually, when we want to deal with physically
>> >connected phenomena, we want disparate data to be
>> >observing the same chunk of space at the same time.
>> >If the Big Data user picks up one piece of data from region
>> >X_1 and t_1 and then develops a correlation with observations
>> >with data from X_2 and t_2, where X_1 /= X_2 and t_1 /= t_2,
>> >it isn't clear why that correlation has anything to do with
>> >physical causation.  Or, to put it another way, Big Data
>> >may just give more examples of the "cherry picking"
>> >climate deniers do when they select data without
>> >paying attention to the statistical and physical significance
>> >of their "results".
>>
>> Totally agree. A lot of the time, this is the big difference
>> between card-carrying statisticians and *computer science*
>> oriented *machine learning* people.
>>
>> >
>> >So, even though the data rates are large by today's
>> >standards, I'm not sure that, by itself, is impressive.
>>
>> Well I have to say it is impressive. Can you show me a disk
>> that can today write 700 TB of data per second? Or the filesystem
>> drivers and parallel I/O software necessary to support it? Imagine
>> astronomy, where they are moving into the time domain and away
>> from the "sky is the archive", "so just reobserve next time"
>> mentality. Triage, which is super important, isn't the main
>> driver there; archival is now becoming important, and necessary,
>> in these eventually 700TB/sec producing systems.
>>
>> There are all sorts of IO, hardware, computer science, and
>> other advances that we don't have that are needed, and that
>> these types of examples like the SKA will drive.
>>
>> OTOH, the sheer infrastructure, domestic and international policy,
>> investment, and excitement and sense of nationality that many of
>> these new Big Data systems (especially the SKA) are creating in
>> their respective countries (e.g., in South Africa), is enough
>> to at least suggest to my evidence based mind that there is
>> something impressive here.
>>
>> >Maybe the relevant example would be all those statistics
>> >on dams built or tons of steel produced by the Soviet
>> >Union.  The hype would be more interesting if it could
>> >talk about what new phenomena or understanding
>> >these techniques will produce - not just the data rate
>> >or the total amount of data being produced.
>>
>> Agreed, lots of data has been generated for a while. However,
>> the volume (total and discrete), velocity, and variety (in
>> data types, metadata, etc.) are certainly such that they are
>> worthy of current study, at least in the area of data management.
>>
>> >
>> >Maybe it's just a glorified popularity contest; if so,
>> >it would seem to be at about the level of interest
>> >of the new season of "Dancing with the Stars".
>>
>> Perhaps, but I know you guys are interested in that show :)
>> Who's not?
>>
>> >I suppose the hype is necessary to generate the
>> >funding (which has its uses), but I'm not sure it
>> >will do as much as a few million sent to appropriate
>> >super PACs to move the politics of climate change
>> >along.
>>
>> Think of this as an IT super PAC for next generation data management
>> techniques and systems to deal with data volumes and varieties that
>> we don't have hardware or CS tools to manage yet. I'm not talking
>> about writing to tape and letting it die in the morgue. I'm talking about
>> even simple things like making it available after you write it to
>> spinning disk.
>>
>> Cheers,
>> Chris
>>
>> >
>> >Bruce B.
>> >
>> >On Wed, Mar 20, 2013 at 1:16 AM, Mattmann, Chris A (388J) <
>> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> >> Hey Bruce,
>> >>
>> >> Hah!
>> >>
>> >> Unfortunately all you get is the short summary through
>> >> the website, which does make it scientifically hard to
>> >> judge. Then again, this isn't science; it's a
>> >> glorified popularity contest.
>> >>
>> >> I have a little bit more detailed abstract that I wrote up,
>> >> pasted below (of course the part that they don't use to solicit
>> >> votes):
>> >>
>> >> ---longer abstract
>> >> The NASA Jet Propulsion Laboratory, California Institute of
>> >> Technology contributes to many Big Data projects for Earth science,
>> >> such as the U.S. National Climate Assessment (NCA), and for astronomy,
>> >> such as next generation astronomical instruments like the Square
>> >> Kilometre Array (SKA) that will generate unprecedented volumes of
>> >> data (700TB/sec!).
>> >>
>> >> Through these projects, we are addressing four key challenges
>> >> critical for the Hadoop community and broader open source Big Data
>> >> community to consider: (1) unobtrusively integrating science
>> >> algorithms into large scale processing systems; (2) selecting and
>> >> deploying high powered data movement technologies for data staging
>> >> and remote data acquisition, processing, and delivery to our
>> >> customers and users; (3) better leveraging of cloud computing
>> >> (storage and processing) technologies in NASA missions; and
>> >> (4) technologies for automatically and rapidly extracting text and
>> >> metadata from the file formats, by some estimates ranging from a
>> >> few thousand to over fifty thousand in total.
>> >>
>> >> This talk will focus on those Big Data challenges, how NASA JPL is
>> >> addressing them both technologically (Hadoop, OODT, Tika, Nutch,
>> >> Solr) and from a community standpoint (Apache, interacting with open
>> >> source, etc.). I'll also discuss the future of Big Data at JPL and
>> >> NASA and how others can get involved.
>> >> -----
>> >>
>> >> You can think of that as the longer version of what I submitted.
>> >> *grin*
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >>
>> >>
>> >> On 3/19/13 7:20 PM, "Bruce Barkstrom" <brbarkstrom@gmail.com> wrote:
>> >>
>> >> >OK, so you've got a three-word summary of some
>> >> >hyperbole with Dumbo, the Flying Elephant.
>> >> >How are you going to deal with the real
>> >> >scientific constraints on the physics of combining real
>> >> >measurement technologies and "mashing stuff together"?
>> >> >
>> >> >You need to remember that imaging instruments integrate
>> >> >radiances with spectral responses and Point Spread Function
>> >> >weighted averages over the FOV of whatever the instrument
>> >> >was looking at - and that's just the instantaneous (L1 measurement).
>> >> >If you do orthorectification, you've got variations in the
>> >> >uncertainties across the image: in the parts of the image where
>> >> >you've increased the resolving power (by putting interpolated points
>> >> >closer together), you have also increased the noise, because the
>> >> >orthorectification process acts as a noise multiplier.
>> >> >
>> >> >Next, you've got stuff like cloud identification (and rejection or
>> >> >acceptance) - which depends on spectral response, solar illumination
>> >> >(during the day) and temperature and cloud property stuff during
>> >> >the night - and finally, you've got temporal interpolation (not just
>> >> >creating an average through emission driven by solar illumination
>> >> >during the day and IR cooling at night).  Where (the hel)l is
>> >> >the physics that deals with this stuff?  If you do get some
>> >> >statistical stuff, why should anyone believe it contributes to
>> >> >our understanding of climate change?
>> >> >
>> >> >I won't vote, but you can think of this as my input to your
>> >> >scientific conscience.
>> >> >
>> >> >Bruce B.
>> >> >
>> >> >On Tue, Mar 19, 2013 at 7:51 PM, Mattmann, Chris A (388J) <
>> >> >chris.a.mattmann@jpl.nasa.gov> wrote:
>> >> >
>> >> >> Hey Guys,
>> >> >>
>> >> >> I proposed a talk for NASA and Big Data at the Hadoop Summit:
>> >> >>
>> >> >> http://hadoopsummit2013.uservoice.com/forums/196822-future-of-apache-hadoop/suggestions/3733470-nasa-science-and-technology-for-big-data-junkies-
>> >> >>
>> >> >> If you still have votes, and would like to support my talk, I'd
>> >> >> certainly appreciate it!
>> >> >>
>> >> >> Thank you for considering.
>> >> >>
>> >> >> Cheers,
>> >> >> Chris Mattmann
>> >> >> Vote Herder
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
