oodt-dev mailing list archives

From Martin Desruisseaux <martin.desruisse...@geomatys.fr>
Subject Re: Research project on integrating geoservices with Apache Airavata
Date Fri, 19 Apr 2013 09:09:16 GMT
Hello Amila

On 18/04/13 20:47, AMILA RANATUNGA wrote:
> Thank you very much for the replies so far, here and on the SIS dev list
> as well. They were really helpful. We are currently writing a research
> paper on what kind of model should be used when building a geoscience
> gateway. We intend to discuss the issues that geoscientists face during
> their research and the features that such a gateway should include to
> overcome them. We can then consider them during our main project as
> well. We would appreciate it if you could point us to any research
> papers or resources that show this domain from a geoscientist's point
> of view. Even a good case study would be really helpful.

There is a chapter in my Ph.D. thesis, written 10 years ago [1], but it
is in French. However, here are some points:

Before I started writing open source software, I was trained on ERDAS
Imagine. At that time, raster data could be either measurements (e.g.
altitude in metres) or categories (land, forest, lake, etc.). However,
for my work in oceanography I needed a mix of both in the same raster:
Sea Surface Temperature (SST) measurements, together with some NaN
(Not-a-Number) values indicating that the pixel was a cloud, land, etc.
The software of 10 years ago did not allow that.
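
The message does not say how the categories were encoded. One possible
encoding, assumed here purely for illustration, stores a category code in
the payload bits of an IEEE 754 double NaN, so a single array of doubles
carries both the SST measurements and the reason a value is missing. The
codes CLOUD and LAND and the helper names are hypothetical.

```python
import math
import struct

# Hypothetical category codes (not taken from the thesis).
CLOUD = 1
LAND = 2

_QUIET_NAN_BITS = 0x7FF8000000000000   # exponent all ones + quiet bit
_PAYLOAD_MASK = (1 << 51) - 1          # low 51 mantissa bits are free


def category_nan(code: int) -> float:
    """Return a NaN whose payload bits carry a category code."""
    if not 0 < code <= _PAYLOAD_MASK:
        raise ValueError("code must fit in the 51-bit NaN payload")
    return struct.unpack("<d", struct.pack("<Q", _QUIET_NAN_BITS | code))[0]


def category_of(value: float):
    """Return the category code stored in a NaN, or None for a measurement."""
    if not math.isnan(value):
        return None
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & _PAYLOAD_MASK


# One raster row: SST in degrees Celsius mixed with categorised gaps.
row = [18.4, category_nan(CLOUD), 19.1, category_nan(LAND)]
for pixel in row:
    code = category_of(pixel)
    if code is None:
        print(f"SST = {pixel} deg C")
    else:
        print({CLOUD: "cloud", LAND: "land"}.get(code, "unknown"))
```

Any arithmetic on such a pixel still yields NaN, so categorised pixels
cannot silently contaminate statistics, while the category remains
recoverable by inspecting the bits.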

My study correlated data from remote sensing images with fisheries data.
From an OGC perspective, this is equivalent to getting WCS and WebSensor
to work together. Raster and sensor data are two very different kinds of
data, and we needed operations such as "I want all temperature data at
the location and time of each sensor record, and also all temperature
data 10 days before the time of each sensor record".
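
A query like the one quoted above could be sketched as follows, assuming
(hypothetically) that each raster product is a dictionary mapping a date to
a regular lon/lat grid, and that a nearest-pixel lookup on an exact date is
good enough. Real products have composite periods and need proper
interpolation; the grid constants and function names are illustrative only.

```python
import math
from datetime import date, timedelta

# Hypothetical regular grid: cell (row, col) is centred on
# lon = LON0 + col * STEP and lat = LAT0 - row * STEP.
LON0, LAT0, STEP = 50.0, -10.0, 1.0


def sample(series, lon, lat, when):
    """Nearest-pixel value from the raster dated `when`, or NaN if unavailable."""
    grid = series.get(when)
    if grid is None:
        return math.nan
    col = round((lon - LON0) / STEP)
    row = round((LAT0 - lat) / STEP)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return math.nan


def sample_with_lags(series, observations, lags_days=(0, 10)):
    """For each (lon, lat, date) sensor record, sample the series at each lag."""
    return [
        {lag: sample(series, lon, lat, when - timedelta(days=lag))
         for lag in lags_days}
        for lon, lat, when in observations
    ]


# Toy SST series and one fishery record (values are made up).
sst = {
    date(2003, 1, 1): [[20.0, 21.0], [22.0, 23.0]],
    date(2003, 1, 11): [[24.0, 25.0], [26.0, 27.0]],
}
catches = [(51.0, -11.0, date(2003, 1, 11))]
print(sample_with_lags(sst, catches))   # [{0: 27.0, 10: 23.0}]
```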

On the remote sensing side, my study used 4 different kinds of data: Sea
Surface Temperature (SST), chlorophyll-a concentration, Sea Level Anomaly
(SLA) and Ekman pumping. Each kind of data has very different
characteristics in terms of spatial and temporal coverage, resolution and
format. Handling such heterogeneous data sources was a challenge.
Indeed, in my review of previous work, I saw many studies correlating
fish populations with temperature, or fish populations with chlorophyll,
but I found no study correlating fish populations with many parameters
taken together (e.g. some temperature condition at the same time as some
chlorophyll-a concentration). Doing such a combined study was a big
development effort. However, that was 10 years ago; I'm sure the
situation is different now.

For each time and location of a sensor record, I needed to interpolate
the temperature, chlorophyll-a, etc. from the raster data at the sensor
time, 5 days before, 10 days before, etc. I also needed to compute
derived quantities on the fly, such as the temperature gradient (i.e.
applying the Sobel operator to SST rasters), again at 0, 5, 10, etc.
days before, and to handle missing (NaN) values (e.g. if bi-cubic
interpolation returned NaN, try again with bi-linear interpolation, since
it uses less data and thus reduces the risk of getting NaN). So having
software that did this work automatically was crucial.
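
The NaN fallback and the Sobel gradient described above can be sketched on
a plain list-of-lists raster indexed as grid[y][x]. This is a minimal
illustration, not the thesis code: the bi-cubic step assumes Keys cubic
convolution with a = -0.5 (one common choice), and the gradient is in
value units per pixel, so a real study would still divide by the pixel
size in km.

```python
import math


def _cubic_weights(t, a=-0.5):
    """Keys cubic convolution weights for samples at offsets -1, 0, 1, 2."""
    def k(d):
        d = abs(d)
        if d <= 1:
            return (a + 2) * d**3 - (a + 3) * d**2 + 1
        if d < 2:
            return a * d**3 - 5 * a * d**2 + 8 * a * d - 4 * a
        return 0.0
    return [k(t + 1), k(t), k(1 - t), k(2 - t)]


def bicubic(grid, x, y):
    """Bi-cubic value from the 4x4 neighbourhood; NaN if any pixel is NaN."""
    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = _cubic_weights(x - x0), _cubic_weights(y - y0)
    total = 0.0
    for j in range(4):
        row = y0 - 1 + j
        for i in range(4):
            col = x0 - 1 + i
            if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
                return math.nan  # neighbourhood leaves the raster
            total += wy[j] * wx[i] * grid[row][col]
    return total


def bilinear(grid, x, y):
    """Bi-linear value from the 2x2 neighbourhood only."""
    x0, y0 = math.floor(x), math.floor(y)
    if not (0 <= y0 < len(grid) - 1 and 0 <= x0 < len(grid[0]) - 1):
        return math.nan
    tx, ty = x - x0, y - y0
    top = (1 - tx) * grid[y0][x0] + tx * grid[y0][x0 + 1]
    bottom = (1 - tx) * grid[y0 + 1][x0] + tx * grid[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom


def interpolate(grid, x, y):
    """Try bi-cubic first; on NaN, retry with bi-linear (fewer pixels used)."""
    value = bicubic(grid, x, y)
    return value if not math.isnan(value) else bilinear(grid, x, y)


def sobel_magnitude(grid, x, y):
    """Sobel gradient magnitude at integer pixel (x, y), per pixel."""
    if not (1 <= y < len(grid) - 1 and 1 <= x < len(grid[0]) - 1):
        return math.nan  # the 3x3 window would leave the raster

    def g(dx, dy):
        return grid[y + dy][x + dx]

    # Dividing by 8 turns the raw Sobel response into a per-pixel derivative.
    gx = (g(1, -1) + 2 * g(1, 0) + g(1, 1) - g(-1, -1) - 2 * g(-1, 0) - g(-1, 1)) / 8
    gy = (g(-1, 1) + 2 * g(0, 1) + g(1, 1) - g(-1, -1) - 2 * g(0, -1) - g(1, -1)) / 8
    return math.hypot(gx, gy)


# Toy linear field f(x, y) = 2x + 3y with one cloud pixel.
field = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(5)]
holey = [row[:] for row in field]
holey[1][1] = math.nan
print(bicubic(holey, 2.5, 2.5))      # nan: the 4x4 window contains the hole
print(interpolate(holey, 2.5, 2.5))  # 12.5 from the bi-linear fallback
print(sobel_magnitude(field, 2, 2))  # sqrt(13), about 3.606
```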

Another way to explain the above paragraph would be to say that for each
sensor, we create many (potentially hundreds of) "virtual sensors"
derived from remote sensing data. For example, if you had a sensor
measuring the temperature inside your car, it is like attaching "virtual
sensors" to the real sensor, where the virtual sensors behave like any
real sensor but use data from remote sensing images. Of course, we have
to take into account that the car is moving, so the pixel requested in
the remote sensing images is always changing.
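
The "virtual sensor" idea could be modelled roughly as below. Everything
here is a hypothetical sketch: a sampler is assumed to be any function
(lon, lat, day) -> value, one virtual sensor is created per (product, lag)
pair, and each one is read at every position of the moving real sensor, so
the requested pixel follows the track.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical sampler signature: value of a product at (lon, lat, day).
Sampler = Callable[[float, float, date], float]


@dataclass(frozen=True)
class VirtualSensor:
    product: str      # e.g. "SST" or "CHL" (chlorophyll-a)
    lag_days: int     # how far before the real observation to look
    sampler: Sampler

    @property
    def name(self) -> str:
        return f"{self.product}@t-{self.lag_days}d"

    def read(self, lon: float, lat: float, day: date) -> float:
        return self.sampler(lon, lat, day - timedelta(days=self.lag_days))


def attach(products: Dict[str, Sampler], lags: List[int]) -> List[VirtualSensor]:
    """Create one virtual sensor per (product, lag) combination."""
    return [VirtualSensor(p, lag, s) for p, s in products.items() for lag in lags]


def follow_track(track: List[Tuple[date, float, float]],
                 sensors: List[VirtualSensor]) -> List[Dict[str, float]]:
    """Read every virtual sensor at each position of the moving real sensor."""
    return [{vs.name: vs.read(lon, lat, day) for vs in sensors}
            for day, lon, lat in track]


# Toy analytic fields standing in for real rasters (made-up formulas).
products = {
    "SST": lambda lon, lat, day: 25.0 - 0.5 * abs(lat),
    "CHL": lambda lon, lat, day: 0.1 * day.day,
}
sensors = attach(products, lags=[0, 5, 10])
track = [(date(2003, 1, 20), 55.0, -10.0), (date(2003, 1, 21), 55.5, -10.4)]
readings = follow_track(track, sensors)
print(len(sensors), readings[0]["CHL@t-10d"])
```

With a few products and a handful of lags and derived quantities (such as
gradients), the number of virtual sensors per real sensor grows
multiplicatively, which is how one reaches the "potentially hundreds"
mentioned above.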

The amount of data produced in the above step was huge. Some statistical
tools were needed to evaluate the correlation coefficient between the
above "virtual sensors" and the real ones, so that we could trim the
"virtual sensors" that did not seem relevant to our study. Again,
because of the amount of data, automation is key.
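
The trimming step might look like this, assuming Pearson's correlation
coefficient computed over the records where both values are finite (so
NaN pixels from clouds or land are skipped). The minimum of 3 pairs and
the |r| >= 0.3 threshold are arbitrary placeholders, not values from the
study.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson r over pairs where both values are finite; NaN if undefined."""
    pairs = [(x, y) for x, y in zip(xs, ys) if math.isfinite(x) and math.isfinite(y)]
    n = len(pairs)
    if n < 3:
        return math.nan
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def trim(virtual: Dict[str, List[float]], real: List[float],
         min_abs_r: float = 0.3) -> Dict[str, float]:
    """Keep virtual sensors whose |r| with the real sensor reaches the threshold."""
    kept = {}
    for name, values in virtual.items():
        r = pearson(values, real)
        if not math.isnan(r) and abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Made-up catch data and two virtual sensors.
catch = [1.0, 2.0, 3.0, 4.0, 5.0]
virtual = {
    "SST@t-0d": [10.0, 12.0, 14.0, 16.0, 18.0],   # perfectly correlated
    "CHL@t-5d": [0.5, 0.1, 0.9, math.nan, 0.5],   # weak, with a cloudy gap
}
print(trim(virtual, catch))   # {'SST@t-0d': 1.0}
```

With hundreds of virtual sensors, some will pass any fixed threshold by
chance, so a real analysis would also need to account for multiple
comparisons.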

Not sure if it is of any help...


