sdap-commits mailing list archives

From eamonf...@apache.org
Subject [incubator-sdap-ingester] 01/01: Update Collection Manager readme
Date Mon, 07 Dec 2020 21:59:16 GMT
This is an automated email from the ASF dual-hosted git repository.

eamonford pushed a commit to branch update-docs
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-ingester.git

commit ca3898e14aab479c5578b5f8e2a1a72183436c60
Author: Eamon Ford <eamonford@gmail.com>
AuthorDate: Mon Dec 7 13:58:59 2020 -0800

    Update Collection Manager readme
---
 collection_manager/README.md | 80 ++++++++++++++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 21 deletions(-)

diff --git a/collection_manager/README.md b/collection_manager/README.md
index 84df468..bc630cd 100644
--- a/collection_manager/README.md
+++ b/collection_manager/README.md
@@ -26,7 +26,7 @@ From `incubator-sdap-ingester`, run:
 
 A path to a collections configuration file must be passed in to the Collection Manager
 at startup via the `--collections-path` parameter. Below is an example of what the 
-collections configuration file should look like:
+collections configuration file could look like:
 
 ```yaml
 # collections.yaml
@@ -34,35 +34,73 @@ collections configuration file should look like:
 collections:
 
     # The identifier for the dataset as it will appear in NEXUS.
-  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_LAND 
+  - id: "CSR-RL06-Mascons_LAND"
 
-    # The local path to watch for NetCDF granule files to be associated with this dataset.
-    # Supports glob-style patterns.
-    path: /opt/data/grace/*land*.nc 
-
-    # The name of the NetCDF variable to read when ingesting granules into NEXUS for this dataset.
-    variable: lwe_thickness 
+    # The path to watch for NetCDF granule files to be associated with this dataset. 
+    # This can also be an S3 path prefix, for example "s3://my-bucket/path/to/granules/"
+    path: "/data/CSR-RL06-Mascons-land/" 
 
     # An integer priority level to use when publishing messages to RabbitMQ for historical data.
-    # Higher number = higher priority.
-    priority: 1 
+    # Higher number = higher priority. Scale is 1-10.
+    priority: 1
 
     # An integer priority level to use when publishing messages to RabbitMQ for forward-processing data.
-    # Higher number = higher priority.
+    # Higher number = higher priority. Scale is 1-10.
     forward-processing-priority: 5 
 
-  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_OCEAN
-    path: /opt/data/grace/*ocean*.nc
-    variable: lwe_thickness
-    priority: 2
-    forward-processing-priority: 6
+    # The type of projection to use when processing granules in this collection.
+    # Accepted values are Grid, ECCO, TimeSeries, or Swath.
+    projection: Grid
+
+    dimensionNames:
+      # The name of the primary variable
+      variable: lwe_thickness
+
+      # The name of the latitude variable
+      latitude: lat
+
+      # The name of the longitude variable
+      longitude: lon
+
+      # The name of the depth variable (only include if depth variable exists)
+      depth: Z 
+      
+      # The name of the time variable (only include if time variable exists)
+      time: Time
+
+    # This section maps each dimension on which the primary variable depends to its desired slice size.
+    slices:
+      Z: 1 
+      Time: 1
+      lat: 60
+      lon: 60
+
+  - id: ocean-bottom-pressure
+    path: /data/OBP/
+    priority: 6
+    forward-processing-priority: 7
+    projection: ECCO
+    dimensionNames:
+      latitude: YC
+      longitude: XC
+      time: time
+      # "tile" is required when using the ECCO projection. This refers to the name of the dimension containing the ECCO tile index.
+      tile: tile
+      variable: OBP
+    slices:
+      time: 1
+      tile: 1
+      i: 30
+      j: 30
+```
 
-  - id: AVHRR_OI-NCEI-L4-GLOB-v2.0
-    path: /opt/data/avhrr/*.nc
-    variable: analysed_sst
-    priority: 1
+Note that the dimensions listed under `slices` will not necessarily match those under `dimensionNames`. This is because sometimes
+the actual dimensions are referenced by index variables. 
+> **Tip:** An easy way to determine which variables go under `dimensionNames` and which go under `slices`: the variables
+> on which the primary variable depends go under `slices`, and the variables on which _those_ variables depend
+> (which could be themselves, as in the case of the first collection in the above example) go under `dimensionNames`. The exception
+> to this is that the primary variable is always listed under `dimensionNames.variable`.
-```
 ## Running the tests
 From `incubator-sdap-ingester/`, run:
 

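The updated README describes the rules for each collection entry only in YAML comments. The Python sketch below illustrates those rules by checking a single entry, represented as the dict a YAML loader would return. It is hypothetical and not part of the Collection Manager. The function name `validate_collection`, the set of keys it treats as required, and the error messages are all assumptions. Only the 1-10 priority scale, the four accepted projection values, the `dimensionNames.variable` entry, and the ECCO `tile` requirement come from the README text.

```python
# Hypothetical checker for one entry of a collections configuration file.
# Not part of the SDAP Collection Manager; REQUIRED_KEYS is an assumption.
from __future__ import annotations

from typing import Any

# Keys this sketch treats as mandatory (assumed, not confirmed by the README).
REQUIRED_KEYS = ("id", "path", "priority", "projection", "dimensionNames", "slices")
# Accepted projection values listed in the README.
ACCEPTED_PROJECTIONS = {"Grid", "ECCO", "TimeSeries", "Swath"}


def validate_collection(entry: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: list[str] = []

    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key '{key}'")

    # Both priorities use a 1-10 scale where a higher number means higher priority.
    for key in ("priority", "forward-processing-priority"):
        value = entry.get(key)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
            problems.append(f"'{key}' must be an integer from 1 to 10, got {value!r}")

    projection = entry.get("projection")
    if projection is not None and projection not in ACCEPTED_PROJECTIONS:
        problems.append(f"unknown projection {projection!r}")

    dims = entry.get("dimensionNames") or {}
    if "variable" not in dims:
        problems.append("dimensionNames.variable (the primary variable) is missing")
    if projection == "ECCO" and "tile" not in dims:
        problems.append("dimensionNames.tile is required for the ECCO projection")

    return problems


# The second collection from the README example, written as a Python dict.
ocean_bottom_pressure = {
    "id": "ocean-bottom-pressure",
    "path": "/data/OBP/",
    "priority": 6,
    "forward-processing-priority": 7,
    "projection": "ECCO",
    "dimensionNames": {"latitude": "YC", "longitude": "XC", "time": "time",
                       "tile": "tile", "variable": "OBP"},
    "slices": {"time": 1, "tile": 1, "i": 30, "j": 30},
}

print(validate_collection(ocean_bottom_pressure))  # → []
```

Returning a list of messages instead of raising on the first problem means a single run reports every issue in an entry.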

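The `slices` section maps each dimension to a slice size. The hypothetical Python sketch below shows one way to picture the result. It assumes each dimension is cut into consecutive, non-overlapping chunks of the configured size, with a shorter final chunk when the size does not divide the dimension evenly. The function names and the dimension sizes in the example are made up for illustration, and this is not the ingester's own tiling code.

```python
# Hypothetical illustration of how a `slices` map could split a variable into
# tiles. Assumes each dimension is cut into consecutive, non-overlapping chunks
# of the configured size; the last chunk is shorter if the size does not divide
# the dimension evenly. Not the SDAP ingester's actual implementation.
from __future__ import annotations

import itertools
import math
from typing import Iterator


def tile_count(dim_sizes: dict[str, int], slices: dict[str, int]) -> int:
    """Number of tiles produced by chunking every sliced dimension."""
    return math.prod(math.ceil(dim_sizes[dim] / size) for dim, size in slices.items())


def tile_slices(dim_sizes: dict[str, int],
                slices: dict[str, int]) -> Iterator[dict[str, tuple[int, int]]]:
    """Yield one {dimension: (start, stop)} mapping per tile."""
    per_dim = []
    for dim, size in slices.items():
        total = dim_sizes[dim]
        # Half-open [start, stop) index ranges covering the whole dimension.
        per_dim.append([(dim, start, min(start + size, total))
                        for start in range(0, total, size)])
    # The Cartesian product of the per-dimension chunks enumerates every tile.
    for combo in itertools.product(*per_dim):
        yield {dim: (start, stop) for dim, start, stop in combo}


# Slice sizes from the first collection in the README; the dimension sizes
# are made up for this example.
slices = {"Z": 1, "Time": 1, "lat": 60, "lon": 60}
dim_sizes = {"Z": 1, "Time": 2, "lat": 180, "lon": 360}

print(tile_count(dim_sizes, slices))  # → 36
print(next(tile_slices(dim_sizes, slices)))
# → {'Z': (0, 1), 'Time': (0, 1), 'lat': (0, 60), 'lon': (0, 60)}
```

With these made-up sizes, the tile count is 1 x 2 x 3 x 6 = 36. Smaller slice sizes produce more, smaller tiles per granule.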