GLIMS: Global Land Ice Measurements from Space

Monitoring the World's Changing Glaciers

A Method for Transferring GLIMS Analysis Products from Regional Centers to NSIDC

Specification Version 1.2

Bruce Raup and Siri Jodha Singh Khalsa
National Snow and Ice Data Center
Boulder, Colorado
Contact: braup@nsidc.org
$Revision: 1.3 $
$Date: 2008/07/22 20:32:07 $

Contents

  1. Introduction
  2. Data Organization
  3. Discussion of Each Shapefile
  4. Literature References
  5. Notes on the Ingest Process
  6. Putting It All Together
  7. Changes from previous specification
  8. Resources

  1. Introduction

    The following outlines a method for transferring the results of GLIMS analysis performed at the Regional Centers (RCs) to NSIDC (National Snow and Ice Data Center) for ingest into the GLIMS (Global Land Ice Measurements from Space) database. It specifies the formats and conventions for the files that convey these data.

    In a given submission to NSIDC, a Regional Center provides information about their institution, the glaciers (names, size, etc.), and polygonal outlines circumscribing the glaciers. These different levels of information are conveyed in a hierarchy of ESRI shapefiles. Bibliographic data can be conveyed in an Endnote-formatted ASCII text file.

    Design considerations for this specification include:

    1. Platform-independence. The file formats should be readable and writable on all major computer platforms. ESRI shapefiles are platform-independent, as are ASCII text files.
    2. Minimization of programming required for Regional Centers and NSIDC.
    3. Flexibility to accommodate possible future changes.

    The ESRI shapefile format was chosen because

    1. it is a published format with wide support in the GIS (Geographic Information System) world;
    2. there are freely available, open-source libraries and utilities for reading and writing shapefiles;
    3. the open-source (and free) GIS "GRASS" can input and output shapefiles;
    4. GLIMSView, a custom tool for GLIMS analysis, exports data directly into this format;
    5. it is anticipated that most Regional Centers will do their GLIMS glacier analysis using GIS packages or GLIMSView, and therefore shapefiles will be an easily used format.

    A shapefile actually consists of three files, all with the same basename, but ending in .shp, .dbf, and .shx. The .shp file contains the points, lines, or polygons; the .dbf file contains the attributes; and the .shx file is an index file.
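
    Because a shapefile is only usable when all three component files are present, a Regional Center may want to check for them before packaging. The following is a minimal sketch of such a check; the function name and the directory layout are illustrative assumptions, not part of this specification.

```python
from pathlib import Path

# The three component files that make up one shapefile, as described above.
SHAPEFILE_EXTENSIONS = (".shp", ".dbf", ".shx")


def missing_components(directory, basename):
    """Return the extensions of any component files that are absent."""
    folder = Path(directory)
    return [
        ext for ext in SHAPEFILE_EXTENSIONS
        if not (folder / f"{basename}{ext}").is_file()
    ]
```

    For example, missing_components("submission", "glaciers") returns an empty list when glaciers.shp, glaciers.dbf, and glaciers.shx all exist in the hypothetical "submission" directory.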

  2. Data Organization

    GLIMS glacier data and metadata fall into three categories; they can be:

    • per analysis session (each analysis session is done by one regional center, generally by one analyst, etc.)
    • per basin or glacier (each glacier has its own name, location, etc.)
    • per segment (each segment in a glacier outline can have its own attributes)

    This hierarchy is reflected in the choice of shapefiles and attributes below. There is a "session" shapefile to convey the highest level information (RC info), a "glaciers" shapefile to convey per-glacier attributes, and a "segments" shapefile to hold all the glacier outline segments and their attributes. In addition, there are other shapefiles to hold information about displacement vector sets, area-elevation histograms, images, point measurements, and ancillary data. In Table 1 below, "Geometry" refers to what kind of geometric objects are in the shapefiles.

    Every submission from an RC must contain four shapefiles: "session", "glaciers", "segments", and either "images" or "maps" (see Table 1). Not all attributes in each shapefile are mandatory, however.

    Table 1. List of shapefiles for GLIMS data transfer.
    Shapefile name  Mandatory?      Type*            Geometry
    session.shp     Y               1, 5, 11, or 15  outline of region, or point in middle of region, or point where RC is located
    glaciers.shp    Y               1 or 11          point location of glacier
    segments.shp    Y               3, 5, 13, or 15  line segments
    images.shp      Y if no maps    5                polygon of image footprint, or part of mosaic made up from this image
    maps.shp        Y if no images  5 or 1           polygon or point of map location
    vec_sets.shp    N               1, 5, 11, or 15  center of mass (point) of vector set, or convex hull around vectors
    vec_points.shp  N               3 or 13          two-point vector arcs
    histograms.shp  N               1 or 11          point at center of glacier
    ancillary.shp   N               1 or 5
    point_meas.shp  N               1 or 11          point measurements

    * Shapefile Types (see "ESRI Shapefile Technical Description: An ESRI White Paper" for details):

    1 Point
    3 PolyLine (arc)
    5 Polygon
    11 PointZ
    13 PolyLineZ (arcz)
    15 PolygonZ
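
    The type code of a shapefile can be read directly from the 100-byte header of the .shp file: per the ESRI white paper, the header begins with the big-endian file code 9994, and the shape type is a little-endian 32-bit integer at byte offset 32. The sketch below reads that code and compares it with the types allowed in Table 1. It is an illustration only, not the check performed at NSIDC, and the function names are assumptions.

```python
import struct

# Allowed type codes per shapefile, transcribed from Table 1.
ALLOWED_TYPES = {
    "session": {1, 5, 11, 15},
    "glaciers": {1, 11},
    "segments": {3, 5, 13, 15},
    "images": {5},
    "maps": {1, 5},
    "vec_sets": {1, 5, 11, 15},
    "vec_points": {3, 13},
    "histograms": {1, 11},
    "ancillary": {1, 5},
    "point_meas": {1, 11},
}


def read_shape_type(shp_path):
    """Read the shape type code from the main file header of a .shp file."""
    with open(shp_path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file too short to be a .shp file")
    (file_code,) = struct.unpack(">i", header[0:4])    # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile (bad file code)")
    (shape_type,) = struct.unpack("<i", header[32:36])  # little-endian
    return shape_type


def type_is_allowed(shapefile_name, shape_type):
    """True if Table 1 permits this type code for the named shapefile."""
    return shape_type in ALLOWED_TYPES.get(shapefile_name, set())
```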

    The following table lists the attributes contained in each shapefile. The Regional Center may include additional attributes for their own purposes, but non-standard attributes will be ignored during data ingest at NSIDC. Also, optional attributes from the list below may be omitted by a Regional Center if those data do not exist.

    Table 2. List of attributes (short names) in each shapefile. For the corresponding attributes in the GLIMS database, see db_xfer_mapping.txt. Many of the attributes are to be chosen from a set list of valid values (valids). For a description of the valids for such attributes, see valids.txt
    Shapefile   Attribute   Required?  Comment
    session                 Yes
                RC_ID       Yes
                analy_time  Yes        date/time analysis was done
                src_date    No         date or timestamp for the data acquisition
                data_src    Yes        description of data source
                proc_desc   Yes        description of processing
                anlst_surn  Yes        Surname of analyst
                anlst_givn  Yes        Given name of analyst
                3d_desc     Yes        Description of how 3-D information was derived
    glaciers                Yes
                ID          Yes
                name        No         Name of glacier, if one exists
                src_date    No         As-of time for this glacier. This overrides src_date in "session".
                prim_class  No         WGMS fields
                form        No
                front_char  No
                long_char   No
                mass_src    No
                tongue_act  No
                width_m     No         Representative width (meters)
                length_m    No         Representative length (meters)
                area_km2    No
                abarea_km2  No
                speed_myr   No
                snwln_elev  No
                wgms_id     No
                local_id    No
                parent_id   No         GLIMS ID of glacier that is parent of this glacier.
                image_id1   No         Supersedes value(s) in "images" for this glacier. These are to be the same kind of IDs as orig_id in the images file.
                image_id2   No         Ditto. A glacier may have bridged multiple images.
                image_id3   No         Ditto. A glacier may have bridged multiple images.
                image_id4   No         Ditto. A glacier may have bridged multiple images.
                image_id5   No         Ditto. A glacier may have bridged multiple images.
                map_id1     No         Supersedes value(s) in "maps" for this glacier. These are to be the same kind of IDs as usr_map_id in the maps file.
                map_id2     No         Ditto. A glacier may have bridged multiple maps.
                map_id3     No         Ditto. A glacier may have bridged multiple maps.
                map_id4     No         Ditto. A glacier may have bridged multiple maps.
                map_id5     No         Ditto. A glacier may have bridged multiple maps.
    segments                Yes
                category    Yes        Can be one of: glac_bound, centerline, snow_line, intrnl_rock, pro_lake, supra_lake, debris_cov
                ID          Yes        glacier ID (G...) or tiepoint region ID (T...)
                type        Yes        "m" (measured) or "a" (arbitrary)
                label       No         Is this segment on a terminus? cloud? edge of data?
                loc_unc_x   Yes        local (within-image) location uncertainty, in meters
                loc_unc_y   Yes        local (within-image) location uncertainty, in meters
                glob_unc_x  Yes        global (geographic) location uncertainty, in meters
                glob_unc_y  Yes        global (geographic) location uncertainty, in meters
                left_mat    No         material to the left of the segment
                right_mat   No         material to the right of the segment
                left_feat   No         feature (higher level of abstraction) on left
                right_feat  No         feature (higher level of abstraction) on right
    images                  Yes        (if analysis derives from imagery)
                image_id    No         ID of image within GLIMS (populated on ingest at NSIDC)
                inst_id     No         Instrument ID (valid IDs: ASTER=1, SPOT5=2, Landsat7=3, SPOT4=4, SPOT3=5, TM5=6, ERS-1 SAR=7, ERS-2 SAR=8, Ikonos1=9, Ikonos2=10, Quickbird=11)
                inst_name   Yes        Instrument name (e.g. ETM+, ASTER)
                orig_id     Yes        Original ID of image (e.g. EROS Data Center granule ID)
                imglocurl   No         Location (URL) of image
                acq_time    Yes        Time of image acquisition, in 'YYYY-MM-DD' or 'YYYY-MM-DD hh:mm:ss' format
                imgctrlon   Yes        Longitude of image center, in decimal degrees
                imgctrlat   Yes        Latitude of image center, in decimal degrees
                imglon_unc  No         Uncertainty of center long (meters)
                imglat_unc  No         Uncertainty of center lat (meters)
                image_azim  No         Image azimuth (deg east of north)
                cloud_pct   No         Percent of image obscured by clouds
                sun_azim    No         Solar azimuth (decimal degrees east of north)
                sun_elev    No         Solar elevation (decimal degrees)
                inst_zen    No         Instrument zenith (0 degrees = nadir)
                inst_azim   No         Instrument azimuth (deg east of north)
                projection  No         Name of projection
    maps                    Yes        (if analysis derives from maps)
                usr_map_id  Yes        Analyst-assigned ID for the map. Should start with "RCnn", where nn is the RC number, or some other similar RC-specific marker.
                regionname  Yes        Name of region covered by the map
                map_title   No         Title or name of the map
                auth_pub    No         Author or publisher of the map
                pub_date    No         Date the map was published
                asof_date   Yes        As-of date for the content of the map
                pub_loc     No         Location of publisher of the map
                scale       No         Scale of the map, expressed e.g. as 1:10000
                units       No         Units used in the map for contours, etc.
                proj        No         Projection of the map, including some parameters such as UTM zone
                series_ed   No         Series or edition of the map
                sheet       No         Sheet identifier for this map
                comment     No         Free text field for comments or additional information
    vec_sets                No
                vel_set_id  Yes        local ID, not stored in database
                anal_id1    Yes        Analysis ID for first time
                anal_id2    Yes        Analysis ID for second time
                num_vecs    No         Number of vectors in the vector set
                p1_loc_lon  Yes        first point, local uncertainty, longitude
                p1_loc_lat  Yes
                p1_glb_lon  Yes        first point, global uncertainty, longitude
                p1_glb_lat  Yes
                p2_loc_lon  Yes
                p2_loc_lat  Yes
                p2_glb_lon  Yes
                p2_glb_lat  Yes
    vec_points              No
                vel_set_id  Yes        local ID, not stored in database
                speed       No         speed, in m/yr
                azimuth     No         azimuth of vector (deg east of north)
    histograms              No
                ID          Yes        glacier ID (will be tied to analysis ID at ingest)
                elev_m      Yes
                area_km2    Yes
                binwidth_m  Yes
                regist      Yes        binwidth_m and regist are redundant (will be the same for all rows)
    ancillary               No
                ID          Yes        glacier ID
                ancdat_loc  Yes        Location of dataset (e.g. URL)
                ancdat_typ  Yes        Data type
                ancdat_sz   Yes        Size of dataset (bytes)
                comment     Yes        Text comment
    point_meas              No
                ID          Yes        glacier ID
                timestamp   Yes        Time of measurement
                longitude   Yes
                latitude    Yes
                elev_m      No
                label       Yes        Short description of what the measurement is
                value       Yes        Numeric value
                unit        Yes        SI unit
                lon_unc     No         in meters
                lat_unc     No         in meters
                elev_unc    No         in meters
                value_unc   Yes        in same units as "value"
                comment     No

    Names of the attribute fields in shapefiles are limited to 10 characters. Thus, please use the above names for all attributes in the .dbf files. Also, shapefile names must be in the 8.3 format (an ESRI restriction), and the three files (.shp, .shx, .dbf) should have the same basename prefix (the part of the filename left of the '.').
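
    The naming rule and the required attributes from Table 2 lend themselves to a simple pre-submission check. The sketch below covers only the three always-mandatory shapefiles and assumes that field names must match Table 2 exactly, including case; that matching rule and the function name are assumptions rather than documented ingest behavior.

```python
# Required attributes for the three always-mandatory shapefiles,
# transcribed from Table 2.
REQUIRED = {
    "session": ["RC_ID", "analy_time", "data_src", "proc_desc",
                "anlst_surn", "anlst_givn", "3d_desc"],
    "glaciers": ["ID"],
    "segments": ["category", "ID", "type", "loc_unc_x", "loc_unc_y",
                 "glob_unc_x", "glob_unc_y"],
}

MAX_FIELD_NAME_LENGTH = 10  # attribute-name limit stated above


def check_fields(shapefile_name, field_names):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in field_names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems.append(f"field name too long: {name}")
    for name in REQUIRED.get(shapefile_name, []):
        if name not in field_names:
            problems.append(f"missing required field: {name}")
    return problems
```

    The list of field names would come from whatever tool the Regional Center uses to read the .dbf file.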

  3. Discussion of Each Shapefile

    session (mandatory)
    The "session" shapefile holds information that pertains to the entire analysis session. The field "analy_time" is the time the analysis was completed; the expected precision is days, not hours or seconds. The field "src_date" is for the time stamp of the data themselves. In the normal GLIMS case, the base data will be satellite imagery, so the source time stamp will be in the "images" shapefile and database table, and will not need to be put in this src_date field. This field is primarily meant for datasets that are not derived from imagery.
    "proc_desc" is a large text field meant to contain a summary description of the processing that was done, as well as pointers to further information.
    glaciers (mandatory)
    The "glaciers" table contains information that pertains to individual glaciers. At ingest, some of this information will go into the Glacier_Static table, and some into the Glacier_Dynamic table. After the first year's analysis, entries into the Glacier_Dynamic table will, of course, be new records, as they represent a new snapshot in time. If fields that go into the Glacier_Static table have new information (values different from the ones already in the database), then the new values will replace the old ones. If the name of a glacier has changed, both old and new can be included in the "name" field, as in "Whillans Ice Stream (formerly Ice Stream B)".
    For a discussion of the "image_id?" fields, see the "images" section below.
    segments (mandatory)
    The "segments" shapefile holds the individual segments that make up polygonal glacier outlines, centerlines, snow lines, debris boundaries, basin boundaries, and internal rock and water boundaries ("internal" meaning internal to the glacier boundary, not englacial). The category (e.g. glac_bound, centerline, etc.) and ID (glacier or tiepoint region) fields are what ties all the segments into units: an outline for this glacier, a snowline for that glacier. These fields are thus mandatory and important. The positional uncertainty fields are also mandatory, since they directly relate to the credibility and integrity of the database.
    All glacier outlines, centerlines, and tiepoint regions for a given analysis session can be stored in one shapefile, as long as they are all in a "segment"-like shapefile.
    If putting all line types (glacier outlines, centerlines, ELA lines, etc.) into the same segments.shp shapefile is a burden to the RC, it is possible to submit multiple shapefiles of the same type (with different names, but with the same attributes), each containing a different line type. The attributes "category" and "ID" must be present in all these shapefiles.
    images (mandatory if analysis derives from imagery)
    The "images" shapefile contains the basic information about the image or images used in the analysis. For regions with small glaciers, we expect that analysts will generally analyze one image at a time, but we recognize that some Regional Centers will work from mosaicked imagery. There are three general relationships between images and glaciers:
    1. One image used for the analysis, so one image per glacier.
    2. Multiple images used for the analysis, but the information about which image(s) is/are used for which glacier is not tracked by the analyst.
    3. Multiple images are used for the analysis, and the information about which image(s) is/are used for which glacier is tracked.
    The first two cases are simple: put the information about the images in the "images" shapefile. For the last case, there are fields "image_id1" through "image_id5" in the "glaciers" shapefile, to identify which images were used to analyze which glaciers.
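
    The three cases can be resolved with one rule: per-glacier image IDs, when present, take precedence over the session-wide list. The following sketch illustrates that rule. It assumes each glacier record is available as a dictionary of attributes and that empty values mean "not set"; both are assumptions of this example.

```python
def images_for_glacier(glacier_record, session_image_ids):
    """Return the image IDs that apply to one glacier record.

    glacier_record: dict of attributes from the "glaciers" shapefile.
    session_image_ids: orig_id values from the "images" shapefile.
    """
    per_glacier = [
        glacier_record.get(f"image_id{n}") for n in range(1, 6)
    ]
    per_glacier = [value for value in per_glacier if value]
    # image_id1..image_id5 take precedence over the "images" values.
    return per_glacier if per_glacier else list(session_image_ids)
```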
    vec_sets
    The "vec_sets" shapefile holds information about an entire set of displacement vectors, which were derived from analysis of two sequential images. The images must be associated with two analyses, one of which can be the current one. Use the value "0" for the current analysis. The "vel_set_id" field is a simple integer identifier that the RC assigns, in order to keep different vector sets separate.
    vec_points
    This table holds the displacement vectors themselves. The attributes "speed" and "azimuth" are optional, since that information can be derived from the geometric data in the .shp part of the shapefile.
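
    As an illustration of how speed and azimuth follow from the geometry, the sketch below computes them for one two-point vector given in longitude/latitude. It uses a spherical-Earth approximation (haversine distance and initial bearing), which is a simplification chosen for this example and not necessarily what is done at ingest. The time between the two images, in years, is a parameter the caller must supply.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def displacement(lon1, lat1, lon2, lat2):
    """Return (distance_m, azimuth_deg) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    dphi = phi2 - phi1
    # Haversine great-circle distance.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing in degrees east of north, normalized to [0, 360).
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    return distance, azimuth


def speed_m_per_year(distance_m, years_between_images):
    """Convert a displacement into a speed in m/yr."""
    return distance_m / years_between_images
```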
    histograms
    In the GLIMS database, two tables are designed to hold area-elevation histograms: one to hold information about the entire dataset, and one to hold the data from each bin. Those two tables are combined into this one shapefile; thus, the "binwidth_m" and "regist" fields will be duplicated in each record for a particular glacier.
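
    The combined records can be separated again into per-glacier and per-bin parts. The sketch below shows one way to do so, using the Table 2 field names and treating each record as a dictionary; it is illustrative only and does not reproduce the actual database tables.

```python
def split_histogram_records(records):
    """Split combined rows into per-glacier metadata and per-bin data."""
    headers = {}
    bins = {}
    for row in records:
        glacier_id = row["ID"]
        meta = (row["binwidth_m"], row["regist"])
        # The duplicated fields should agree across all rows of one glacier.
        if headers.setdefault(glacier_id, meta) != meta:
            raise ValueError(f"inconsistent binwidth_m/regist for {glacier_id}")
        bins.setdefault(glacier_id, []).append((row["elev_m"], row["area_km2"]))
    return headers, bins
```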
    ancillary
    The "ancillary" shapefile holds basic metadata about additional (primarily raster) datasets, such as DEMs, "mugshot" images of the glacier, etc. These sorts of raster datasets should be included in the submission as separate GeoTIFF, JPEG, PNG, or GIF files (ending in .tif, .jpg, .png, or .gif).
    point_meas
    The "point_meas" shapefile holds point measurements done on a glacier, possibly in the field. The "timestamp" field is, obviously, the time the measurement was made. The "unit" field must contain the correct international symbol for the SI unit of the measurement. (See http://www.bipm.fr/enus/6_Publications/si/si-brochure.html for the definitive source of information on SI.)
  4. Literature References

    The "Reference_Document" table stores information about journal articles and other reference documents. A reference is conveyed in the "Endnote" export file format, described here. A Regional Center need only supply references with those tags, in plain ASCII text format. We define an additional "custom" tag (%4) to associate with each entry a list of glacier IDs.

    An example file containing two journal article records might look like this:

    %0 Journal Article
    %A Bishop, M.P.
    %A Shroder, J.F.
    %A Hickman, B.L.
    %A Copland, Luke
    %D 1997
    %T Scale-dependent analysis of satellite imagery for characterization of
       glacier surfaces in the Karakoram Himalaya
    %J Geomorphology
    %V 591
    %! Scale-dependent analysis of satellite imagery for characterization of
       glacier surfaces in the Karakoram Himalaya
    %F Y
    %K Batura
    %4 G090000E25000N
    %4 G090010E25010N
    %4 G090020E25020N
    %4 G090030E25030N
    
    %0 Journal Article
    %F raup:2000
    %A Raup, Bruce H.
    %A Kieffer, Hugh H.
    %A Hare, Trent M.
    %A Kargel, Jeffrey S.
    %T Generation of Data Acquisition Requests for the ASTER Satellite
       Instrument for Monitoring a Globally Distributed Target: Glaciers
    %J IEEE Transactions On Geoscience and Remote Sensing
    %V 38
    %N 2
    %P 1105--1112
    %D 2000
    %8 March
    

    Notes:

    1. A reference record can include any number of %4 tags to associate glacier IDs with the reference.
    2. This file format does not currently support comments.
    3. Blank lines separate records.
    4. There exist freely available tools for converting between various formats, including Endnote and TeX's format BibTeX.
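
    The record layout shown in the example is simple enough to read with a few lines of code. The following sketch parses such a file into one dictionary per record, keeping every value of repeated tags such as %A and %4 and joining untagged continuation lines onto the preceding value. It is an illustrative reader written for this document, not the parser used at NSIDC.

```python
def parse_endnote(text):
    """Parse Endnote-style tagged text into a list of records.

    Each record maps a tag such as "%A" to the list of its values,
    so repeated tags (%A, %4) keep every value.
    """
    records = []
    current = {}
    last_values = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_values = {}, None
            continue
        if line.startswith("%") and len(line) >= 2:
            tag, _, value = line.partition(" ")
            last_values = current.setdefault(tag, [])
            last_values.append(value.strip())
        elif last_values is not None:
            # A line without a tag continues the previous value.
            last_values[-1] += " " + line
    if current:
        records.append(current)
    return records
```

    The glacier IDs associated with a record are then available as record.get("%4", []).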
  5. Notes on the Ingest Process

    During ingest, a perl program parses the shapefiles and checks for some errors, then writes a file of SQL statements that are then used to insert the data into the relational database.

    Whereas the GLIMS database requires that glacier outlines be closed polygons, these polygons are typically sent to NSIDC as a set of segments, using the Shapefile entity type "PolyLineZ". This allows assignment of different attributes to different segments of the polygon. The segments making up a closed polygon are checked for closure at time of ingest into the database. Glacier outlines that do not have such attribute information (e.g. data digitized from historical maps) may be sent as complete polygons, using the Shapefile entity type "PolygonZ" or "PolyLineZ".

    The "segments" shapefile(s) should use feature type PolyLineZ (called ARCZ in shapelib) to store segments. X, Y, and optionally Z coordinates are defined for each vertex and are written to the shapefile. A fourth value, the M variable, can also be stored; it is reserved for future use.

    Glacier and tiepoint region outlines (as closed polygons) can be pieced together upon ingest at NSIDC by extracting all the segments, in order, that have the same category (e.g. 'glac_bound') and ID.
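
    The piecing-together step can be sketched as follows: group segments by category and ID, concatenate their vertices in file order, and test whether the resulting ring returns to its starting point. The tuple layout of the input and the closure tolerance are assumptions of this example, which is not the actual ingest code.

```python
from collections import defaultdict


def assemble_outlines(segments):
    """Group segments by (category, ID) and join them in file order.

    segments: iterable of (category, glacier_id, vertices) tuples, where
    vertices is a list of (x, y) coordinate pairs.
    """
    rings = defaultdict(list)
    for category, glacier_id, vertices in segments:
        ring = rings[(category, glacier_id)]
        # Skip a vertex that merely repeats the previous segment's end point.
        start = 1 if ring and vertices and ring[-1] == vertices[0] else 0
        ring.extend(vertices[start:])
    return dict(rings)


def is_closed(ring, tolerance=0.0):
    """True if the first and last vertices coincide within tolerance."""
    if len(ring) < 4:
        return False
    (x0, y0), (x1, y1) = ring[0], ring[-1]
    return abs(x0 - x1) <= tolerance and abs(y0 - y1) <= tolerance
```

    Closure is only meaningful for outline categories such as glac_bound; a centerline or snow line is expected to remain open.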

    A given segment may be in the shapefile more than once if it is shared (in the case, say, of an ice divide). In this case, the different instances will have different values for the (probably glacier) ID attribute.

  6. Putting It All Together

    An analysis session will generate a number of files. Each shapefile consists of three files: basename.shp (contains the geographic points), basename.dbf (contains the attributes), and basename.shx (an index file). Bibliographic files should end in '.en'.

    The simplest layout for the shapefiles is to have one of each, as outlined above. In this case, all the segments, from all glaciers and regardless of category, would be in the "segments" shapefile. This is what the GLIMSView software exports, and this is the simplest for NSIDC to handle. However, we recognize that some Regional Centers may want to put all glacier outlines in one shapefile, centerlines in another, snowlines in another, etc. If this is the case, please contact NSIDC.

    An example submission is available. The example dataset consists of a number of files, all packaged into a tar file. For a submission, a regional center could also use the zip utility. Submission packages should therefore end in .tar (simple tar file), .tar.gz (gzip compressed tar file), .tgz (same as .tar.gz), or .zip (zipped file).
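
    A submission package with one of the accepted endings can be built with standard tools. As a minimal sketch, the following uses Python's tarfile module to write a gzip-compressed tar file from a directory of submission files; the directory and package names are hypothetical, and only the .tar.gz and .tgz endings are handled here.

```python
import tarfile
from pathlib import Path

# Package endings accepted for a submission, as listed above.
ACCEPTED_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".zip")


def has_accepted_suffix(package_name):
    """True if the package name ends in one of the accepted endings."""
    return package_name.endswith(ACCEPTED_SUFFIXES)


def make_package(source_dir, package_path):
    """Write every file in source_dir into a gzip-compressed tar file."""
    if not str(package_path).endswith((".tar.gz", ".tgz")):
        raise ValueError("this sketch only writes .tar.gz or .tgz packages")
    with tarfile.open(str(package_path), "w:gz") as archive:
        for path in sorted(Path(source_dir).iterdir()):
            if path.is_file():
                archive.add(str(path), arcname=path.name)
    return package_path
```

    The package should be written outside the source directory so that it does not end up inside itself.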

    In addition to the above items, we ask that you also include in the submission package the image or images that were used in the analysis. The ideal format for us would be GeoTIFF, but we can accept almost any common format for georeferenced imagery. Full resolution imagery would be best; you may want to crop the imagery down to only the area containing glaciers. These images will be viewable in the GLIMS database web interface.

  7. Changes from previous specification

    • Added description of 'maps' shapefile, for conveying information about maps used in an analysis.
    • Added fields to the 'glaciers' shapefile for map IDs.

  8. Resources

    • The GLIMS Analysis Tutorial, found here.
    • Bibliography formats
    • "ESRI Shapefile Technical Description: An ESRI White Paper"
    • GLIMS web site
    • GRASS (free GIS package)
    • MapServer
    • OGR (related to GDAL, the Geographic Data Abstraction Library) an open-source C library for creating and manipulating shapefiles and many other vector data formats (replacement for the older Shapelib)
    • The SI Brochure

Questions/Comments:

  • Question: How are analysisIDs created? An analysisID is associated with a particular glacier and a particular analysis session done by an RC. An RC will submit information on many glaciers at one time.
    Answer: This is done at ingest time, simply assigning the next sequential number. RCs will provide the glacier ID and the time of analysis. The data type for the analysisID is capable of holding 4 billion (giga-) values.
  • Comment: All coordinates are stored in Lon/Lat on the WGS84 datum.
  • Question: How do we represent hierarchy or heritage (this glacier was once part of glacier X) in the transfer?
    Answer: In the Glacier_Static table, there is a field called "parent_icemass_id" to point to that glacier X (the "glaciers" shapefile carries a corresponding "parent_id" attribute). If a body of ice that was once connected to a larger body of ice is, for whatever reason, newly analyzed as its own glacier, then the ID from the older record of the parent ice mass should be in the field "parent_icemass_id" of the new record. The new treatment of the "child glacier" could be due to actual physical separation of the ice masses, or simply to increased detail of analysis in later years.

Selected Glossary

Analysis
one snapshot of one glacier
Analysis ID
ID for one analysis of one glacier
Analysis session
a set of analyses from one region and one time, the results of which are generally submitted as one unit
Glacier ID
ID for the glacier for all time, of the form GnnnnnnEmmmmm[N|S]
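
The ID form given above can be checked mechanically. The sketch below validates only the pattern (G, six digits, E, five digits, then N or S) and returns the digit groups without interpreting them; the function name is an assumption, and tiepoint region IDs (T...) are not handled because their full form is not given here.

```python
import re

# G, six digits, E, five digits, then N or S, per the glossary entry.
GLACIER_ID_PATTERN = re.compile(r"G(\d{6})E(\d{5})([NS])", re.ASCII)


def parse_glacier_id(value):
    """Return (six_digits, five_digits, hemisphere), or None if no match."""
    match = GLACIER_ID_PATTERN.fullmatch(value)
    return match.groups() if match else None
```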