CHAPTER 6 - The Use of Computer Graphics for Meteorological Data Representation
CHAPTER 7 - Data Bases in a Meteorological Environment
CHAPTER 8 - The Distributed Data Bases Concept
CHAPTER 9 - Data Monitoring
CHAPTER 10 - Computer Software Exchange
CHAPTER 11 - Endnote
LIST OF ACRONYMS


CHAPTER 6

THE USE OF COMPUTER GRAPHICS FOR METEOROLOGICAL DATA REPRESENTATION

6.1 Introduction

Computer graphics is used to present meteorological data in a visual form. Examples include: observations, time series and other graphs, contours, satellite and radar images, animations of contours and images, and 'three dimensional' pictures of storms and other output from computer models. Generally there are two objectives in displaying the development of meteorological phenomena in space and time: firstly, to explore and understand the data, usually using the traditional complex charts; and secondly, to present that understanding to other people, usually with simpler charts, such as a significant weather chart. In general, the former visualisation techniques aim at using the largest possible amount of information, within time and performance constraints and the limits of human capabilities.

Before the advent of computer graphics, meteorological graphics consisted of various hand drawn two dimensional representations: e.g. weather observations plotted on charts, analyses of various fields (e.g. pressure or wind speed and direction) or cross-sections. Computer graphics allows such charts to be produced automatically, and adds new forms of display such as interactive two and three dimensional views, animation, and the combination of vector graphics with satellite and radar imagery.

This chapter focuses on the issues raised by the use of screen based technologies for meteorological data visualisation and presentation. As discussed above, the current generation of graphical systems makes it possible to display meteorological data interactively in two and three dimensions. Future development in technology should make possible the widespread use of such systems in the 1990s. Numerical weather prediction models and remote sensing platforms are also evolving, generating an increasing volume of data that requires more and more powerful data exploration and presentation tools.

6.2 Current Graphics Systems and Techniques

Approaches to obtaining meteorological graphical systems range from purchasing a complete system, with maintenance, from one of a number of commercial vendors, to developing and maintaining all of the software for your own hardware (nobody builds their own hardware!). The latter approach has the advantage of being more flexible, but the cost of developing the system is not shared with others.

6.2.1 Graphics Hardware

Computer graphics systems can be classified into two categories:

  1. Vector-based systems, where the graphical picture is defined in terms of lines. Pen-plotters are the only common devices in this category. Their main characteristic is that pictures can be readily scaled in size without changing their appearance, and drawing time is proportional to the complexity of the picture;

  2. Raster-based systems, where the picture is composed of a regular array of dots (called pixels). Nearly all screen based devices are in this category, as well as hardcopy devices such as fax machines, electrostatic plotters and matrix printers. The main characteristic of these systems is that the time to display a picture is largely independent of the picture's complexity but the resolution is lower than that achievable in vector systems.

Most modern systems have components from both categories, and a vector picture can be readily converted to a raster picture, but not vice-versa.

Raster-based systems can be classified into those that only accept raster pictures, such as fax machines, and those that can accept a vector picture and can convert it for display, such as most workstations and personal computers. In these latter systems, the vector to raster conversion may be done by the general purpose computer hardware, as in personal computers, or done by dedicated, specialized graphical hardware, separate from the main processor, as in most workstations.

Nearly all graphical displays are two dimensional screens, but some of the specialized graphical hardware can handle data as if it were three dimensional and project it onto the two dimensional screen, giving a perspective picture. Such devices are slightly misleadingly called three dimensional.

Software reflects the distinction between vector and raster based systems, and older proprietary software may be of one or the other kind. Much, but not all, modern software is designed for both types of graphics.

At present, hardware is developing very rapidly, but software cannot be developed rapidly enough to keep pace. As in all computer systems, the cost of developing and maintaining software is becoming the most significant part of the cost of a computer graphics system over its lifetime. A successful approach to this problem is to ensure that any software developed can be moved to new hardware quickly and easily. A corollary of this is that the data must be readily portable to the new hardware also, and so must the skills of the people maintaining the system.

6.2.2 Graphical Software

A survey of the current technology for meteorological graphics leads to the following classifications:

6.2.2.1 Two-dimensional Graphical Libraries

These systems represent the current mainstream of operational meteorology, and usually consist of graphical subroutine libraries, typically collections of FORTRAN callable subroutines. Such systems often make use of the GKS and CGM graphical standards (see Section 6.4). MAGICS and NCAR Graphics are meteorologically orientated examples of such libraries.

The general graphical functions available perform actions such as contouring, shading, wind field plotting, map and text representation and cross-sections. Observation plotting, thermodynamic diagrams and meteograms are specific to meteorology.
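
As an illustration, the sketch below shows the kind of call sequence such a library presents to an application, using the standard GKS FORTRAN-77 binding (see Section 6.4.1). It is a minimal sketch only: the workstation type and connection identifier are implementation dependent placeholders, and the coordinates are invented.

C     Minimal GKS (FORTRAN-77 binding) sketch: open GKS, open and
C     activate one workstation, draw a single polyline (e.g. one
C     contour segment), then close everything down.  The workstation
C     type (WTYPE) and connection identifier (CONID) are
C     implementation dependent placeholders.
      PROGRAM GKSDEM
      INTEGER ERRFIL, WKID, CONID, WTYPE, N
      PARAMETER (ERRFIL = 6, WKID = 1, CONID = 1, WTYPE = 1, N = 4)
      REAL X(N), Y(N)
      DATA X / 0.1, 0.3, 0.6, 0.9 /
      DATA Y / 0.2, 0.5, 0.4, 0.8 /
C     Open GKS, directing error messages to unit ERRFIL
      CALL GOPKS(ERRFIL, 0)
C     Open and activate the workstation
      CALL GOPWK(WKID, CONID, WTYPE)
      CALL GACWK(WKID)
C     Draw the polyline (default normalisation transformation)
      CALL GPL(N, X, Y)
C     Deactivate and close in reverse order
      CALL GDAWK(WKID)
      CALL GCLWK(WKID)
      CALL GCLKS
      END

A meteorological library such as those mentioned above supplies the higher level functions (contouring, wind field plotting, map backgrounds, observation plotting) on top of primitives of this kind.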

Whilst two-dimensional graphics packages are now commonplace, there is still room for improvement, especially in the areas of contour labelling and overlaying, e.g. 1000-500 hPa thickness on a 1000 hPa chart. Also, current contouring algorithms and techniques can only deal with smooth fields and cannot handle abrupt changes of gradient such as occur at fronts.

Observation plotting in most existing systems is based on WMO recommended practices, which were devised for manual plotting but are also suitable for machine plotting. However, there are many situations where observations would normally be plotted too close together for readability and where special provision for plotting needs to be made.

6.2.2.2 Two-dimensional Interactive Systems

The evolution of computer technology is enabling the widespread use of interactive two-dimensional systems. One of the important features is the ability to create and/or display an animation sequence within an acceptable response time.

Some systems have evolved from satellite image processing, e.g. McIDAS from the University of Wisconsin, or from existing two-dimensional graphical libraries, e.g. MicroMAGICS and SIGMA/NCAR. Non-meteorological packages for the display of scientific data are also becoming more common and may be useful.

A general system should allow for the manipulation of fields, observations and images, both satellite and radar. The availability of interactive graphics and image editing functions is also essential, and all data should be geometrically transformed to a consistent geographical base.

6.2.2.3 Three-dimensional Static Systems

These systems are characterised by being able to produce three-dimensional pictures of phenomena to be used both as a diagnostic tool and as a means of portraying and communicating results. Their effectiveness for routine forecasting tasks has not yet been demonstrated, but they are undoubtedly suited to the portrayal of three dimensional flow. They are described as "static" because animation sequences are generated from pre-computed pictures rather than rendered interactively. Batch processing is often used in such systems.

Typical applications involve the use of 50 x 50 grid point data at 30 levels, showing 2 or 3 variables.

To improve performance, various distributed processing options exist, including:

Generation of pictures on a large mainframe or supercomputer with display on a mainframe graphics terminal or a workstation connected over an Ethernet type network;

Partial graphical functions on a supercomputer with additional computing performed on a workstation containing specialised graphical hardware processors;

Rendering pictures (a display process which makes an image look realistically three dimensional) on a workstation, subject to current restrictions in performance.

In the first two cases above, data compression techniques are used to achieve acceptable transmission speed over a network, especially in the case of Ethernet. Such methods are, in general, reversible, and techniques such as differencing schemes (delta modulation) combined with run-length encoding have proven to be quite effective. Benchmark tests have reported satisfactory results, achieving compression factors of up to 4:1.
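
As an illustration of the differencing plus run-length idea mentioned above, the sketch below encodes an integer field. It is a simplified sketch only: the routine name is invented, and no check is made that the output buffer is large enough.

C     Sketch of delta modulation combined with run-length encoding.
C     The field is reduced to first differences, which for smooth
C     meteorological fields contain long runs of identical (often
C     zero) values; each run is stored as a (count, value) pair.
C     The process is reversible: expand the runs, then accumulate
C     the differences.
      SUBROUTINE DRLENC(FIELD, N, CODE, MAXC, NC)
      INTEGER N, MAXC, NC
      INTEGER FIELD(N), CODE(MAXC)
      INTEGER I, DIFF, PREV, RUNVAL, RUNLEN
      PREV   = 0
      RUNVAL = 0
      RUNLEN = 0
      NC     = 0
      DO 10 I = 1, N
         DIFF = FIELD(I) - PREV
         PREV = FIELD(I)
         IF (DIFF .EQ. RUNVAL .AND. RUNLEN .GT. 0) THEN
            RUNLEN = RUNLEN + 1
         ELSE
C           Emit the completed run and start a new one
            IF (RUNLEN .GT. 0) THEN
               CODE(NC+1) = RUNLEN
               CODE(NC+2) = RUNVAL
               NC = NC + 2
            END IF
            RUNVAL = DIFF
            RUNLEN = 1
         END IF
   10 CONTINUE
C     Flush the final run
      CODE(NC+1) = RUNLEN
      CODE(NC+2) = RUNVAL
      NC = NC + 2
      RETURN
      END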

The functions needed for this class of system are general purpose graphical methods, ranging from simple surface rendering with hidden line and hidden surface removal to more elaborate methods such as transparency, volume rendering and texture mapping.

Techniques to produce pictures of near photographic quality such as ray tracing and radiosity are not widely used because of practical performance limitations.

6.2.2.4 Three-dimensional Interactive Systems

The interactive control and manipulation of large data sets in meteorology represents a promising approach to the visualisation of meteorological information. In such systems, the user controls the display by means of a graphical device, and the system is able to give prompt response to commands such as rotate, zoom and pan, i.e. the frame rate gives a perception of continuous motion.

It has been observed that static three-dimensional graphical images are sometimes visually ambiguous. Use of interactive rotation and animation is an effective means to resolve that ambiguity. To this end, an environment with real-time rendering capability is required. Such environments may be achieved by very fast graphics workstations or by a supercomputer with a fast link to a workstation or framebuffer.

Interaction also allows the user to control the information content of the animated display in order to rapidly search for relevant information in large data sets.

Three-dimensional interactive systems use the same kind of functions as three-dimensional static systems, subject to the restrictions of real-time picture display. One effective function of three-dimensional interactive systems is the depiction of trajectories.

Current technology is only able to deal with limited subregions of model output data sets, and therefore the issue of selection from a large database plays an important role.

Due to the rapid evolution of systems in this area no consensus has yet been reached as to what proposals seem the most promising or fruitful.

A desire was expressed for the more widespread availability of application level interfaces. Whilst a tool-kit like AVS is considered to have the correct balance between a high-level interface and functionality, its dependence on a particular vendor limits its more widespread use.

The current status of three-dimensional software environments does not yet provide sufficient evidence as to which graphics support packages should be used for visualisation software. General guidance to encapsulate the non-portable parts of the system in separate entities is the only option available to system designers at present.

As meteorological data sets are usually simple in structure compared to CAD/CAM structures, meteorological systems should be able to survive the current "shake-out" in graphical systems.

6.3 The User Interface

In interactive systems, the user interface is considered to be as important as the graphical functionality. Therefore, great care should be taken to design and implement effective interfaces as these are seen as a crucial factor affecting the usability of the system.

A general preference for "pull-down" menu systems has been expressed and "Macintosh-like" window/mouse systems are becoming commonplace. The use of multiple windows for displaying related types of data is seen as a useful adjunct to the simple overlay of graphical and image information. Other approaches such as the use of subordinate screens displaying menus at all times are also appropriate.

As meteorological graphics are generally complicated, their use in windowing systems is more effective on larger computer screens.

The choices should be "data-driven", where the actions the user can take are driven by the meteorological data to be manipulated. The alternative, a "function-driven" system, in which the choice of actions is made before the data are selected, is seen as inappropriate for operational environments.

The system should also have default parameters that vary according to the meteorological variables being portrayed. There should be both user definable and system defined default values to which the user may return if desired.

A combination of both ease-of-use and flexibility is needed. Systems developers should be aware that user interface design is still a new topic in computer science. Most experts recommend a prototyping phase before the final system design is completed. Users should be allowed to interact with a pre-release version of the system in order to provide feedback.

Due to the very specialised nature of some graphical techniques it is recommended that the user be presented with meteorologically relevant choices. This is seen as a crucial step towards making systems acceptable to the meteorological community.

Animation is considered an essential tool, with three performance thresholds:

Acceptable minimum: circa 1 frame per second;

Motion tracking: 2 to 4 frames per second and the user is able to track the motion of elements in the scene;

Smooth fusion: more than 8 frames per second; the images are no longer perceived individually but appear as continuous smooth motion.

The latter is at the limits of today's technology.

User interfaces should present meteorologically meaningful parameters that minimise complexity and they should be designed for portability over a wide variety of computing environments and for easy and flexible use by the meteorological community. To achieve this, it is preferable that the software should have a clear internal interface to separate the graphical display from meteorological data handling systems.

When developing user interfaces there is a need for high level tool-kits and the use of industry standards such as OPEN LOOK(UI) or Motif (OSF) is considered highly desirable.

6.4 Standards for Meteorological Graphics

Meteorological standards such as GRIB and BUFR represent the data and its geographical coordinates, whereas graphical standards are only concerned with the presentation of pictures on graphical devices, whether screens or hardcopy, and the device's coordinate system.

After ten years of effort by many people, computer graphics acquired its first international standard GKS in 1985. GKS is the main building block of a set of inter-related standards covering the whole area of graphics. GKS concentrates on standardising the interface between application software and a two dimensional graphics system, thus allowing portability of applications across different graphics devices and computers. This saves money in the longer term, as application software is usually in existence for longer than any hardware.

Most graphical standards are defined functionally, with a separate specification of how they should be realised in terms of a programming language interface to the application or a protocol between the functions and a device. How graphical functions are invoked is prescribed in a series of separate standards called language bindings. The 'binding' of functions into a language can only be specified in a standard way if the programming language itself is standardised. For example, GKS, at present, has agreed bindings to FORTRAN, Ada, Pascal and C. Protocols may have separate versions, according to need, for a given functionality. These are known as encodings.

Because standards are independent of computer vendors and are achieved by consensus, they may take years to be produced. Appendix A describes the various stages involved. These standards are known as 'de jure' standards, or international standards, and must be distinguished from an alternative meaning of the word 'standard' that is becoming commonplace.

'Standard' often means a 'de facto' standard - something implemented by one particular vendor that has become so prevalent that it has a virtual monopoly of the marketplace. These 'standards' are prone to alteration by the vendor to suit their purposes, and may not assist in the task of moving software to different hardware.

6.4.1 Graphical Kernel System (GKS)

The major features of GKS are:

  1. 2D graphics only;

  2. Supports both vector and raster devices, and multiple devices at once;

  3. User defined rectilinear coordinates;

  4. Comprehensive description of input;

  5. Grouping of parts of pictures into segments, but segments cannot be nested within one another;

  6. Different levels of support: with or without input, with or without segments;

  7. Supports both long term picture storage and audit trail as virtual devices;

  8. Supports both default 'bundled' or explicit setting of specifications of graphical elements such as colour, size and style;

  9. No current position.

Its advantages are:

  1. It is widely available;

  2. It has established a consistent comprehensive terminology for graphics;

  3. It can be implemented efficiently;

  4. It has established a consistent model of the graphics processes.

Its disadvantages are:

  1. The single level segment hierarchy is not sufficient for many applications;
  2. Initially there were too many implementation differences between different vendors.

GKS provides a standard for graphics in two dimensions (both input and output). A GKS implementation without input is called level 'a', with synchronous input is called level 'b' and with asynchronous input is called level 'c'. An implementation without segmentation is called level '0'. With segments on specific devices, it is level '1', and if the segments can be moved between devices (using a workstation independent segment store) it is level '2'. The most common implementations are levels 0a, 2b and 2c.

The 'bundle' concept allows applications to support very disparate devices efficiently, such as a monochrome pen plotter and a full colour workstation.

The 'current position' is the location of a conceptual pen that draws the picture. Such a concept imposes an unnecessary, and arbitrary, sequential order on the graphics, which may inhibit multiprocessing or parallelisation. This is why the concept was removed from GKS.

The philosophy for GKS is that the functions requested by the application are for almost immediate action. The segmentation facility provides an on-line method of storage of transient graphical information but is not designed for longer term storage between sessions. Once the workstation is closed, the segment store ceases to exist.

GKS recognised the need for the storage of graphical information between sessions and initially included within it a GKS Metafile facility which allowed an audit trail of GKS commands (used to create and manipulate pictures) to be stored and later retrieved and executed.

Metafiles are called such because they can contain metadata, information associated with a picture, but not actually necessary to construct the picture (e.g. a list of stations not plotted).

Annex E to the standard defines this GKS Metafile. The annex is not an integral part of the standard but, if present, will allow communications between GKS systems or long-term storage and auditing within a GKS system.

Once it became clear that there was likely to be more than one graphics standard at the functional level and all would have a need for long term storage and retrieval, it was decided to define the metafile function as a separate standard, the Computer Graphics Metafile (CGM). CGM is a facility for picture storage independent of, but still closely related to, the GKS standard.

The GKS standard contains a set of functions for reading and writing metafiles. The intention is that these functions could be used also to read and write CGM Metafiles.

6.4.2 Graphical Kernel System Metafile (GKSM)

From the programming point of view, the GKS Metafile looks very much like a workstation. Once the special workstation defined as a metafile is opened, any graphical commands obeyed are stored in the metafile. This continues until the metafile is closed. This similarity between a workstation and a metafile implies that there is a close relationship between the protocol used to define the metafile and that required to define the interface between GKS and the virtual device. (The standards activity to provide an interface to the graphical device is the Computer Graphics Interface, CGI.)
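
As an illustration of this point, the minimal sketch below records a picture in a metafile simply by treating the metafile as one more workstation. The workstation type for metafile output, and the use of a FORTRAN unit as the connection identifier, are implementation dependent assumptions.

C     Sketch: record a picture in a GKS Metafile (FORTRAN-77 binding).
C     WTMO, the workstation type for GKSM output, and the use of
C     unit 10 as the connection identifier are implementation
C     dependent placeholders.
      PROGRAM GKSMO
      INTEGER WKMF, WTMO
      PARAMETER (WKMF = 2, WTMO = 3)
      REAL X(3), Y(3)
      DATA X / 0.1, 0.5, 0.9 /, Y / 0.1, 0.8, 0.1 /
      CALL GOPKS(6, 0)
C     Connect the metafile, then open and activate it as a workstation
      OPEN (UNIT=10, FILE='PICTURE.GKSM', STATUS='UNKNOWN')
      CALL GOPWK(WKMF, 10, WTMO)
      CALL GACWK(WKMF)
C     Every subsequent output call is recorded in the metafile
      CALL GPL(3, X, Y)
C     Deactivating and closing the workstation completes the metafile
      CALL GDAWK(WKMF)
      CALL GCLWK(WKMF)
      CALL GCLKS
      END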

GKSM only defines a sequential file. Direct access facilities could be simulated with indexed sequential files in a reasonable manner, but the GKS standard only defines an interface for sequential access. Further, the interpretation of a GKSM metafile is dependent on the exact GKS implementation in use.

6.4.3 Computer Graphics Metafile (CGM)

CGM is the standard for 2D picture storage and transfer. CGM is defined as being compatible with GKS but allows a wider range of functionality so that it can be used for interchange of graphical information between other systems, not just GKS. A key element in the philosophy is that the process creating the information in the CGM can be separated in space and time from the process using it. Thus CGM could be used to generate a magnetic tape to be read at a remote installation many weeks later using a different type of graphics system from the one that generated it.

CGM is effectively transporting a virtual picture and, consequently, defines all picture elements in Virtual Device Coordinates which are closely linked to the Normalised Device Coordinates of GKS. CGM used to be known as Virtual Device Metafile (VDM).

The CGM description includes the coding of how the information is formatted. The CGM standard document now consists of four parts. The first contains the functional specification of any conforming metafile. The other three parts contain specifications of three methods of encoding, each with its own particular goal.

Character Encoding.

This is intended for use where it is important to minimize the size of the metafile; where necessary, this is regarded as more important than processing speed. The encoding also makes the metafile suitable for transmission through 'ASCII' networks.

Binary Encoding.

This aims to minimize the processor effort required to generate and/or interpret the metafile. It is therefore highly suitable for storage and retrieval of graphical data within a computer system.

Clear Text Encoding.

This encoding is aimed at the requirement of having a metafile that can be read and edited by people. It is also very safe to transport, even between systems with different character sets.

There are no facilities defined for direct access. A sequential format is obviously more straightforward and compact for both storage and transfer of data. Direct access could be defined, but the overheads of direct access may be unacceptable. However, most of the facilities of direct access could be supplied by indexed sequential formats, without contravening the standard, by using the facility to embed 'user data' within the graphics file.

Software to generate CGM files and to interpret them is widely available commercially, especially on PCs. CGMs are also recognised as a specific file type by the ISO standard for File Transfer, Access and Management (FTAM).

As the understanding of the graphics standards has improved over the last several years, extra features are now required to be transported or stored in CGMs, so a series of additions to the standard have been developed. These are called Addenda and there are three at present.

Addendum 1 supports segments, so that, for example, a map background need only be stored once at the beginning of a metafile, and is then invoked in each of the pictures within the metafile that require it. Segments were deliberately excluded from CGM to ensure compatibility with various levels of differing graphics standards. Attempting to superimpose segmentation structures onto a basic CGM file is a non-trivial exercise, requiring the recording of various segment transforms and attributes.

The Addendum 1 also supports Pixel Arrays. Standards such as GKS and PHIGS support Cell Arrays, which allow the display of fully transformable, device independent 'raster' images. As such, they are too expensive for most hardware and realistic applications. They have been shown to be useful for low resolution images, such as radar, and they do allow accurate overlays with vector graphics. The addition of Pixel Arrays allows higher resolution imagery to be displayed efficiently, albeit only in a device dependent way. For example, it would allow map coastlines, drawn as vectors, to be precisely overlaid on a satellite image, or a raster image from a scanner, PC painting program or PC screen capture program to be combined with vector graphics, though only correctly for one specific device at a time.

Addendum 1 is fully approved and published, and software is starting to be available.

Addendum 2 is a 3D extension and makes CGM able to support GKS-3D or PHIGS. The draft standard has been finalised and will be issued not as an addendum, but as parts 5 to 8 of the CGM standard and known as CGM-3D.

Addendum 3 will support extra features required by the more sophisticated hardware starting to become available, such as Non-Uniform Rational Beta Splines (NURBS) and sophisticated fonts for text. Externally defined libraries of symbols, such as the WMO weather symbols, will be usable. It is still a draft proposal.

ISO are proposing to update and revise the CGM standard, incorporating Addenda 1 and 3, and correcting a few errors and misprints. The revised standard will be completely compatible with the existing CGM standard (i.e. existing CGM metafiles will still be interpreted and displayed correctly).

6.4.4 Computer Graphics Interface (CGI)

The existence of the GKS standard, the interface between an application and the graphical system, implies an interface (or interfaces) between the graphical system and the hardware, at the 'bottom' of GKS (or any other multi-device standard). This used to be known as the Device Dependent Interface, or Virtual Device Interface (VDI). Because the range of hardware to be supported is vast, from pen-plotters to workstations, CGI is a complex standard, with many options (which probably accounts for its unfinished state after 4 years!).

CGI only supports one device at a time, and does not have comprehensive error checking specified. It is appropriate, for example, for systems designers building graphics cards for PCs.

CGI is quite closely related to the basic CGM standard, which implements a well defined subset of CGI.

6.4.5 Graphical Kernel System - Three Dimensional (GKS-3D)

GKS-3D is a standard for the generation of pictures in a 3D viewing space. It is completely compatible with GKS (2D). It also supports 3D display devices, should these ever become widespread. Its main features are, in addition to the GKS (2D) features; (a) hidden line and hidden surface removal; and (b) perspective and orthographic views.

Its main advantage, in addition to those of GKS (2D), is that it allows multiple views of the same conceptual picture (unlike GKS).

Its main disadvantages are the same as GKS (2D).

6.4.6 Programmer's Hierarchical Interactive Graphics System (PHIGS)

PHIGS is a standard for 3D graphical applications, but has a more sophisticated hierarchical segmentation system than GKS-3D. The main feature is that as well as performing the graphics, it models the solids being portrayed. They are constructed hierarchically in terms of, typically, polylines and polygons. This modelled structure can be dynamically altered, independently of the graphics display. The model structure can be stored for archive purposes, whereas CGM would be used for capturing and storing the resultant picture.

It is suited to rapid dynamic interaction with models/pictures, such as simulated robot arms, or flight simulation. The storage and manipulation of the model causes performance problems on anything other than a very powerful workstation. PHIGS also requires very skilled programmers. It is hoped that in the future, software in the form of higher level tools will make programming and debugging of PHIGS programs easier.

PHIGS has language bindings to FORTRAN, Pascal, Ada and C in progress.

PHIGS was designed to be as compatible as possible with GKS and GKS-3D.

6.4.7 PHIGS Plus (PHIGS+)

PHIGS+, pronounced PHIGS PLUS, is an extension to PHIGS which has now reached the Draft International Standard stage. The graphical primitives in PHIGS are identical to those in GKS, GKS-3D and CGM. However, some three dimensional graphics hardware can now support many more sophisticated entities, such as strips of triangles, lit and shaded polygons and splines. PHIGS+ will support these extra facilities, which would typically be used to construct pictures that approach photographs in their realism and quality ('photo-realism'). The PHIGS component of PHIGS+ is of course totally compatible with PHIGS, and has been shown to be relatively portable software. At present, however, the PLUS part, to do with lighting and shading, is not yet very portable.

6.4.8 PEX and X-Windows

X is becoming a widespread networking protocol amongst workstations connected by Ethernet. X recognises that most workstations support raster and vector graphics and windowing systems. PEX is a proposal to integrate PHIGS+ and the X protocol. It may be some time before PEX becomes an international standard, because of some fundamental technical issues involved in making X an international standard.

Programming X directly is extremely complicated, and is definitely not recommended. For graphical applications, it is far preferable to program using GKS or PHIGS implementations built on top of X. Such systems are commercially available now. The current release of X is X11.5, which includes a PHIGS implementation; X11.4 included a GKS implementation. For non-graphical applications, other software toolkits are available for use on top of X. Some of these can be used interactively to generate applications, or to configure sophisticated user interfaces, without conventional programming. Very many are now available, and no specific recommendations can be made.

6.4.9 GKS9X

Because all international standards are reviewed every 5 years, GKS was reviewed in 1990 and a decision was taken to update, rather than continue it unchanged. This was because experience, the evolution of the other standards, newer hardware technology, and a better theoretical understanding of graphical issues have identified certain shortcomings in GKS. The revision is still at a Draft Proposal stage, but will be compatible with the current GKS and will support some of the features introduced in CGM and PHIGS. It will also introduce some features more powerful than those in PHIGS, on both output and input.

6.4.10 PostScript

PostScript is a proprietary format owned by an American company, Adobe. Strictly speaking, it is a low level programming language, but is invariably generated from higher level graphics packages (numerous GKS implementations have PostScript drivers). Numerous graphics devices have specific hardware to interpret the format. It can produce very sophisticated effects, but is very much less compact than CGM for meteorological graphics. It is unsuitable for storage or transmission over medium or low speed lines.

6.4.11 CCITT Recommendation T.4

T4 is a CCITT standard for the transmission of Group 3 facsimile. Its advantages are that fax machines are now becoming very widespread and are a very convenient method of dissemination of graphical products. T4 has the disadvantage that it can only transmit binary images (i.e. black and white, without even greyscales), though CCITT are working on a future standard for the transmission of greyscale and coloured images. T4 images are also much bigger than CGMs, and therefore unsuitable for long term storage of graphical pictures or transmission where bandwidth is a problem. Also, because they are bit maps, all structure in the original picture has been lost and it cannot easily be reconstructed for further processing.

T6 is a similar standard for higher resolution images, with very similar characteristics.

Software to generate T4 or T6 formats from high level graphics packages is only just becoming available.

6.4.12 ISO 10646

As more countries start to use computers widely, there is a greater and greater need to represent all of the alphabets, syllabaries and ideographic character sets properly and effectively.

ISO started working on standard 10646 (the original 7-bit internationalized ASCII is IS-646) allowing up to 32 bits to represent each character, with options for switching into 24, 16, 8 and 7 bit modes. In response to this very complex proposal, a consortium of computer companies, originally mainly American, but now world wide, developed Unicode, a fixed width, 16 bit encoding of all the required characters. To ensure that all the world's characters fitted into just 64K entries, the duplicate entries between the Chinese, Japanese and Korean character sets were eliminated after these countries agreed.

In 1991, the appropriate ISO committee agreed that Unicode could be encompassed by the DP 10646 standard, with it forming one 16 bit plane of the full 32 bit character space.

6.5 Representation of Image Data

Ideally, there should be only one standard, catering for different image sources (satellite and radar) and for mapped or un-mapped imagery. In this respect, the proposal to extend the BUFR format to account for radar imagery is seen as conflicting with this basic concept. A possible solution would be to extend GRIB by including provisions for features such as run-length encoding, since there is no conceptual difference between image data representation and model output.

The International Standards Organisation (ISO) has defined a format for the storage and transmission of graphics and image files, called the Computer Graphics Metafile (CGM). CGM has the provision for including other data (metadata) in conjunction with the picture data, such as instrumental data or geographical coordinates. A standard format for embedding this meteorological data has not been defined.

An addition to the CGM standard has been defined to allow the efficient representation of imagery, and the efficient storage of repeated components (known as segments) of pictures, such as coastlines. This is known as Addendum 1 to CGM. CGM and Addendum 1 have been shown to be at least as compact as many proprietary file formats for imagery, contour and plotted charts.

Plotted charts are likely to become even more compactly stored when Addendum 3 is finalised in 1992. This will allow the storage of the meteorological weather symbols outside of the CGM in a library of symbols and text fonts.

6.6 Standard Environments

Two environments are emerging as the principal platforms for two-dimensional interactive systems: PC type systems, and UNIX workstations.

The typical PC system is based on the DOS operating system and the VGA/EGA display boards. It is possible to enhance PCs with higher resolution boards, but these must be seen as alternatives to, rather than replacements for, the standard displays. In the case of developing countries, this environment is seen as particularly suitable due to the increasing availability of various types of PC. The use of a standard such as GKS is seen as very useful, since it provides a common ground for developing a variety of device drivers.

UNIX workstations are seen as becoming more and more widespread as, in general, they have better performance and screen resolution than a PC. In this environment the X-Windows system is the most widespread tool for system development, combined with higher level user interface tool-kits such as OPEN LOOK or Motif. Another important feature of UNIX workstations is the availability of common file systems such as NFS and network protocols such as TCP/IP (in the future, to be complemented by OSI standards).

High performance three-dimensional interactive visualisation is only available from a limited number of vendors, all of whom offer UNIX-based workstations. There is a definite preference for UNIX workstations and systems for three-dimensional graphics environments, subject to the problems of standardising the development environment as discussed above.


APPENDIX A

A standards document goes through various stages within ISO before becoming an international standard. These are:

Work Item, recently renamed Working Draft or Committee Draft. An official project within ISO with agreed scope and goals, assigned to an appropriate committee;

Draft Proposal. The appropriate committee produces successive Working Drafts until one is considered mature enough to be registered as a draft proposal for a standard. DPs are circulated within ISO for technical review and ballot. If it passes the ballot it goes on to be a draft standard, otherwise it is revised in the light of the review and becomes a further draft proposal;

Draft International Standard. When sufficient agreement is reached on the DP document, it is registered as a DIS and made available publicly. This should indicate that technical agreement has been reached. The document is circulated within ISO for editorial review and a DIS ballot;

International Standard. The DIS, revised in the light of the DIS ballot becomes the Final Text. After another final ballot, the Final Text is published as an IS. The document will be reviewed after five years and will then be either endorsed, revised or abandoned.

In theory, all technical changes should be complete by the DIS stage and this is, therefore, a reasonable time to start using the standard. However the move from DIS to IS sometimes includes some minor technical revisions, if only to remove ambiguities. If a proposal has only reached the DP stage, it is quite likely that significant changes will appear before it reaches DIS status.

The complete process, from WI to full IS may take at least two or three years. If standards have been developed outside of ISO by another acceptable international body, such as CCITT or IEEE, they may follow a 'fast track' and be adopted directly at the DIS stage.


CHAPTER 7

DATABASES IN A METEOROLOGICAL ENVIRONMENT

7.1 Introduction

WWW DM is concerned not only with ways of representing data for transmission and display, but also with identifying efficient methods of storage and retrieval. Through the 1960s and 70s the data base requirements of meteorological services tended to be unique and, given the small size of the market, poorly served by the commercial sector. Over the past few years, however, commercial databases have grown in both size and stature to the point where they can now be considered for use in some meteorological environments. The factors which facilitate this are the relatively cheap availability of large scale computing environments and the adoption of, and adherence to, rigorous standards for both the database itself and the interface to the database through the Structured Query Language (SQL).

7.2 Database Technology

A database is generally accepted to be a set of data controlled by a Database Management System (DBMS), which supplies defined standard mechanisms for the storage and retrieval of the data. These mechanisms allow logical associations to be defined, associated data to be retrieved together, and the details of the physical structure (as opposed to the logical structure) of the database to be hidden from the users. The DBMS also has procedures, hidden from the user, for backing up, archiving and restoration in the event of failure. A single DBMS may control a number of apparently separate databases.

As these features are programmed into the DBMS once, the applications using the database can be made simpler, and hence easier to maintain. If the physical format of the data needs to be altered to improve efficiency, the applications need not be aware of this change. Similarly if new data types are added, the applications only need altering if they need the new data.

If application software makes use of explicit relationships between data, it can be much more efficient, in computer resources, than using a general database. However, as the range and complexity of data relationships increases, the increasing burden of application software maintenance favours the use of a database, especially as hardware continues to decrease in cost relative to software.

7.2.1 Indexed Sequential Data Bases

The simplest databases store data in records, and complete records have to be retrieved to inspect any data within the record. The records often have an Indexed Sequential structure. Many cheap personal computer databases are based on this technique. The WMO Volume A catalogue of stations has this kind of structure. Some items of information may be duplicated numerous times in different records. The physical structure of the data is not usually hidden from the applications retrieving the data.

7.2.2 Hierarchical Data Bases

Hierarchical databases define the data in terms of 'trees', so that duplication is minimised. Such databases allow efficient retrieval of data, providing it is required according to the pre-defined tree structure. These databases then evolved into network databases, where more flexibility in different ways of accessing the data was allowed. Such databases are called CODASYL after the committee that originally defined them. The networks have to be defined in advance when the database is constructed. Data of the same kind is often linked by pointers, and applications 'navigate' around the data base, retrieving linked data in successive retrievals.

7.2.3 Relational Data Bases

The relational data base is based on the mathematical theory of relations (set theory) and as such has a sound underlying basis. It was developed in the early 70s at IBM by Ted Codd and his associates. The basic data structure of the relational database is the table which is used to represent a relation. Each row of a table represents a different record and each column of a table represents a different field. The order of rows or columns is insignificant. One of the main tenets of relational database theory is that one fact is stored in one place in the database.

There is a separate theory called "normalisation" that is concerned with arranging the structure of the tables in a database (which columns appear in which tables) to ensure this. For each table, a group of columns is used to uniquely identify each row. That group of columns is known as the primary key. A foreign key is one or more columns in one table that serve as the primary key of another table. This allows joining together of information across tables. Computer science has developed, amongst other languages, the relational calculus for manipulating data in the various tables of a database. It is non-procedural in that it specifies what data to use rather than how to access that data. Relational calculus has the property of closure in that the result of applying a relational calculus expression to a table is another table. SQL is based on the relational calculus. It also includes a means of defining the tables, and creating and deleting them as well as formatting and security features. For a system to be relational, it must at least:

Represent all information as values in tables;

Have no user visible links between tables;

Support the relational calculus (or an equivalent).
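
As an illustration of these ideas, the sketch below defines two tables and joins them. It is a hedged example only: the table and column names are invented for this Guide rather than taken from any existing system.

  -- Sketch only: STATION and OBSERVATION are illustrative names.
  -- STATION_ID is the primary key of STATION; in OBSERVATION it is a
  -- foreign key and, together with OBS_TIME, forms that table's
  -- primary key.
  CREATE TABLE STATION (
      STATION_ID  CHAR(5)       NOT NULL PRIMARY KEY,
      NAME        CHAR(40),
      LATITUDE    DECIMAL(6,3),
      LONGITUDE   DECIMAL(7,3)
  );

  CREATE TABLE OBSERVATION (
      STATION_ID    CHAR(5)      NOT NULL REFERENCES STATION,
      OBS_TIME      CHAR(12)     NOT NULL,
      PRESSURE_HPA  DECIMAL(6,1),
      TEMP_C        DECIMAL(5,1),
      PRIMARY KEY (STATION_ID, OBS_TIME)
  );

  -- A non-procedural query: it states what is wanted, not how to
  -- find it, and its result is itself a table (the closure property).
  SELECT S.NAME, O.OBS_TIME, O.PRESSURE_HPA
  FROM   STATION S, OBSERVATION O
  WHERE  O.STATION_ID = S.STATION_ID
    AND  O.PRESSURE_HPA < 950.0;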

7.2.4 Semantic Data Bases

If the relationships between the data changes faster than the data, it is more appropriate to store the relationships (links) between data items explicitly, rather than as tables. Such a database is known as a semantic database. These are not widely available commercially, and consume even more computing power.

7.2.5 Object Oriented Data Bases

Object-oriented databases could be considered an extension of semantic databases, where items and their behaviour are stored. For example, a SYNOP message could be stored, and the database would link it to the software that knew how to plot a surface synoptic plot model, as opposed to a contouring package. A GRIB or GRID message would be linked automatically to the contouring package but not the plotting software. Object-oriented databases are not widely available and are not well understood, and will undoubtedly be expensive! They do not necessarily impose a well-defined structure on the data.

7.3 Query Languages

Most commercially available databases have a query language for interrogating their contents. Such languages may be interactive, or embedded in application programs. The latest languages are all non-procedural in that they specify logically what data is required, rather than 'how', i.e. the procedure that the database must follow to extract the data.

The Structured Query Language (SQL) is such a language that has been standardised internationally (ISO 9075:1987) and is supported by most commercially available databases. SQL is based on the relational data model, though it does not necessarily have to be used with a relational database.

7.4 Data Base Technologies Employed by Meteorological Services

Modern data base concepts and technologies are very attractive because they provide users of data, and of information derived from that data, with the tools to maximize the return from their investment in that data. Were meteorological services commencing operations at the present time, and given adequate resources, it would be relatively easy to adopt the new technologies. Most weather services, however, have been providing services for a considerable time and have a legacy of data accumulated under traditional data management practices.

Traditionally, applications built and used by the meteorological community were considered in relative isolation. Data that should have been together was not. The potential for flexible enquiries and reporting was thus limited. Additionally, the same data was held in several places in several different computer files. This again led to applications in relative isolation and to the fear of interfering with the data files of an existing application, as well as to update synchronisation problems if copies of the files were kept at several sites. From these experiences many lessons have been learnt concerning the construction of more flexible and efficient meteorological data bases.

In particular, the storage formats, and retrieval formats, should be as similar as possible for different databases. Then retrieval and storage programs could be reused and software simplified. Ideally, the retrieval formats should be portable, preferably across networks, to allow the distribution of applications where appropriate.

A logical retrieval interface for observational data (i.e. retrieval by element, time, location, level, etc) is applicable also to the retrieval of field data from databases. Because an entity in such a database is generally two or three orders of magnitude bigger in size than an observation, and two orders of magnitude fewer in number, it may not be appropriate to store these data in the same database as observations unless some efficiency is sacrificed. (An observation is typically 100 Bytes, but a radar image may be 64KB, and a model field is similar.) Physically separate databases could, and where possible should, be implemented with the same interface, as sketched below.
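
The sketch below illustrates what such a common logical interface might look like from the application's point of view. It is entirely hypothetical: the routine name and argument list are invented for this Guide and do not correspond to any existing package.

C     Hypothetical sketch of a common logical retrieval interface.
C     The caller states what is wanted (element, date, time, location,
C     level); whether the data come from the observation database or
C     from a physically separate field database is hidden behind the
C     routine.
C       ELEM              element name, e.g. 'TEMPERATURE'
C       CDATE, CTIME      date as YYYYMMDD, time as HHMM
C       RLAT, RLON, RLEV  location and level of interest
C       VALUES, MAXVAL    caller's buffer and its size
C       NVAL, IRET        number of values returned; return code,
C                         0 = success
      SUBROUTINE MRETRV(ELEM, CDATE, CTIME, RLAT, RLON, RLEV,
     +                  VALUES, MAXVAL, NVAL, IRET)
      CHARACTER*(*) ELEM
      CHARACTER*8   CDATE
      CHARACTER*4   CTIME
      REAL          RLAT, RLON, RLEV
      INTEGER       MAXVAL, NVAL, IRET
      REAL          VALUES(MAXVAL)
C     The implementation depends entirely on the underlying database;
C     only the interface is fixed.
      NVAL = 0
      IRET = 0
      RETURN
      END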

7.4.1 Relational Databases and Meteorology

As noted above, the three broad categories of meteorological data are observations, model output in the form of gridded fields, and satellite and radar data in the form of images. Unfortunately, two of these - grids and images - do not conveniently fall into any of the relational database field types, these basically being numbers and characters. However, with the fax and image revolution, many vendors now support another type of field in the form of variable length bit-strings called BLOBs (Binary Large OBjects). These can be used for the storage and retrieval of gridded fields and image data. However, the database knows nothing about these fields other than their length. The concept of differencing two BLOBs, for example (when it makes sense to do so), may not be supported by the database, whereas differencing two numeric fields obviously is. The database may support users writing code to perform operations like these - called BLOB filters - and store this code centrally in the database, but this facility is not universal. Observational data may be stored in the database, and the database interrogated and data retrieved, quite easily. The inhibiting factor here is volume.
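
As an illustration only (the table and column names are invented, and the large-object column type, written here simply as BLOB, differs between vendors), a table of model fields might be declared along the following lines. The database can select fields by their descriptive columns, but knows nothing about the contents of the BLOB itself.

  CREATE TABLE MODEL_FIELD (
      MODEL        CHAR(8)    NOT NULL,
      ELEMENT      CHAR(8)    NOT NULL,
      LEVEL_HPA    INTEGER    NOT NULL,
      BASE_TIME    CHAR(12)   NOT NULL,
      FORECAST_HR  INTEGER    NOT NULL,
      GRIB_DATA    BLOB,
      PRIMARY KEY (MODEL, ELEMENT, LEVEL_HPA, BASE_TIME, FORECAST_HR)
  );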

Some database designs have used the communications standards BUFR and GRIB for the storage of meteorological data. BUFR and GRIB cut down the size of data being stored and thus reduce access and/or storage time. This may be particularly relevant for NWP modellers. The packing or unpacking of data can be overlapped with the storing or fetching of the next record for efficient processing. This design works especially well with GRIB, where there is little variation in how the data is to be viewed (usually, gridded data is needed in its entirety for processing) but, again, translating the observations into BUFR code means that the database needs to be taught the structure of the resultant bit-stream, if that is possible, to enable queries to be posed.

7.4.2 Relational Database Products and Costs

There are many commercially available products with many different features; it is beyond the scope of this Guide to review this ever changing set of products.

The prices for commercial, relational databases differ from vendor to vendor and depend on the hardware platform on which the software is being run. A rule of thumb is that costs are split 50%-50% for software-hardware for systems totalling up to US $500,000 (1992 dollars) after which the hardware tends to dominate the costs. Of course, for the definitive costs, vendors are more than happy to provide price lists.

Advantages

A common interface to the data through SQL means that both users and programmers do not have to learn more than one access method no matter how many systems they work with.

It has been suggested that productivity using relational databases is improved 5-20 times over other approaches.

The other features that come with relational databases, including security and backup, mean that programmers can concentrate on building applications.

Sites have the ability to change vendors of their relational database software with minimal impact on their systems if they adhere to standards. Standards are now being set to allow multivendor database interoperability.

The establishment of better communications standards allows wide area networking of databases, as well as full binary data support rather than being restricted to character codes.

Disadvantages

Installation costs can be substantial and, as such, every last feature provided must be used to give as cost efficient an implementation as possible.

Planning and design work necessary for the introduction of a new database leads to significant new overheads. Furthermore existing applications must be converted to interact with the new data base and staff must be trained to use the new technology.

There are overheads in accessing data held in the new data base structures; generally these overheads take the form of additional computer hardware needed to host the operations. Additionally, support and maintenance becomes more complex with the more complex data base environment.

Users and management must understand the benefits of relational database technology. In particular, users may no longer feel that they individually own data they routinely use. It becomes a corporate entity in the new environment. Some users may resent this.

Meteorological applications tend to be resource hungry, making data access times in the new data base environments appear slower than for a traditional file structure built to service a single application. This aspect is accentuated if hardware support for the data base is marginal and data volumes are increased substantially; for example, by doubling the resolution of a numerical model which feeds the data base. Thus, meteorology will always tend to test the limits and robustness of both the relational model and, more importantly, vendors' software.

A quite sophisticated infrastructure of personnel needs to be established to support all of this new technology.

7.5 Storage of Quality Control Information in Meteorological Data Bases

In general it must be expected that meteorological data processing systems will generate Quality Control (Q/C) information, and the data base must therefore be designed so that this information can be stored in such a fashion that it is logically associated with the original data. The two major options for meeting this requirement are either to store the Q/C information in the same database, or to store it separately, but with keys to associate it with the original data.

There is a continuum of Q/C of observational data: internal consistency checking of reports, checks against climatological values, checks against forecast (background) fields, checks against surrounding observations, of the same or differing types. Checking may generate substituted values, or even completely new (bogus) observations.

If meteorological processing of the data is performed within a database, the processing software should be written in a standard portable language, such as FORTRAN 77. Alternatively, Q/C could be performed on observational data by external processes and then allow the Q/C information to be stored within the database.

The application of Q/C by the climatologists may extend over a relatively long period (some data arrives at least a month late), and therefore, some data could be entered into the database at low priority, such as at weekends or overnight. Adding this data should not seriously affect the rapid availability of data, without comprehensive Q/C information, for forecasting purposes.

The Q/C applied by NWP systems is much more time critical - the bulk of it must be done within the context of a forecast suite (i.e. within 12, 6 or even 3 hours, depending on the numerical model). The Q/C information is typically 3 to 4 times greater in volume than that produced for conventional data. The NWP suites also use satellite data, in greatly increasing quantities.

The amount of satellite data is expected to grow drastically as various new satellites transfer from research to operational status. It seems likely that volumes could easily reach 300MB per day by the mid 1990s, even when using efficient modern binary codes for exchange and storage.

It may not be appropriate to store Q/C information in a separate database as the logical links might be difficult. Applications would need to interface to two databases, increasing the cost of software and its maintenance. The approach of automatically updating a second database regularly from the first, and have the second database contain both the original data and the Q/C information, would be considerably easier and has, in effect, been followed in existing systems even though this duplicates the storage of data.

Adding Q/C information can be construed as a risk to the integrity of the GTS data. Therefore extra safeguards, such as passwords and physically separate storage for this information, should be considered. As the requirements for security and integrity features increase, the complexity of implementing them increasingly favours the use of a proprietary DBMS.

Facilities for the monitoring of the flow of the observational and associated data should be incorporated into the database. Such statistics would include percentage failure of the Q/C checks, timeliness of arrival in the database, as well as computing resources consumed. Such statistics are essential for real-time usage and are also useful for climatological data. Such meteorologically orientated statistics are unlikely to be available in commercial databases.


CHAPTER 8

THE DISTRIBUTED DATA BASES CONCEPT

8.1 Introduction

At CBS-IX the "Distributed Data Bases (DDBs) Concept" was endorsed and the Working Group on Data Management was tasked to develop it further and to suggest an implementation plan, in order that emerging data management problems (felt to be caused by a lack of integration between the GTS, GDPS and GOS, and by the pressure of newly emerging technologies and user groups) could be tackled and solved. This plan is to be brought before CBS for further consideration. This chapter describes the evolution of the DDBs concept to the present time.

8.2 The Relationship of the DDBs to the existing WWW infra-structure

At the present time it is believed that the efficient delivery of basic data (observations and products) in the WWW can best be supported by an improvement and enhancement of the data management functions within the GTS. These, together with the necessary upgrade of the transmission capacity of the GTS, will improve routine data distribution so as to meet the future basic WWW data exchange requirements.

Beyond the exchange of the basic data there is also a demand for the provision and exchange of new data to be facilitated by the WWW. On the one hand, it is expected that the WWW can play an important role in the provision of support services to other WMO Programmes (for example, GCOS and GOOS); on the other hand, a vast amount of auxiliary and reference data exists, from which specific data sets are needed at irregular intervals for the direct support of WWW operations. Both requirements suggest that it is timely to consider the development of a flexible, modern DDBs infra-structure. The DDBs concept should meet the requirements for a data handling system for data needed in the WWW system but not routinely exchanged on the GTS, and support the new and emerging needs for data, including those which originate outside of the WWW system.

One way of meeting these needs is to make information available concerning the nature and location of a variety of meteorological and environmental data sets, and to develop specific-purpose databases. Such databases would help centres to respond better to specific tasks within the WWW system and would also assist centres in tasks that are not necessarily central to the traditional WWW operations. The information should be made known to the meteorological community through the provision of the corresponding metadata (data about data), without necessarily making these data sets themselves available on the same network (e.g., the GTS) on which the metadata are obtainable. Depending on the nature of the data requirements, different means of communication and/or telecommunications are possible for sending the requests and replies.

Because the GTS is not well suited to the implementation of an any node-to-any node request/reply system, by virtue of its predominant role of meeting routine requirements and because of its prevailing architecture, it is recommended that non-routine access to the DDBs be primarily via public networks (e.g., datalinks over the telephone network, public packet switched data networks or ISDN) so that the performance of the GTS is not degraded. This introduces a "user pays" mechanism which will also serve to protect the DDBs request/reply system itself from being monopolised by extravagant users.

Should the frequency and importance of obtaining certain data reach a point where it is economically justified and internationally agreed to use the GTS, then the communications links can be upgraded accordingly in a planned manner and such data transfers moved to the GTS.

Use of an open systems architecture and of public data network facilities, whilst providing greater flexibility and a higher level of availability than fixed dedicated circuits of the present closed GTS system, will make the system more vulnerable to unauthorised access than has been the case to date. It will be necessary to mandate minimum security measures to be implemented by participating Members. In addition to these "standard" minimum measures, Members may feel the need to implement additional security features. The implementation of security measures will not be costless.

8.3 Implementation of the DDBs

Figure 8.1 provides a schematic view of the structure of the DDBs and their relationship to the GTS. This logical description has the following features:

An upper "DDB plane", in which all databases are logically inter-connected;

On the upper plane there is any node-to-any node connectivity, with bulk data transfer arrangements in place, but whose operation would not diminish the efficiency of the "basic" GTS;

The upper plane could possibly be a "managed network";

Standardisation of upper plane database structures (possibly using the relational model);

There are gateways between the DDBs and the RTHs and NMCs;

A lower layer which is the "basic" GTS;

The basic GTS continues to store and forward essential observational data and meteorological products between adjacent nodes, with improved communications, automation of routeing tables, streamlined headers, etc.

The need for a catalogue system in any new database is fundamental. Such a catalogue system will allow participating databases to "know" what data are held by other databases. A database catalogue contains metadata. For a catalogue to be useful there must be ways of "browsing" a database's catalogue from a remote system.

Modern relational databases are built from a series of tables, with an in-built means of defining the relationships between the tables. This table structure lends itself to building a catalogue, or table of metadata, to meet the DDB requirements.
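To make the notion of a browsable catalogue more concrete, the following FORTRAN 77 sketch holds a few hypothetical catalogue entries (data set name, parameter and holding centre) and lists those matching a requested parameter. In an operational DDB the catalogue would of course be held as relational tables and interrogated through SQL rather than being hard-coded in a program; the example merely illustrates the kind of query a remote user would make, and all entries and names are invented.

C     Illustrative browse of a DDB catalogue of metadata: list the
C     entries whose parameter matches the one requested.  The
C     entries and names are hypothetical.
      PROGRAM CATBRW
      INTEGER NENT, I
      PARAMETER (NENT = 3)
      CHARACTER*20 DSNAME(NENT), PARAM(NENT), CENTRE(NENT), WANTED
      DATA DSNAME /'UPPER-AIR ARCHIVE', 'SST ANALYSES',
     1             'STATION CATALOGUE'/
      DATA PARAM  /'TEMPERATURE', 'SEA SURFACE TEMP',
     1             'STATION METADATA'/
      DATA CENTRE /'RSMC A', 'WMC B', 'NMC C'/
      WANTED = 'TEMPERATURE'
      DO 10 I = 1, NENT
         IF (PARAM(I) .EQ. WANTED) WRITE (*, *) DSNAME(I), CENTRE(I)
   10 CONTINUE
      END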

8.4 A Possible Implementation Strategy

A strategy is emerging for achieving the implementation of the DDBs. In essence, the strategy is to identify those elements which must be present in the DDBs and to begin building them. It is currently anticipated that prototypes will be built first, followed by limited implementations among willing volunteers and finally, after consideration by CBS, by the adoption of the successful techniques and technologies as WMO standards. Application of such a strategy requires careful selection of the first demonstration implementations. Later, the operational implementation of the DDBs could follow one of two basic models:

(a) By further development of the database systems within the existing communications and computing infra-structure of Members, to agreed standards, to form a loosely coupled (all peer) set of WWW DDB;

(b) By implementing a separate database system utilising a commercial database management system, possibly running on a standardised hardware platform (e.g., a moderately sized UNIX machine) and/or applying Structured Query Language (SQL), as part of a managed network.

The second option could offer considerable savings in software development costs and may harmonise well with the plans of some centres for the upgrading or further development of their national systems. Furthermore, modern database systems are available that can handle large binary objects and are, therefore, potentially suitable for WMO DDB applications, particularly if in the longer term the DDB are to provide back-up facilities for major centres.

8.4.1 A Possible Prototype Element of the Set of DDBs

It is generally assumed that any specific DDB consists of data and/or metadata residing at a specific location responsible for this data and of the pertinent mechanisms for accessing and retrieving the data from outside of this specific DDB location. The first prototypes will ideally use similar database management systems and employ the same access methods via the same type of communication system. If possible, there will be some complementarity of functionality so that Members gain more from the coexistence of a number of databases than if each database were a separate, distinct entity.

Possible prototype DDBs can be grouped into the following three categories:

Category I Data and/or metadata germane to routine WWW operations; requests should be possible for the metadata, the data itself, or both and could be responded to via the GTS or by other communications or telecommunications means, as appropriate.

Category II Auxiliary data and related metadata needed for supporting specific tasks within the WWW system and beyond; this type of data, and the corresponding metadata, may reside in or outside of a WWW centre; requests should be possible for the metadata, the data itself, or both, and could be responded to via the GTS or by other means, as appropriate.

Category III Metadata (only) describing data that exist in WWW centres or in other centres which are of importance to the meteorological community in general; requests should be made mainly outside of the GTS (i.e., using non-GTS means); the exchange of the metadata should also not use the GTS, but other means including, for example, common mail services.

Table 8.1 contains examples of data sets and/or metadata which would typically fit into the DDB concept. Some of these sets are already in existence in a suitable form, others will have to be generated or adapted.

 

Table 8.1 - Examples of data sets to be considered for ad hoc exchange using DDBs (NOT AVAILABLE)

Category I   : WWW data and/or metadata
Category II  : Auxiliary data sets
Category III : Metadata

Any DDB constructed should have the potential to serve both WWW operations and users outside of the WWW system, in much the same way as the WWW system itself provides basic services to other WMO Programmes. Users in meteorological centres would benefit from gaining access to information of the types listed in Table 8.1, which could be provided in the DDBs through their centres' communications systems and which would not be conveniently available through other means. Centres participating in the DDB services would have to dedicate resources to the implementation of such services, e.g., the creation/adaptation of locally maintained data sets, access mechanisms to data sets residing outside, and equipment and services that enable local users to access the data sets. Before any resources can be committed, it is important to establish a full understanding of the ultimate requirements of the users and the benefits to be gained. The Working Group on Data Management is working to identify the requirements further and to find volunteers to establish specified data bases.

8.5 The Role of the DDBs in Data Collection

The DDBs concept has been developed to provide improved data access to users who have the following characteristics:

They do not necessarily need the data in real-time;

They have access to the modern communications and computer systems necessary to handle large volumes of data quickly;

They do not necessarily need the data (or metadata) routinely, but possibly on an ad-hoc basis.

It is believed that the concept also provides potential advantages to those undertaking data collection associated with special environmental monitoring projects which are at least regional and possibly global in scale.

Currently, data can be ingested into the WWW real-time system at any site where there is a telex or a computer system able to communicate with a Regional Telecommunication Hub (RTH). This data insertion can be done through slow-speed lines (e.g. 50 baud) or by high-speed data links. Traditionally, observations were transmitted to RTHs at low speed (via telex) and collected into bulletins for exchange on the GTS.

More recently, satellite collection of data from data collection platforms (DCPs) has led to the substantial accumulation of bulletins of data at sites with high-powered computer systems. Clearly it would be possible to establish elements of the DDBs at these sites, as well as to undertake the more traditional insertion of these data into the GTS as collectives.

 

Figure 8.1 - A Schematic View of the Structure of the DDBs and their Relationship to the Existing GTS Infra-structure (Not Available)


CHAPTER 9

DATA MONITORING

9.1 Introduction

For a complex system, such as the WWW, the achievement and maintenance of a high level of operational efficiency and effectiveness will only occur if there is an ongoing programme of system monitoring, and if the results from this monitoring are taken into consideration by those responsible for the various components of the system. Within the WWW there are two distinct aspects of data monitoring - data quantity monitoring and data quality monitoring. Data quantity monitoring is concerned with the timely collection and receipt of observations and products, while data quality monitoring is concerned with the accuracy of the observations and products. Because the WWW is fundamentally reliant on the quality and quantity of data available within the system, data monitoring is a crucial activity.

To achieve success, the WWW needs to:

Provide sufficient observational data of acceptably high quality, collected in a timely manner;

Provide data transport facilities, within appropriate time frames, to enable observational data to be made available both to direct users and to centres responsible for the generation of products;

Generate products (analyses, forecasts, climatology, archives, etc.), within appropriate time frames, sufficient for the needs of the WWW;

Provide data transport facilities, within appropriate time frames, for the dissemination of the generated products to the users;

Generate, accumulate, and where appropriate, exchange metadata concerning the accuracy, timeliness, efficiency, and completeness of the critical components of the system listed above.

The performance of the WWW depends on the quality of the observational data, the efficiency and completeness of the data generation, collection and transport, and the quality of the generated products. Data monitoring is a means of assessing these factors of performance; the design of data monitoring procedures should aim both to identify possible deficiencies with respect to quantity and quality, and to indicate how those deficiencies might be rectified.

The role of Data Management with respect to data monitoring is that of an integrating function. Certain aspects of data monitoring need to be defined, and/or performed within the context of the GOS, the GDPS, and the GTS. All aspects of data management need to be co-ordinated within the context of the over-all WWW system. It is the function of DM to ensure that such co-ordination is achieved. This requires that:

The appropriate metadata be identified;

The need for data monitoring procedures is sufficiently defined for the responsible working groups to act;

The procedures defined by the working groups are consistent and sufficient;

Requirements placed by working groups on other working groups are sufficiently defined and can reasonably be met;

Data representation forms suitable for the transport and storage of monitoring information are developed;

Sufficient arrangements are made for the post-processing of monitoring information to ensure that the efficiency of the WWW system is understood, areas of weakness are identified, and remedial action is undertaken where feasible.

The actual definition of procedures, and their implementation, is primarily the responsibility of the working group concerned, taking into account advice and suggestions from the Working Group on Data Management, using the appropriate representation forms for the monitoring data produced, and conveying to the Working Group on Data Management what co-operation and assistance are necessary from the other working groups.

9.2 The Identification of Errors and Deficiencies within the WWW

Table 9.1 illustrates some of the major WWW functions performed by components of the GOS, the GDPS, and the GTS. Data deficiencies may result from data of poor quality and/or from the unavailability of potentially useful data. The study of these two forms of deficiency is referred to as "data quality monitoring" and "data quantity monitoring" respectively.

Some likely causes of deficiencies with respect to data quality and quantity resulting from the major WWW functions are listed in Table 9.1 below.

It is potentially possible to identify and rectify many of these deficiencies. The monitoring required to achieve this entails the expenditure of resources to examine data and to compile various statistics and system performance information (trace information, quantitative difference statistics, bias assessment, quality indicators, etc.). Some assessment can and should be performed in near-real time, whereas other evaluations can only be accomplished after sufficient data have been gathered over a long period. The aim must be to develop the most cost-effective means of monitoring data as near as possible to the beginning of their life cycle, without the resulting processes causing excessive delay to their dissemination.

Table 9.1 - Possible sources of WWW data deficiencies, as they occur in the various WWW components (observing, collecting, transmission, reception, pre-processing, evaluation)

OBSERVING
  Quality  : instrumentation error; measuring error; encoding error
  Quantity : observation not made; not made at the right time; made but unable to send

COLLECTING
  Quality  : formatting error; detectable validation error
  Quantity : not received; received late; not transmitted; transmitted late

TRANSMISSION
  Quality  : formatting error
  Quantity : not received; received late; not transmitted; transmitted late; not correctly routed

RECEPTION
  Quality  : formatting error; transmission error
  Quantity : not received; received late

PRE-PROCESSING
  Quality  : formatting error; detectable validation error
  Quantity : not received; received late

EVALUATION
  Quality  : detectable validation error; mutual inconsistency; temporal inconsistency; bias; forecast error

 

9.3 The Role of the GOS

The maintenance and support of an observational system requires considerable investment. The objective of data monitoring within the GOS is to ensure that this investment yields the maximum return in terms of the value of the observations produced. Observational data are reduced in value if they are of poor quality, if they are not made or are made at the wrong time, or if they are made but cannot be passed on in a timely fashion (see Table 9.1).

The identification and rectification of such deficiencies, where possible, as part of the observing process is particularly effective, as it may avoid the need for remedial action at a later stage.

9.3.1 Quality of Data

It is a function of the GOS to define:

Guidance material, drawn up by the GOS, concerning the above should also reflect and take account of any expressed needs of the users of the data.

9.3.2 Quantity of Data

The collection of observational data, checking, monitoring of the completeness of reception, and subsequent assembly into suitable messages for further distribution form a time critical path within the WWW data flow. The GOS responsibility is to define:

Acceptable procedures for the timely completion of these processes;

The compilation, updating, collection and, as appropriate, exchange of metadata recording the progress of these processes;

Appropriate means to take action when non-receipt of data is detected.

It is not possible to detect and address deficiencies in the timely distribution of data by monitoring the data transport performance of the GTS alone; some delays may occur in the collection process before data are passed to the GTS for onward transmission.

9.4 The Role of the GTS

Efficient data transport is essential for an effective WWW. The first priority of the GTS is to disseminate the observational data generated within the GOS, together with such vital information as warnings of severe weather. A further important function is the dissemination of appropriate products generated within the GDPS.

Within the GTS, data are handled as collective entities termed "meteorological messages", conforming to GTS defined standards. It is not a function of the GTS to be concerned with the data within such messages or bulletins, excepting such information as may be necessary for message recognition. In consequence, monitoring within the GTS must be in terms of messages rather than in terms of the individual data items contained therein.

9.4.1 Quality of Data

It is a function of the GTS to ensure that data disseminated are correctly formatted, both before and after transmission, and to develop procedures to ensure that this is done.

9.4.2 Quantity of Data

Messages passed to the GTS for dissemination need to be routed to appropriate recipients, and delivered as soon as possible; some messages are of more importance than others, and this order of importance is reflected in the GTS system of allocation of priorities.

To ensure that deficiencies can be identified and remedied, the GTS needs to develop:

Appropriate systems to monitor performance;

The collection and exchange of performance metadata;

Methods to ensure that routing information is kept up-to-date, and regularly checked.

9.5 The Role of the GDPS

The GDPS depends on the timely delivery of sufficient observational data of an acceptable quality as input to those processes which generate analysis and forecast products. The processing of the observational data results in quality related information which may often be significant in the early detection of problems not easily detected as part of the functions of the GOS. Other quality information is significant in the longer term, indicating the existence of systematic bias in either observational data or products.

9.5.1 Quality of Data

The GDPS in conjunction with CBS has developed a system of quality monitoring for observational data based on monitoring statistics generated or co-ordinated by lead centres for various types of data.

It is the function of the GDPS to define how these procedures can be further developed and improved. Feedback of data monitoring results to the data producer is considered to be of prime importance. In particular, it is believed that, whereas currently feedback to originators in delayed mode has proved to be effective, there might be a case for some feedback to take place in near-real-time.

A further function of the GDPS is to monitor the quality of the products they produce, together with the quality of products received from other centres.

9.5.2 Quantity of Data

Within the data transport mechanism of the GTS, data are handled in terms of messages. Messages may be delivered, but they may contain many observations which are "nil". Thus, it is only at the level of the GDPS that the observational contents of messages can be fully monitored.

It should be a function of the GDPS to compile, assemble, collect and, where appropriate, exchange suitable metadata to enable monitoring at the level of the individual observations and products to be achieved. To enable such monitoring to be effective it may be necessary to retain within the metadata information concerning the message in which the observation or product was found, and the time of receipt of that message.

9.6 The Role of Data Management

As an integrating component, the first priority for Data Management is to monitor the monitoring procedures developed within the other CBS Working Groups. Continual liaison is necessary to ensure that the Working Group on Data Management is aware of the needs, developments, and potential for further development of monitoring procedures within the other working groups, and is able to suggest suitable data management techniques to assist in the development of sound monitoring procedures.

Data Management is also responsible for the development of data representation forms, and must respond to the need to represent data appropriate to monitoring techniques.

Since some of the monitoring information cannot be assessed purely within the scope of any single working group, there is a case for the Working Group on Data Management, together with the WMO Secretariat, to be directly involved in the evaluation of monitoring results.


CHAPTER 10

COMPUTER SOFTWARE EXCHANGE

10.1 Introduction

The improvement of data management techniques in the WWW is largely built around the creation and implementation of computer based systems. These systems provide greater functionality than was previously possible, provide access to data and information not available through manual means, and at the same time ensure that the integrity of these data and information is preserved.

The kinds of data handling and product generation that can only be carried out by means of computers include:

  1. Handling of WMO binary codes (GRIB and BUFR);

  2. Performing conversions between binary formats and character codes or between different representations of graphical data (e.g., T4-coded facsimile, CGM, raster graphics, vector graphics, etc.);

  3. Receiving and using satellite images;

  4. Participating in satellite based point-to-multipoint broadcast services;

  5. Accessing and using data generated by automated observing systems;

  6. Receiving and using numerical weather prediction products and generating value added products;

  7. Generating forecast model products.

In order to meet the data management needs associated with these data and information, WWW DM maintains a registry of software that Members are willing to make generally available to the meteorological community. To ensure that the software offered is of maximum utility to Members, WWW DM must also work to advise on the most appropriate standards to which software should be developed and on the techniques for doing so.

10.2 Software Exchange

10.2.1 The Concept

In order to improve the computerised data management capabilities of Members, WMO has undertaken a number of initiatives aimed at providing various kinds of assistance in the form of donations of computer hardware and applications packages, such as the MSS (Meteorological Message Switching) and graphical display software, and of dedicated training under the SHARE programme (Software Help, Applications, Research and Education Programme). These activities are being carried out as VCP/UNDP projects. Support of this type is rather demanding in terms of organizational requirements, engineering support and, above all, financial resources; accordingly, only a relatively small number of countries can benefit from these projects in any given financial planning cycle of WMO.

Countries that are in the process of automating their meteorological services have a steadily growing requirement for meteorological applications software, both to meet newly emerging requests for products in their own country and to stay abreast of procedural changes agreed by WMO (such as changes in codes or amendments to telecommunication procedures). Some have recently acquired the capability of handling the X.25 telecommunication protocol for data exchange on the GTS and wish to receive data formatted in GRIB and BUFR. It is difficult for these centres to obtain the necessary data handling software for processing binary-formatted data.

It is, therefore, important that WMO activate other resources to help Members acquire software and related technical assistance. CBS felt that a promising and cost-effective approach would be to encourage Members to exchange software that is already available in meteorological centres. This approach raises a number of questions pertaining to software compatibility and portability.

Three trends in computer applications will make the porting of software between different computer centres simpler. Firstly, standardization in the meteorological community is leading to the introduction of common data formats, data handling and telecommunications procedures. Many basic functions performed by computer programs in meteorological centres are essentially very similar or even identical. This is particularly obvious for real-time functions such as the handling of WMO-formatted messages, the handling of WMO-coded reports and the plotting of station-model and contour charts, and for a wide range of non-real-time data management functions. Suitable programs are available in the meteorological centres of most developed countries. Secondly, on-going efforts in WMO to develop WMO-agreed standards for software design, programming techniques and software documentation will gradually reduce the level of incompatibility among meteorological computer solutions. Thirdly, international and industry standards are spreading rapidly and are now more readily accepted in the software laboratories of meteorological services than they were some years ago.

In addition, many computer vendors offer in their product lines conversion programs for converting program source code from one industry-specific format to another, such as from an IBM FORTRAN dialect to a Digital Equipment FORTRAN dialect. Tools of this type reduce the amount of work required to adapt computer programs from other sources.

All this lessens the hardware-induced incompatibilities of computer programs and makes it possible for Members to contemplate adapting other centres' operational software for use in their own operations, as an alternative to "in-house" development or commercial procurement.

10.2.2 The Objectives of WWW Software Exchange

After technical surveys by its working groups, CBS-Ext.(90) requested that the WMO Secretariat organize an exchange of computer software among WMO Members.

The objectives of the CBS Software Exchange are to:

  1. Strengthen the WWW system by making more readily available a wider range of suitable meteorological software to Members for their own application;

  2. Improve the self-sufficiency of evolving computer centers of developing countries by encouraging such Members to participate actively in the exchange of application software;

  3. Assist in spreading well-proven software packages and standardized software techniques in the meteorological community;

  4. Provide an overview of meteorological applications software offered and requested by WMO Members;

  5. Assist in planning WMO co-ordinated computer projects by identifying the most prominent and most common requests for software and/or computer support.

10.2.3 The Software Registry

The WMO Secretariat has established a catalogue containing information on offered and requested computer software collected from Members, and will publish an updated edition on a yearly basis. The first edition of this catalogue, called CBS Software Registry, was distributed to all Members in January 1991.

In order to provide a manageable structure for the exchange a framework of categories has been established under which the individual computer programs are grouped. To this end, both the range of computer hardware and the various areas of meteorological software application are cataloged in simplified categories. These sets of categories serve as coarse guidelines for basic comparability and classification of the computer programs.

The computer hardware has been grouped in five categories:

  1. Personal Computer;

  2. Advanced Graphical Workstation;

  3. Large Minicomputer;

  4. Large Mainframe Computer;

  5. Supercomputer.

The computer programs have been grouped in ten categories. They are:

  1. Process Control Programs (automatic scheduler programs to control routine production depending on external parameters such as date, time, data availability, etc.);

  2. Message Switching (programs for meteorological message switching for WMO-formatted messages or AFTN messages, dis-assembly and assembly of bulletins, serving teletype and data lines);

  3. Pre-processing and Data Handling (e.g., programs for decoding, code transformation binary <-> character codes, quality control, monitoring of data availability);

  4. Post-processing (e.g., programs for graphical representation/ visualization of data and products on printer, plotter, hard copy, graphical VDUs);

  5. Numerical Analysis Programs (e.g., programs for the generation of a gridded field of a parameter reported at irregularly distributed locations);

  6. Numerical Forecast Models (e.g., hemispherical models, window models, mesoscale and sub-mesoscale models, or models for specific forecasting tasks, mainly for short-range forecasting);

  7. Other Objective Techniques (e.g., statistical evaluation of model output);

  8. Long-term Archiving of Numerical Results (e.g., programs for packing/unpacking and storage/retrieval of numerical fields, perhaps using GRIB as storage format);

  9. Expert Systems (Computer programs that use stored information to draw conclusions about a particular case);

  10. Long-term Archiving of Observations (e.g., programs for packing/unpacking and storage/retrieval of observational data, perhaps using BUFR as storage format, or SQL). The principle of packing fields or observations into scaled integers, common to categories 8 and 10, is sketched below.
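The following FORTRAN 77 sketch illustrates that packing principle only: a field of grid-point values is scaled into non-negative integers relative to the field minimum. It is not the GRIB code form itself, which is defined in the WMO Manual on Codes, and the routine and argument names are hypothetical.

C     Illustration of the principle of simple packing of a field of
C     grid-point values into scaled, non-negative integers.  This is
C     not the GRIB code form itself; all names are hypothetical.
C        FIELD  - grid-point values                        (input)
C        NPTS   - number of grid points                    (input)
C        IDEC   - decimal scale factor (1 = tenths, etc.)  (input)
C        REF    - reference (minimum) value after scaling  (output)
C        IPACK  - packed values, FIELD*10**IDEC - REF      (output)
      SUBROUTINE PACKFD (FIELD, NPTS, IDEC, REF, IPACK)
      INTEGER NPTS, IDEC, I
      INTEGER IPACK(NPTS)
      REAL FIELD(NPTS), REF, SCALE
      SCALE = 10.0 ** IDEC
      REF = FIELD(1) * SCALE
      DO 10 I = 2, NPTS
         IF (FIELD(I) * SCALE .LT. REF) REF = FIELD(I) * SCALE
   10 CONTINUE
      DO 20 I = 1, NPTS
         IPACK(I) = NINT(FIELD(I) * SCALE - REF)
   20 CONTINUE
      RETURN
      END

Unpacking simply reverses the operation: the original values are recovered, to the precision implied by the scale factor, as (REF + REAL(IPACK(I))) / SCALE.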

CBS decided that any exchange of software, or provision of related support, in the framework of this project should be arranged as bilateral (or multilateral) co-operation between the donor country and recipient countries. The activities carried out in the framework of the CBS Software Exchange will be monitored by the CBS Working Group on Data Management, which will develop recommendations for its improvement for consideration by CBS, as appropriate.

The degree to which Members have not used software offered in the registry indicates the strength of the forces working against standardisation. In essence these forces are:

While the weather services of Members participate in the WWW, they also meet unique national requirements which call for specialised data management and processing systems;

If the resources are available there are significant long-term advantages in the preparation of software within the centre using it - specifically, program faults can be more rapidly rectified and additions/alterations to meet changing requirements can be quickly accommodated;

As noted previously, much software is hardware-specific, and Members are required to employ hardware which can be supported locally.

Clearly, if Members offer software which is:

Modular in nature;

Written to internationally accepted software development standards;

Well documented and supported by training seminars to develop user skills;

Written in such a fashion as to be "vendor" independent; and

Designed to meet well-defined, widespread needs;

then it is expected that the take-up rate of generally available software packages will increase. The issue to be addressed by WWW DM is how to bring about this situation.

10.3 Software Standards

The exchange of software is assisted if the computer programs have been developed to widely accepted programming standards. At the present time a set of such standards is not used widely within the meteorological community. Gibson [1] has provided an example of such a system, named the DOCTOR programming system. It is worthwhile reviewing the aims of this system, as it provides an indication of the advantages of adopting programming standards.

DOCTOR attempts to:

Provide well presented code;

Produce source code following a standard structure;

Set up points of reference for external documentation;

Enable the inclusion within the source code of documentation which can be extracted mechanically;

Allow maximum communication between routines by storing universal variables in structured pools or common blocks;

Facilitate the recognition of variable types, and the differentiation between local variables, variables in common blocks, and dummy arguments to routines;

Provide a set of utility routines for copying vectors, resetting arrays, etc.

There are two elements to the production of computer software that can be widely used outside the laboratory in which it was generated: good documentation and adherence to accepted coding conventions.

Gibson notes that documentation is a means of retaining the ability of code to be understood. The production of documentation is a skill at least as important as the skill required to design and generate the code itself. Good documentation increases the value of code - it assists maintenance, aids understanding, and can be invaluable if language to language re-coding should ever be necessary.

With respect to coding conventions Gibson notes that code should follow a modular structure, each module or routine fulfilling a stated purpose. Modules should be divided into numbered sections and sub-sections. Communication between modules should not involve long parameter lists - shared data should be made available through shared data pools, common blocks, etc.
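By way of illustration only, a small routine laid out along the lines Gibson describes might look like the following. The points being illustrated are the documentation header, the numbered section and the shared data pool (common block); the particular names are hypothetical and do not form part of the DOCTOR system itself.

C     SUBROUTINE SUEXAM - illustrative layout only: documentation
C     header, numbered section and a shared data pool in the spirit
C     of the conventions described above.  Names are hypothetical.
C
C     Purpose   : convert an array of temperatures from kelvin to
C                 degrees Celsius.
C     Interface : CALL SUEXAM (TEMP, NPTS)
C     Externals : none.
      SUBROUTINE SUEXAM (TEMP, NPTS)
      INTEGER NPTS, I
      REAL TEMP(NPTS)
C     Universal constant held in a shared data pool; ZTZERO (273.15)
C     is assumed to have been set by an initialisation routine.
      REAL ZTZERO
      COMMON /COMCST/ ZTZERO
C
C     1.  Convert each value in place.
C         ---------------------------
      DO 110 I = 1, NPTS
         TEMP(I) = TEMP(I) - ZTZERO
  110 CONTINUE
      RETURN
      END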

Despite the apparent desirability of adopting programming standards, acceptance of standards such as those advocated by Gibson has not been widespread. It is difficult to specify precisely the reasons for this, but possibly adherence to a general set of programming guidelines is felt to limit a programmer's ability to optimise code for specific environments. Perhaps, also, programmers do not see sufficient return for the "up-front" investment of mastering, and then systematically applying, a set of rules. Whatever the reason for non-adherence to standards, there is an emerging alternative in the form of Computer-Aided Software Engineering (CASE).

10.3.1 Computer-Aided Software Engineering (CASE)

The problems faced by the meteorological community in preparing and maintaining the software base for a rapidly expanding, worldwide computer-based infrastructure are not unique. One possible emerging solution is CASE. There are now available a number of CASE products which offer integrated tools for the planning, analysis, design and generation of computer code for major software systems. These so-called CASE tools potentially offer increased programmer productivity, provided that a substantial initial investment has been made in purchasing, and mastering, the tools.

It is relatively early days in the development and implementation of CASE tools. Their usage is being explored in a number of weather services, and should they become proven, it is to be expected that the skill base of programmers trained to use them will become broader, the tools will become more cost-effective and powerful and, as a consequence, their use more widespread.


CHAPTER 11

ENDNOTE

1. Gibson J.K., The DOCTOR System - A DOCumenTary ORiented Programming System, ECMWF Technical Memorandum No. 52


LIST OF ACRONYMS

AIREP Colloquial name for reports from aircraft
AMDAR Colloquial name for code form FM 42 - IX Ext.
ANSI American National Standards Institute
ARFOR Colloquial name for code form FM 53 - IX
ASAP Automatic Shipboard Aerological Programme
ASDAR Aircraft-to-Satellite Data Relay
BTAB BUFR Tabular Format (under development)
BUFR FM 94-IX-BUFR Binary Universal Form for the Representation of meteorological data
CAD/CAM Computer Aided Design/Computer Aided Manufacture
CAeM Commission for Aeronautical Meteorology
CAVOK Ceiling And Visibility OK
CBS Commission for Basic Systems
CBS-IX Ninth session of the Commission for Basic Systems
CCITT Consultative Committee for International Telephony and Telegraphy
Cg-X Tenth session of WMO Congress
Cg-XI Eleventh session of WMO Congress
CGI Computer Graphics Interface
CGM Computer Graphics Metafile
CLIMAT Colloquial name for code form FM 71 - VI
CSM Commission for Synoptic Meteorology
DDB Distributed Databases
DIS Draft International Standard
DM Data Management
DP Draft Proposed International Standard
FTAM File Transfer, Access and Management
GDPS Global Data-processing System
GKS Graphics Kernel System
GKSM Graphics Kernel System Metafile
GFAF Colloquial name for code form FM 49 - IX Ext.
GOS Global Observing System
GRID Colloquial name for code form FM 47 - IX Ext.
GRIB FM 92-VII Ext-GRIB (Gridded Binary) processed data in the form of grid-point values expressed in binary form
GTS Global Telecommunication System
IAC Colloquial name for code form FM 54 - IV
ICAO International Civil Aviation Organization
IS International Standard
ISO International Organization for Standardization
ITU International Telecommunications Union
Lidar Light induced detection and ranging system
McIDAS Man-computer Interactive Data Access System
MAGICS Meteorological Applications Graphics Integrated Colour System
METAR Colloquial name for code form FM 15 - IX Ext.
MMS Marine Meteorological Services
MTN Main Telecommunication Network
NCAR National Centre for Atmospheric Research
NFS Network File System
NMC National Meteorological Centre
NURBS Non-Uniform Rational B-Splines
NWP Numerical Weather Prediction
OSF Open Software Foundation
OSI Open Systems Interconnection
PEX PHIGS Extensions to X-Windows
PHIGS Programmer's Hierarchical Interactive Graphics System
RADOF Colloquial name for code form FM 57 - IX Ext.
RADREP Colloquial name for code form FM 22 - IX Ext.
ROFOR Colloquial name for code form FM 54 - IX Ext.
RSMC Regional Specialized Meteorological Centre
SARAD Colloquial name for code form FM 87 - VIII Ext.
SATEM Colloquial name for code form FM 86 - VIII Ext.
SATOB Colloquial name for code form FM 88 - VI Ext.
SFAZI Colloquial name for code form FM 81 - I
SLTP WMO Second Long-term Plan
Sodar Sound detection and ranging system
SPECI Colloquial name for code form FM 16 - IX Ext.
SYNOP Colloquial name for code form FM 12 - IX
TAF Colloquial name for code form FM 51 - IX
TEMP Colloquial name for code form FM 35 - IX
TLTP WMO Third Long-term Plan
UI Unix International
VDI Virtual Device Interface
VDM Virtual Device Metafile
WINTEM Colloquial name for code form FM 50 - VIII Ext.
WGDM CBS Working Group on Data Management
WGDM/SGC CBS Working Group on Data Management/Sub-group on Codes
WGDM/SGDR CBS Working Group on Data Management/Sub-group on Data Representation
WMC World Meteorological Centre
WMO World Meteorological Organisation
WWW World Weather Watch
WWWDM World Weather Watch Data Management
X X-Windows