Guide to FM-94 BUFR (Chapters 4-6)

PART I - A GUIDE TO THE CODE FORM FM-94 BUFR

TABLE OF CONTENTS

Introduction
Chapter 1. Sections of a BUFR message
Chapter 2. BUFR tables
Chapter 3. Using data replication
Chapter 4. Data compression
Chapter 5. Table C - data description operators
Chapter 6. Quirks, advanced features, and special uses of BUFR

CHAPTER 4

Data Compression

4.1 Introduction. Even though BUFR makes efficient use of space by virtue of binary numbers that take only as many bits as are necessary to hold the largest expected value, a further compression may be possible.

4.2 Method Used for Data Compression. The method employed by BUFR for data compression is similar to that used in the WMO Code FM 92 GRIB (GRidded Binary fields). Like elements from the full set of observations are collected together, their minimum value is subtracted out, and the differences from the minimum are then encoded with a bit length selected to hold the largest difference from the minimum value. This is repeated for all the elements.

Using the following group of identically defined data subsets:

	station	station	pressure	temperature	dew point
	number	height
subset 1	101	296	10132	122	110
subset 2	103	291	10122	121	110
subset 3	107	310	10050	105	099
subset 4	112	295	missing	110	102
subset 5	114	350	10055	095	089
subset 6	116	325	10075	101	091

Extraction of the minimum value of each element gives:

101 291 10050 095 089

Each value can now be represented as the difference from these minimum values:

	station	station	pressure	temperature	dew point
	number	height
subset 1	0	5	82	27	21
subset 2	2	0	72	26	21
subset 3	6	19	0	10	10
subset 4	11	4	missing	15	13
subset 5	13	59	5	0	0
subset 6	15	34	25	6	2

After each difference from the minimum value has been determined for each element, determine the number of bits necessary to store the largest of the difference values for each element. For the station number the largest difference is 15, which is equivalent to 1111₂, or 4 bits. However, this presents a small problem: all four bits set on, as is the case for the number 15, is properly interpreted as "missing", not as a numeric value of 15. The solution is simply to add one bit to the number needed to store the largest difference value; thus 15 gets stored in 5 bits, as 01111. It is not necessary to add one bit to the bit lengths for all the elements; it is only necessary when the largest number to be encoded "fills" the available space, that is, when the number is 3 to be stored in 2 bits, 7 in 3 bits, 15 in 4 bits, 31 in 5 bits, etc. A convenient way to do this, and to ensure that there is always room for "missings" (if needed), is to add 1 to the largest difference value and compute the number of bits from this larger-by-one value.

In the example, the station height would be placed in 6 bits; the pressure in 7 (with the "missing" indicated as 1111111), etc., as in the following table:

	station	station	pressure	temperature	dew point
	number	height
Largest difference value +1	16	60	83	28	22
Number of bits	5	6	7	5	5
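
The steps above can be sketched in Python. This is an illustrative sketch only, not part of the BUFR specification: the names subsets and bits_needed are invented here, and None is used as a stand-in for a missing value.

```python
# Illustrative sketch: column minimums, differences and bit widths for
# the six example subsets. None marks a missing value (an assumption of
# this sketch; BUFR itself encodes "missing" as all bits set to 1).
subsets = [
    (101, 296, 10132, 122, 110),
    (103, 291, 10122, 121, 110),
    (107, 310, 10050, 105, 99),
    (112, 295, None, 110, 102),
    (114, 350, 10055, 95, 89),
    (116, 325, 10075, 101, 91),
]

def bits_needed(max_diff):
    # Add 1 to the largest difference so that the all-ones pattern stays
    # free to mean "missing", then count the bits of that value.
    return (max_diff + 1).bit_length()

# Regroup the data by element (column) instead of by subset (row).
columns = list(zip(*subsets))
minimums = [min(v for v in col if v is not None) for col in columns]
differences = [
    [None if v is None else v - lo for v in col]
    for col, lo in zip(columns, minimums)
]
widths = [
    bits_needed(max(d for d in col if d is not None))
    for col in differences
]

print(minimums)  # [101, 291, 10050, 95, 89]
print(widths)    # [5, 6, 7, 5, 5]
```

Adding 1 before counting bits is the "larger-by-one" shortcut described in the text; it widens the field only when the largest difference would otherwise fill every bit.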

Whereas in the non-compressed storage of data in Section 4 there is a continuous bit stream for all parameters for an entire observation, in the compressed form all elements of the same parameter from each observation form a continuous stream (Figure 4-1). In order to determine what the minimum value is that has to be added back to each of the following elements, and how many bits are being used for the storage of these elements, there are two additional items appearing in the compressed form of storage in Section 4 that do not appear in the non-compressed form.

These items are:

(1) the minimum value of this parameter and,

(2) the number of bits that are being used for the storage of each element.

These items of information precede the element values. The Section 4 representation for compressed data for each parameter used in the example above is:

Station number minimum value (101) occupying 10 bits as specified by the Table B data width for entry 0 01 002 followed by:

6 bits containing the count in bits (5) that each of the station numbers will occupy, followed by:

The 6 station number differences from the minimum value (0, 2, 6, 11, 13 and 15), where each value occupies 5 bits.

                           Section 4 data non-compressed
+------------------------------------------------------------------------------+
¦                                                                              ¦
¦parameter 1,parameter 2,..parameter n   parameter 1,parameter 2,..parameter n ¦
¦¦                                   ¦  ¦                                     ¦¦
¦+-----------------------------------+  +-------------------------------------+¦
¦          observation 1                         observation 2                 ¦
¦                                                                              ¦
+------------------------------------------------------------------------------+

                            Section 4 data compressed
+-----------------------------------------------------------------------------+
¦                                                                             ¦
¦minimum                              minimum                                 ¦
¦  value, bit count, parameter 1,...    value, bit count, parameter 2,...     ¦
¦¦                                  ¦ ¦                                  ¦    ¦
¦+----------------------------------+ +----------------------------------+    ¦
¦   observation 1,...observation n       observation 1,...observation n       ¦
+-----------------------------------------------------------------------------+

Figure 4-1. Comparison of non-compressed and compressed data in Section 4

After the last station number difference (15), the next 15 bits (Table B data width for entry 0 07 001) will be taken by the minimum value for station height (291) followed by the count of bits to represent the differences (6) and then each of the elements occupying 6 bits apiece (5, 0, 19, 4, 59, 34).

Continuing the process for all 5 parameters would produce within Section 4 the following bit counts:

                       station   station
                       number    height    pressure  temperature  dew point
Table B descriptor     0 01 002  0 07 001  0 10 004    0 12 004    0 12 006 
Data width to contain
minimum value             10        15        14          12          12

6 bits containing
bit count of parameter     6         6         6           6           6 

Total bits preceding
each parameter            16        21        20          18          18

data width to   
represent difference
from minimum               5         6         7           5           5

compressed data       
representation for    
6 subsets                 30        36        42          30          30

total bit count for 6
subsets including
compression bit counts    46    +   57   +    62     +    48     +    48
                                                                     = 261

261 bits are necessary to represent all 6 subsets in compressed form in Section 4.

Using the same set of values for the 6 subsets in non-compressed form there would be bit counts in Section 4 as follows:

                      station   station 
                       number    height    pressure  temperature  dew point
Table B descriptor
data width                10        15        14          12          12

total bit count 
for 6 subsets             60    +   90   +    84     +    72      +   72
                                                                     = 378

A total of 378 bits are necessary to represent all 6 subsets in non-compressed form.
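
The two totals can be checked with a few lines of Python. This is a sketch under the assumptions of the example only: the Table B data widths and difference widths are the ones quoted in the tables above, and the variable names are illustrative.

```python
# Table B data widths (bits) for 0 01 002, 0 07 001, 0 10 004,
# 0 12 004 and 0 12 006, as quoted in the tables above.
table_b_widths = [10, 15, 14, 12, 12]
diff_widths = [5, 6, 7, 5, 5]   # bits per difference, from the example
n_subsets = 6
COUNT_FIELD_BITS = 6            # field holding the per-value bit count

# Compressed: minimum value + 6-bit count + one difference per subset.
compressed = sum(
    w + COUNT_FIELD_BITS + n_subsets * d
    for w, d in zip(table_b_widths, diff_widths)
)
# Non-compressed: every subset carries every element at full width.
non_compressed = n_subsets * sum(table_b_widths)

print(compressed, non_compressed)  # 261 378
```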

There are other conditions that can occur when encoding compressed data. If all elements of a set of parameters are missing, the minimum value occupying the specified Table B data width in Section 4 shall be set to all 1's, the 6 bits specifying how many bits are used for each value will be set to 0, and the difference values will be omitted. If, for example, all the dew points were missing from the 6 subsets, then the number of bits to represent dew point would be reduced to only the Table B data width for dew point (12 bits) and the 6 bits specifying the bits used for each value.

                      station   station
                       number    height    pressure  temperature  dew point
Table B descriptor     0 01 002  0 07 001  0 10 004    0 12 004    0 12 006 

data width to contain
minimum value             10        15        14          12          12

6 bits containing
bit count parameter 
will occupy                6         6         6           6           6

Total bits preceding
each parameter            16        21        20          18          18

compressed data 
(difference from  
minimum)                   5         6         7           5           0

compressed data       
representation for    
6 subsets                 30        36        42          30           0

total bit count for 6
subsets including
compression identifiers   46    +   57   +    62     +    48     +    18
                                                                      = 231

In the non-compressed form, storage of the missing dew point values would still occupy 12 bits each, with all bits set to 1.

                      station   station 
                       number    height    pressure  temperature  dew point
Table B descriptor
data width                10        15        14          12          12

total bit count 
for 6 subsets             60    +   90   +    84     +    72      +   72
                                                                      = 378

The other condition that may occur is that all the values for a parameter are identical, so that every difference from the minimum is zero. In that case the 6 bits specifying the count of bits for each difference value will be set to 0, and the difference values will be omitted. This condition produces the same bit count as if all elements were missing.

Set of parameters missing:

(1) minimum value, occupying the number of bits indicated in Table B, set to all 1's

(2) 6 bits specifying how many bits are used for each value, set to 0

(3) difference values omitted

Set of identical parameters:

(1) minimum value, occupying the number of bits indicated in Table B, set to the minimum value (the actual value for all parameters)

(2) 6 bits specifying how many bits are used for each value, set to 0

(3) difference values omitted
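
The general case and the two special cases can be combined in one short Python sketch. The function name encode_compressed, the use of None for a missing value, and the choice to return the bits as a text string are all assumptions made for illustration; the input values are taken to be already in Table B units (scaled and offset by the reference value).

```python
def encode_compressed(values, table_b_width):
    """Return the compressed bit string for one element across all subsets.

    values: non-negative integers already in Table B units, with None
    marking a missing value (a convention of this sketch only).
    """
    present = [v for v in values if v is not None]
    if not present:
        # All values missing: minimum is all 1's, bit count is 0,
        # and no difference values follow.
        return "1" * table_b_width + format(0, "06b")
    minimum = min(present)
    head = format(minimum, f"0{table_b_width}b")
    if len(present) == len(values) and max(present) == minimum:
        # All values identical: bit count is 0, differences omitted.
        return head + format(0, "06b")
    # Room for the largest difference plus the all-ones "missing" pattern.
    nbits = (max(present) - minimum + 1).bit_length()
    missing = "1" * nbits
    body = "".join(
        missing if v is None else format(v - minimum, f"0{nbits}b")
        for v in values
    )
    return head + format(nbits, "06b") + body

station = encode_compressed([101, 103, 107, 112, 114, 116], 10)
dew_all_missing = encode_compressed([None] * 6, 12)
print(len(station), len(dew_all_missing))  # 46 18
```

The lengths 46 and 18 match the station number and all-missing dew point columns in the bit count tables above.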

Data compression is most effective when the range of values for the parameters is small. In the example of the 6 subsets, each parameter has a difference from the minimum value, where the number of bits to represent the difference is half, or less than half, the number of bits required in non-compressed form for storage in Section 4, as indicated by the Table B entry data width. If the 6 subsets were put into a message where compression was not applied, the length of the message would be 100 octets (Figure 4-2). By applying compression, the length of the message would be reduced to 86 octets (Figure 4-3).

To show the effect of compression on a large data set, assume (unrealistically) that the values keep the same range as in these 6 subsets. A total of 4267 subsets could then be put into a compressed BUFR message not exceeding 15000 octets (Figure 4-5). In non-compressed form only 1898 subsets would fit within the 15000 octet limit (Figure 4-4).
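
The message lengths and subset counts quoted above can be reproduced with a short Python sketch. It assumes the section layout shown in Figures 4-2 to 4-5 (an 8-octet Section 0, an 18-octet Section 1, no Section 2, sections padded to an even number of octets) and the fixed difference widths of the 6-subset example. The function names are illustrative.

```python
TABLE_B_WIDTHS = [10, 15, 14, 12, 12]   # bits, from the example
DIFF_WIDTHS = [5, 6, 7, 5, 5]           # bits per compressed difference

def pad_even(octets):
    # Sections in the figures end on an even number of octets.
    return octets + (octets % 2)

def message_octets(n_subsets, compressed):
    if compressed:
        bits = sum(w + 6 + n_subsets * d
                   for w, d in zip(TABLE_B_WIDTHS, DIFF_WIDTHS))
    else:
        bits = n_subsets * sum(TABLE_B_WIDTHS)
    section0 = 8
    section1 = 18
    # 7 fixed octets plus 2 octets for each of the 5 descriptors.
    section3 = pad_even(7 + 2 * len(TABLE_B_WIDTHS))
    # 4 header octets plus the data bits rounded up to whole octets.
    section4 = pad_even(4 + (bits + 7) // 8)
    section5 = 4
    return section0 + section1 + section3 + section4 + section5

def max_subsets(limit, compressed):
    # Largest subset count whose message still fits within the limit.
    n = 0
    while message_octets(n + 1, compressed) <= limit:
        n += 1
    return n

print(message_octets(6, False), message_octets(6, True))    # 100 86
print(max_subsets(15000, False), max_subsets(15000, True))  # 1898 4267
```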

               Section     Octet in   Encoded
               Octet No.   Message    Value     Description
Section 0 
(indicator       1-4         1-4       BUFR   encoded international CCITT
section)                                      Alphabet No. 5
                 5-7         5-7        100   total message length (octets)
                   8           8          2   BUFR edition number
Section 1
(identification  1-3         9-11        18   length of section (octets)
section)           4           12         0   BUFR master table
                 5-6        13-14        58   originator (U.S. Navy - FNOC)
                   7           15         0   update sequence number
                   8           16         0   indicator for no Section 2  
                   9           17         0   Table A - surface land data
                  10           18         0   BUFR message sub-type
                  11           19         2   version number of master
                                              tables
                  12           20         0   version number of local
                                              tables
                  13           21        92   year of century
                  14           22         4   month
                  15           23        18   day
                  16           24         0   hour
                  17           25         0   minute
                  18           26         0   reserved for local use by ADP
                                              centers (also needed to
                                              complete even number octets
                                              for section)
Section 3
(Data            1-3        27-29        18   length of section (octets)
description        4           30         0   reserved 
section)         5-6        31-32         6   number of data subsets
                   7           33   bit 1=1   flag indicating observed data
                                    bit 2=0   flag indicating no
                                              compression
                8-17        34-43  0 01 002   WMO station no.          
                                   0 07 001   height of station
                                   0 10 004   pressure
                                   0 12 004   temperature
                                   0 12 006   dew point
                 18           44          0   needed to complete section
                                              with an even number of octets
Section 4
(Data            1-3        45-47        52   length of section (octets)
section)           4           48         0   reserved
                5-52        49-96      data   continuous bit stream of data
                                              for 6 subsets, 63 bits per 
                                              subset plus 6 bits to end on 
                                              even octet
Section 5
(End section)    1-4        97-100     7777   encoded CCITT international 
                                              Alphabet No. 5

Figure 4-2.    BUFR message of 6 subsets in non-compressed form

               Section    Octet in   Encoded
              Octet No.   Message    Value     Description
Section 0 
(indicator       1-4         1-4       BUFR   encoded international CCITT
section)                                      Alphabet No. 5
                 5-7         5-7         86   total length of message (octets)
                   8           8          2   BUFR edition number
Section 1
(identification  1-3         9-11        18   length of section (octets)
section)           4           12         0   BUFR master table
                 5-6        13-14        58   originator (U.S. Navy - FNOC)
                   7           15         0   update sequence number
                   8           16         0   indicator for no Section 2  
                   9           17         0   Table A - surface land data
                  10           18         0   BUFR message sub-type
                  11           19         2   version number of master
                                              tables
                  12           20         0   version number of local
                                              tables
                  13           21        92   year of century
                  14           22         4   month
                  15           23        18   day
                  16           24         0   hour
                  17           25         0   minute
                  18           26         0   reserved for local use by ADP
                                              centers (also needed to
                                              complete even number octets
                                              for section)
Section 3
(Data            1-3        27-29        18   length of section (octets)
description        4           30         0   reserved 
section)         5-6        31-32         6   number of data subsets
                   7           33   bit 1=1   flag indicating observed data
                                    bit 2=1   flag indicating compression
                8-17        34-43  0 01 002   WMO station no.          
                                   0 07 001   height of station
                                   0 10 004   pressure
                                   0 12 004   temperature
                                   0 12 006   dew point
                  18           44         0   needed to complete section
                                              with an even number of octets
Section 4       
(Data            1-3        45-47        38   length of section (octets)
section)           4           48         0   reserved
                 5-38       49-82      data   261 continuous bits of
                                              compressed data plus 11 bits
                                              to end on even octet
Section 5
(End section)    1-4        83-86      7777   encoded CCITT international 
                                              Alphabet No. 5

Figure 4-3.  BUFR message of 6 subsets in compressed form

               Section    Octet in   Encoded
               Octet No.   Message    Value     Description
Section 0 
(indicator       1-4         1-4       BUFR   encoded international CCITT
section)                                      Alphabet No. 5
                 5-7         5-7      15000   total length of message (octets)
                   8           8          2   BUFR edition number
Section 1
(identification  1-3         9-11        18   length of section (octets)
section)           4           12         0   BUFR master table
                 5-6        13-14        58   originator (U.S. Navy - FNOC)
                   7           15         0   update sequence number
                   8           16         0   indicator for no Section 2  
                   9           17         0   Table A - surface land data
                  10           18         0   BUFR message sub-type
                  11           19         2   version number of master
                                              tables
                  12           20         0   version number of local
                                              tables
                  13           21        92   year of century
                  14           22         4   month
                  15           23        18   day
                  16           24         0   hour
                  17           25         0   minute
                  18           26         0   reserved for local use by ADP
                                              centers (also needed to
                                              complete even number octets
                                              for section)
Section 3
(Data            1-3        27-29        18   length of section (octets)
description        4           30         0   reserved 
section)         5-6        31-32      1898   number of data subsets
                   7           33   bit 1=1   flag indicating observed data
                                    bit 2=0   flag indicating no
                                              compression
                 8-17       34-43  0 01 002   WMO station no.          
                                   0 07 001   height of station
                                   0 10 004   pressure
                                   0 12 004   temperature
                                   0 12 006   dew point
                  18           44         0   needed to complete section
                                              with an even number of octets
Section 4       
(Data            1-3        45-47     14952   length of section (octets)
section)           4           48         0   reserved
              5-14952    49-14996      data   continuous bit stream of data
                                              for 1898 subsets, 63 bits per
                                              subset plus 10 bits to end on
                                              even octet
Section 5
(End section)    1-4  14997-15000      7777   encoded CCITT international 
                                              Alphabet No. 5

Figure 4-4.  BUFR message of 1898 subsets in non-compressed form   

               Section    Octet in   Encoded
               Octet No.   Message    Value     Description
Section 0 
(indicator       1-4         1-4       BUFR   encoded international CCITT
section)                                      Alphabet No. 5
                 5-7         5-7      15000   total length of message (octets)
                   8           8          2   BUFR edition number
Section 1
(identification  1-3         9-11        18   length of section (octets)
section)           4           12         0   BUFR master table
                 5-6        13-14        58   originator (U.S. Navy - FNOC)
                   7           15         0   update sequence number
                   8           16         0   indicator for no Section 2  
                   9           17         0   Table A - surface land data
                  10           18         0   BUFR message sub-type
                  11           19         2   version number of master
                                              tables
                  12           20         0   version number of local
                                              tables
                  13           21        92   year of century
                  14           22         4   month
                  15           23        18   day
                  16           24         0   hour
                  17           25         0   minute
                  18           26         0   reserved for local use by ADP
                                              centers (also needed to
                                              complete even number octets
                                              for section)
Section 3
(Data            1-3        27-29        18   length of section (octets)
description        4           30         0   reserved 
section)         5-6        31-32      4267   number of data subsets
                   7           33   bit 1=1   flag indicating observed data
                                    bit 2=1   flag indicating compression
                8-17        34-43  0 01 002   WMO station no.          
                                   0 07 001   height of station
                                   0 10 004   pressure
                                   0 12 004   temperature
                                   0 12 006   dew point
                  18           44         0   needed to complete section
                                              with an even number of octets
Section 4       
(Data            1-3        45-47     14952   length of section (octets)
section)           4           48         0   reserved
              5-14952    49-14996      data   119569 continuous bits of
                                              compressed data plus 15 bits
                                              to end on even octet
Section 5
(End section)    1-4  14997-15000      7777   encoded CCITT international 
                                              Alphabet No. 5

Figure 4-5. BUFR message of 4267 subsets in compressed form

CHAPTER 5

Table C Data Description Operators

5.1 Introduction. Table C data description operators (Table 5-1) are used when there is a need to redefine Table B attributes temporarily, such as the need to change the data width, scale or reference value of a Table B entry.

5.2 Changing Data Width, Scale and Reference Value. If data from a DRIFTER observation (FM 18-IX Ext., Report of a drifting-buoy observation) were being encoded into BUFR, there would be no Table B entries corresponding to latitude and longitude in thousandths of degrees. The Table B entries for latitude and longitude are high accuracy (hundred thousandths of a degree) and coarse accuracy (hundredths of a degree). There are several possible methods to handle the encoding of latitude and longitude for DRIFTER in thousandths of degrees.

One method would be to choose the high accuracy Table B entries for latitude and longitude in hundred thousandths of degrees. There would be no loss of accuracy, but many unused bits would be encoded in Section 4 for each observation: the high accuracy latitude requires 25 bits and the high accuracy longitude 26 bits, whereas latitude and longitude in thousandths of degrees require only 18 and 19 bits respectively. If the extra bits from using high accuracy were not deemed a concern, this would be the easiest method.

If it were desirable to use only the bits required to represent latitude and longitude in thousandths of degrees, there are two ways to accomplish this. The first, and the least desirable of any method, would be to create local descriptors for Table B with the appropriate scale and reference values for thousandths of degrees. This is the least desirable method because, if the BUFR message were transmitted to another center, the receiving center would need to make the correct definition of the local descriptors available to its BUFR decoder program. The other method would be to use the Table C data description operators: 2 01 Y to change the data width of the Table B descriptors for latitude and longitude, 2 02 Y to change the scale, and 2 03 Y to change the reference values.

There is now a choice to be made between temporarily changing latitude and longitude from hundredths of degrees to thousandths, or from hundred thousandths to thousandths. It does not matter which is done, as the only difference between the choices will be the Y operand entries of the data description operators.

If it were decided to change latitude and longitude from hundredths to thousandths of degrees, the first step is to determine how many bits are necessary to represent latitude and longitude individually in thousandths of a degree. The maximum latitude value to be represented in Section 4 depends on the reference value, which must also change: the Table B reference value of -9000 becomes -90000 to accommodate thousandths of degrees. The maximum value of a reported latitude to be encoded into BUFR bits is then 180000. This value is arrived at by taking a reported latitude of 90.000 North, scaling it by 10³ (also to be changed, from 10²) to retain the desired precision, and then subtracting the reference value of -90000, producing 180000. The number of bits needed to accommodate 180000 is 18. Changing the data width of the Table B entry for latitude (coarse accuracy) from 15 bits to 18 bits requires the Table C entry 2 01 131. The Y operand 131 follows from the Operation Definition, which adds Y-128 bits to the data width given for the element 0 05 002. The number 128 is the midpoint between 1 and 255, which is the range of values for the 8 bits of Y. Values of Y from 1 to 127 decrease the data width; values from 129 to 255 increase it.
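
The arithmetic in this paragraph can be written out as a small Python sketch. The function names encoded_value and change_width_operand are illustrative, not part of any BUFR library.

```python
def encoded_value(reported, scale, reference):
    # Value stored in Section 4: scale the reported value by 10**scale,
    # then subtract the reference value.
    return round(reported * 10 ** scale) - reference

def change_width_operand(old_width, new_width):
    # Y operand for 2 01 Y: the operator adds (Y - 128) bits.
    return 128 + (new_width - old_width)

# Largest latitude, 90.000 North, in thousandths of a degree.
max_lat = encoded_value(90.000, scale=3, reference=-90000)
new_width = max_lat.bit_length()
y = change_width_operand(15, new_width)
print(max_lat, new_width, f"2 01 {y:03d}")  # 180000 18 2 01 131
```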

Table 5-1. BUFR Table C - Data Description Operators

Table Reference (F X)	Operand	Operator Name	Operation Definition
2 01	Y	Change data width	Add (Y-128) bits to the data width for each data element in Table B, other than CCITT IA5 (character) data, code or flag tables
2 02	Y	Change scale	Multiply scale given for each non-code data element in Table B by 10^(Y-128)
2 03	Y	Change reference values	Subsequent element descriptors define new reference values for corresponding Table B entries. Each new reference value is represented by Y bits in the Data Section. Definition of new reference values is concluded by encoding this operator with Y=255. Negative reference values shall be represented by a positive integer with the left-most bit (bit 1) set to 1.
2 04	Y	Add associated field	Precede each data element field with Y bits of information. This operation associates a data field (e.g. quality control information) of Y bits with each data element.
2 05	Y	Signify character	Y characters (CCITT international Alphabet No. 5) are inserted as a data field of Y x 8 bits in length
2 06	Y	Signify data width for the immediately following local descriptor	Y bits of data are described by the immediately following descriptor

The next step would be to change the scale from 10² to 10³ in order to properly decode the reported latitude which will be encoded in Section 4 with 18 bits. The WMO BUFR definition for change scale, "Multiply scale given for each non-code data element in Table B by 10^(Y-128)", is referring to the result of 10^scale. For Table B entry 0 05 002, the scale is 2. In this case it is the resultant value 100 which is to be multiplied by 10^(Y-128), not the scale 2. Thus, the data description operator to change the scale for Table B entry 0 05 002 would be 2 02 129.
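
Because multiplying 10^scale by 10^(Y-128) is the same as adding Y-128 to the scale exponent, the operand can be derived as in the following Python sketch. The function names are illustrative assumptions.

```python
def change_scale_operand(old_scale, new_scale):
    # Y operand for 2 02 Y: multiplying 10**old_scale by 10**(Y - 128)
    # is the same as adding (Y - 128) to the scale exponent.
    return 128 + (new_scale - old_scale)

def apply_change_scale(old_scale, y):
    # Scale exponent in effect after the operator 2 02 Y.
    return old_scale + (y - 128)

y = change_scale_operand(2, 3)
print(f"2 02 {y:03d}")  # 2 02 129

# The resultant value 100 multiplied by 10**(Y - 128) gives 10**3.
assert 10 ** 2 * 10 ** (y - 128) == 10 ** apply_change_scale(2, y)
```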

To complete the necessary changes for Table B, the reference value also needs to be modified from -9000 to -90000. Here again it must be determined how many bits are necessary to accommodate the new value, as the new reference value itself is encoded into Section 4. The number of bits to accommodate 90000 (positive value) is 17. It is, however, necessary to indicate that this is to be a negative value, which requires an additional bit. To indicate a new reference value as negative, the left-most bit of the reference value encoded into Section 4 is set to 1. The sequence of operators needed to redefine or change a reference value is:

1) the 2 03 018 "change reference values operator", which announces a change and states how many bits are set aside for the new reference value in the data section (18 in this example)

2) one or more regular (F=0) data descriptors to indicate which variable(s) are to have new reference values. There are, of course, as many 18-bit values in the data as there are data descriptors following the 2 03 018 descriptor.
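
The sign convention for the new reference value can be illustrated with a short Python sketch. The function names encode_reference and decode_reference are invented for this example, and the bits are handled as a text string for readability.

```python
def encode_reference(value, nbits):
    # The magnitude goes in the low (nbits - 1) bits; the left-most bit
    # (bit 1) is set to 1 when the reference value is negative.
    magnitude = abs(value)
    if magnitude.bit_length() > nbits - 1:
        raise ValueError("reference value does not fit in nbits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{nbits - 1}b")

def decode_reference(bits):
    # Inverse of encode_reference: read the sign bit, then the magnitude.
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

lat_ref = encode_reference(-90000, 18)    # as announced by 2 03 018
lon_ref = encode_reference(-180000, 19)   # as announced by 2 03 019
print(len(lat_ref), decode_reference(lat_ref))  # 18 -90000
print(len(lon_ref), decode_reference(lon_ref))  # 19 -180000
```

Note that this is a sign-and-magnitude representation, as the operator definition describes, not two's complement.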

In this particular case it will not be necessary to have separate Data Description operators to modify the longitude data width and scale. The increase in the number of bits of data width needed to accommodate longitude in thousandths of degrees is also 3, and the change of scale is likewise the same. There will, however, be a required change of reference value from -18000 to -180000. By following the same steps as when changing the latitude Table B reference value, the Data Description operator for changing the longitude reference value would be 2 03 019 followed by the data descriptor 0 06 002, followed by the descriptor 2 03 255 to indicate the end of the list of descriptors for which reference values are being changed.

Once Data Description operators 2 01 Y, 2 02 Y and 2 03 Y have been used in Section 3, they remain in effect for the rest of the Section 3 data descriptions. To cancel operators 2 01 and 2 02, the additional entries 2 01 000 and 2 02 000 must be included in Section 3. To cancel the reference value change indicated by the operator 2 03 018, an operator 2 03 000 must be included in Section 3.

The data description operators encoded into Section 3 for DRIFTER observations would then be:


                  0 01 005    buoy/platform identifier

                  0 02 001    type of station

                  3 01 011    Table D descriptor which expands to
                                 descriptors for year, month and day

                  3 01 012    Table D descriptor which expands to 
                                 descriptors for hour and minute
                         
    +-----------  2 01 131    increase data width by 3
    ¦
    ¦    +------  2 02 129    multiply scale by 10^1
    ¦    ¦
    ¦    ¦   +--  2 03 018    change reference value - new value 
    ¦    ¦   ¦                   contained in 18 bits in Section 4 
    ¦    ¦   ¦
    ¦    ¦   ¦    0 05 002    new reference value applies to  
    ¦    ¦   ¦                   latitude - coarse accuracy
    ¦    ¦   ¦
    ¦    ¦   +--  2 03 255    terminate reference value definition
    ¦    ¦                       203018
    ¦    ¦
    ¦    ¦   +--  2 03 019    change reference value - new value
    ¦    ¦   ¦                   contained in 19 bits in Section 4
    ¦    ¦   ¦
    ¦    ¦   ¦    0 06 002    new reference value applies to 
    ¦    ¦   ¦                   longitude - coarse accuracy 
    ¦    ¦   ¦
    ¦    ¦   +--  2 03 255    terminate reference value definition
    ¦    ¦
    ¦    ¦
    ¦    +------  2 02 000    cancel change scale
    ¦
    +-----------  2 01 000    cancel change data width
 
                  2 03 000       Cause all redefined reference values to
                                 revert back to standard Table B values

            OTHER ADDITIONAL DATA DESCRIPTORS
            TO COMPLETE DRIFTER DESCRIPTION


The order for cancellation of nested Data Description operators follows the above pattern, where the last defined is the first canceled.

If, instead of changing latitude and longitude from hundredths to thousandths of degrees, they were to be changed from hundred-thousandths to thousandths, the following descriptors would be used:

                  0 01 005    buoy/platform identifier

                  0 02 001    type of station

                  3 01 011    Table D descriptor which expands to descriptors for year, month and day

                  3 01 012    Table D descriptor which expands to descriptors for hour and minute
                         
    +-----------  2 01 121    decrease data width by 7
    ¦
    ¦    +------  2 02 127    multiply scale by 10^-1
    ¦    ¦
    ¦    ¦   +--  2 03 018    change reference value - new value  
    ¦    ¦   ¦                contained in 18 bits in Section 4 
    ¦    ¦   ¦
    ¦    ¦   ¦    0 05 001    new reference value applies to
    ¦    ¦   ¦                latitude - high accuracy
    ¦    ¦   ¦  
    ¦    ¦   +--  2 03 255    terminate reference value definition 203018
    ¦    ¦
    ¦    ¦   +--  2 03 019    change reference value - new value 
    ¦    ¦   ¦              contained in 19 bits in Section 4
    ¦    ¦   ¦
    ¦    ¦   ¦    0 06 001    new reference value applies to
    ¦    ¦   ¦                 longitude - high accuracy
    ¦    ¦   ¦
    ¦    ¦   +--  2 03 255    terminate reference value definition
    ¦    ¦                          
    ¦    ¦
    ¦    +------  2 02 000    cancel change scale
    ¦
    +-----------  2 01 000    cancel change data width
 
                  2 03 000    Cause all redefined reference values to revert back to standard Table B values

            OTHER ADDITIONAL DATA DESCRIPTORS
            TO COMPLETE DRIFTER DESCRIPTION

Which would be the better of the methods? Again, the use of local descriptors to define latitude and longitude is not a good idea, as their use may cause a BUFR message to be undecodable at some other center. The two other methods, using high accuracy latitude and longitude or using Data Description operators to change the latitude and longitude definitions to thousandths of degrees, produce the same results. In terms of the number of bits saved by changing to thousandths of degrees rather than using high accuracy, a DRIFTER observation containing data equivalent to the DRIFTER code (FM 18-IX Ext., Sections 0 through 2) would require 214 bits per observation using high accuracy latitude and longitude. If latitude and longitude were changed by Data Description operators to thousandths of degrees, the observation would require 200 bits, a saving of 14 bits per observation: hardly worth the effort!

The preceding example does not imply that changing data width, scale and reference values should not be done, but it does point out that to do so to lower the number of bits within the data section for a given parameter is probably not that beneficial. In those instances where the Table B entries do not provide enough significance for new technologies, then the flexibility is provided within BUFR to handle those situations. If, for example, satellites were to measure latitude and longitude to millionths of degrees, then, to maintain significance of those measurements would require changing data width, scale and reference values, at least until (or if) there is a new Table B entry.

This example also shows that when changing data width, scale and reference values, a single Table D descriptor cannot be used in Section 3. The reason is that a change of data width or scale applies to all Table B descriptors that follow until the change is canceled. Since the descriptor to be affected may be deep within the Table D expansion, there is no way to include the Data Description operators in that expansion. A change in reference value, however, can be accomplished while still using a single Table D entry. This is possible because the change reference value operator, 2 03 YYY, must be followed by the Table B descriptor or descriptors that are to have new reference values.

5.2.1 Changing Reference Value Only. The Table B entries for geopotential, 0 07 003 and 0 10 003, have a reference value of -400, which is too restrictive for very low pressure systems. The Table C Data Description operator 2 03 YYY can be placed as the first descriptor in Section 3, followed by the Table B descriptor(s) to which it applies. Placing 2 03 010, followed by 0 10 003, before the Table D descriptor means that each time data for 0 10 003 is encountered in Section 4, the new reference value applies; the new value itself occupies the 10 bits specified by YYY. Within 10 bits, the limit of the new reference value as a negative number is -511. The descriptor 2 03 255, which concludes the list of descriptors for which new reference values are supplied, follows immediately, followed in turn by the Table D descriptor (Figure 5-1). In Figure 5-1, the order of the Section 3 descriptors is:

2 03 010 0 10 003 2 03 255 3 09 008

The Section 4 data will be in the order as indicated by Figure 5-1.

                                                                  SECTION 4
                                                              WIDTH IN BITS
2 03 010 --------------------------------- CHANGE REFERENCE VALUE    
                                           (ACTUAL REFERENCE VALUE
                                           IN SECTION 4) -------------    0 
0 10 003 -------------------------------- REFERENCE VALUE TO CHANGE:
                                          GEOPOTENTIAL ---------------   10
                                                                           
2 03 255 -------------------------------- TERMINATE CHANGE REFERENCE 
                                          VALUE ----------------------    0
                            + 0 01 001 -- WMO BLOCK NO. --------------    7
                 +3 01 001 -+ 0 01 002 -- WMO STATION NO. ------------   10
                 ¦
                 ¦0 02 011--------------- RADIOSONDE TYPE ------------    8
                 ¦0 02 012--------------- RADIOSONDE COMP METHOD------    4
                 ¦
        +3 01 038¦          + 0 04 001 -- YEAR -----------------------   12
        ¦        ¦3 01 011 -¦ 0 04 002 -- MONTH ----------------------    4
        ¦        ¦          + 0 04 003 -- DAY ------------------------    6
        ¦        ¦ 
        ¦        ¦          + 0 04 004 -- HOUR -----------------------    5
        ¦        ¦3 01 012 -+ 0 04 005 -- MINUTE ---------------------    6
        ¦        ¦          
        ¦        ¦          + 0 05 002 -- LATITUDE (coarse accuracy) -   15
        ¦        +3 01 024 -¦ 0 06 002 -- LONGITUDE(coarse accuracy) -   16
        ¦                   + 0 07 001 -- HEIGHT OF STATION ----------   15
        ¦          
        ¦        +0 20 010--------------- CLOUD COVER (TOTAL) --------    7
3 09 008¦        ¦0 08 002--------------- VERTICAL SIGNIFICANCE ------    6
        ¦        ¦0 20 011--------------- CLOUD AMOUNT ---------------    4
        ¦3 02 004¦0 20 013--------------- HEIGHT OF BASE OF CLOUD ----   11
        ¦        ¦0 20 012--------------- CLOUD TYPE Cl --------------    6
        ¦        ¦0 20 012--------------- CLOUD TYPE Cm --------------    6
        ¦        +0 20 012--------------- CLOUD TYPE Ch --------------    6
        ¦                                
        ¦1 01 000 ----------------------- DELAYED REP. 1 DESCRIPTOR --    0
        ¦0 31 001 ----------------------- REPLICATION FACTOR ---------    8
        ¦                                
        ¦        +0 07 004--------------- PRESSURE -------------------   14
        ¦        ¦0 08 001--------------- VERTICAL SOUNDING SIG ------    7
        ¦        ¦0 10 003--------------- GEOPOTENTIAL ---------------   17
        +3 03 014¦0 12 001--------------- TEMPERATURE ----------------   12
                 ¦0 12 003--------------- DEW POINT ------------------   12
                 ¦0 11 001--------------- WIND DIRECTION -------------    9
                 +0 11 002--------------- WIND SPEED -----------------   12
                                                                           
2 03 000 -------------------------------- CAUSE REDEFINED REFERENCE
                                          VALUE TO REVERT BACK TO 
                                          STANDARD TABLE B VALUE -----    0
                                                                        --- 
                                                            TOTAL BITS  255

Figure 5-1. Change reference value of geopotential
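The -511 limit quoted in 5.2.1 follows from using one of the 10 bits as the sign. A minimal sketch of that arithmetic, and of how a receiver might turn the raw field back into a signed number, is given here; the function names are illustrative only.

```python
def reference_limits(ybits: int) -> tuple[int, int]:
    """Most negative and most positive reference value that fit in ybits,
    with the leftmost bit used as the sign."""
    largest = (1 << (ybits - 1)) - 1
    return -largest, largest


def decode_reference(raw: int, ybits: int) -> int:
    """Turn the raw ybits-wide Section 4 field back into a signed value."""
    sign_bit = 1 << (ybits - 1)
    magnitude = raw & (sign_bit - 1)
    return -magnitude if raw & sign_bit else magnitude


print(reference_limits(10))                  # (-511, 511)
print(decode_reference(0b1111110100, 10))    # -500
```

A reference value outside these limits simply needs a larger YYY in the 2 03 YYY operator.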

5.3 Add Associated Field. The Data Description operator 2 04 YYY permits the inclusion of quality control information of YYY bits attached to each following data element. The additional YYY bits of the associated field appear in the data section as prefixes to the actual data elements. The Add Associated Field operator, whenever used, must be immediately followed by the Class 31 Data Description Operator Qualifier 0 31 021 to indicate the meaning of the associated fields.

0 31 021

Associated field significance

Code
figure

   0     Reserved
   1     1 bit indicator of quality       0 = good
                                          1 = suspect or bad
   2     2 bit indicator of quality       0 = good
                                          1 = slightly suspect
                                          2 = highly suspect
                                          3 = bad
  3-6    Reserved
   7     Percentage confidence
  8-20   Reserved
  21     1 bit indicator of correction    0 = original value
                                          1 = substituted/corrected value
 22-62   Reserved for local use
  63     Missing value

If quality control information were to be added to a single parameter such as pressure, Table B descriptor 0 07 004, the following sequence would appear in Section 3:

    2 04 007  0 31 021  0 07 004  2 04 000

The meaning of this sequence is:

    2 04 007 - indicator that 7 bits of data precede all following Table B entries
 
    0 31 021 - code table entry for the meaning of the 7 bits preceding the Table B entry

    0 07 004 - Table B entry for pressure

    2 04 000 - cancellation of the Add Associated Field operator

The Section 4 data width for this sequence is 27 bits. The operators 2 04 007 and 2 04 000 do not occupy any bits within Section 4. The 27 bits are taken by 0 31 021 (6 bits) and 0 07 004 (21 bits: 7 bits of associated field plus 14 bits of pressure value).

When multiple Table B entries are preceded by 2 04 YYY as in:

    2 04 007  0 31 021  0 07 004  0 31 021  0 10 003  2 04 000  

the Add Associated Field operator 2 04 007 and the Data Description Operator Qualifier 0 31 021 both apply to the Table B descriptors 0 07 004 and 0 10 003.  The Section 4 data width for the sequence is then:

          2 04 007    0 bits
          0 31 021    6
          0 07 004    21 (7 associated bits plus 14 bits data)
          0 31 021    6 (change meaning of associated field)
          0 10 003    24 (7 associated bits plus 17 bits data)
          2 04 000    0 

Note that the associated fields are not prefixed onto the data described by a 0 31 YYY descriptor.  This is a general rule: none of the Table C operators are applied to any of the Table B, Class 31 descriptors.
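The width bookkeeping for associated fields can be sketched as follows. This is an illustration under stated assumptions, not a decoder: the function name and the small width table are hypothetical, the widths are those used in this chapter's examples, and only F=0 descriptors and the 2 04 YYY operator are handled.

```python
# Data widths in bits as used in this chapter's examples.
TABLE_B_WIDTH = {
    "0 07 004": 14,   # pressure
    "0 10 003": 17,   # geopotential
    "0 31 021": 6,    # associated field significance
}


def section4_widths(descriptors):
    """Yield (descriptor, bits) for a flat Section 3 list that may contain
    2 04 YYY operators."""
    assoc_bits = 0
    for d in descriptors:
        f, x, y = (int(part) for part in d.split())
        if f == 2 and x == 4:
            assoc_bits = y                  # 2 04 000 cancels (y == 0)
            yield d, 0
        elif f == 0 and x == 31:
            yield d, TABLE_B_WIDTH[d]       # Class 31: never prefixed
        elif f == 0:
            yield d, assoc_bits + TABLE_B_WIDTH[d]
        else:
            raise ValueError(f"descriptor {d} not handled in this sketch")


single = ["2 04 007", "0 31 021", "0 07 004", "2 04 000"]
double = ["2 04 007", "0 31 021", "0 07 004",
          "0 31 021", "0 10 003", "2 04 000"]
print(sum(bits for _, bits in section4_widths(single)))   # 27
print(sum(bits for _, bits in section4_widths(double)))   # 57
```

The Class 31 branch is what implements the rule that associated fields are never prefixed onto 0 31 YYY data.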
                              
Suppose quality control information were to be added to the following sequence of parameters, as described by the Table D descriptor 3 03 014:
 
                                                             SECTION 4   
                                                            WIDTH IN BITS
           +0 07 004----------------- PRESSURE -------------------   14
           ¦0 08 001----------------- VERTICAL SOUNDING SIG ------    7
           ¦0 10 003----------------- GEOPOTENTIAL ---------------   17
  3 03 014-¦0 12 001----------------- TEMPERATURE ----------------   12
           ¦0 12 003----------------- DEW POINT ------------------   12
           ¦0 11 001----------------- WIND DIRECTION -------------    9
           +0 11 002----------------- WIND SPEED -----------------   12
                                                                    ---
                                                                     83

By placing in Section 3 the operators 2 04 YYY and 0 31 021 immediately preceding 3 03 014, and the cancellation operator 2 04 000 following 3 03 014, the following sequence would be produced:

                                                              SECTION 4
                                                          WIDTH IN BITS
        +-- 2 04 007----------------- ADD ASSOCIATED FIELD            0
        ¦ 
        ¦   0 31 021----------------- ASSOCIATED FIELD SIG            6
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 07 004----------------- PRESSURE -------------------   14
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 08 001----------------- VERTICAL SOUNDING SIG ------    7
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 10 003----------------- GEOPOTENTIAL ---------------   17
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 12 001----------------- TEMPERATURE ----------------   12
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 12 003----------------- DEW POINT ------------------   12
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 11 001----------------- WIND DIRECTION -------------    9
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 11 002----------------- WIND SPEED -----------------   12
        ¦
        +-- 2 04 000----------------- CANCEL ADD ASSOCIATED FIELD-    0
                                                                    ---
                                                                    138

Adding associated fields to a data sequence that is described by a Table D descriptor means the associated fields are placed before all data items in the sequence.  If quality control information were to be applied only to the pressure and geopotential parameters, the Table D descriptor could not be used but instead each individual parameter would have to be listed in Section 3.  


        +-- 2 04 007----------------- ADD ASSOCIATED FIELD -------    0
        ¦   0 31 021----------------- ASSOCIATED FIELD SIG -------    6
        ¦                             ASSOCIATED FIELD -----------    7
        ¦   0 07 004----------------- PRESSURE -------------------   14
        +-- 2 04 000----------------- CANCEL ADD ASSOCIATED FIELD-    0
         
            0 08 001----------------- VERTICAL SOUNDING SIG ------    7
         
        +-- 2 04 007----------------- ADD ASSOCIATED FIELD--------    0
        ¦   0 31 021----------------- ASSOCIATED FIELD SIG -------    6
        ¦           ----------------- ASSOCIATED FIELD                7
        ¦   0 10 003----------------- GEOPOTENTIAL ---------------   17
        ¦
        +-- 2 04 000----------------- CANCEL ADD ASSOCIATED FIELD-    0

            0 12 001----------------- TEMPERATURE ----------------   12
            0 12 003----------------- DEW POINT ------------------   12
            0 11 001----------------- WIND DIRECTION -------------    9
            0 11 002----------------- WIND SPEED -----------------   12
                                                                    ---
                                                                    109

If quality control information were to be added to TEMP observations as described in Figure 3-1, the following adjustments would have to be made.  The single Table D descriptor 3 09 008 could no longer be used, as its expansion includes the additional Table D descriptor 3 03 014, which further expands to those parameters where quality control information would need to be inserted.  The actual order of the Section 3 descriptors would now be (Figure 5-2):

     3 01 038   3 02 004   1 13 000   0 31 001   2 04 007   0 31 021
     0 07 004   2 04 000   0 08 001   2 04 007   0 31 021   0 10 003
     2 04 000   0 12 001   0 12 003   0 11 001   0 11 002

                                                                  SECTION 4
                                                              WIDTH IN BITS
                     + 0 01 001 --- WMO BLOCK NO. --------------    7
         +3 01 001 --+ 0 01 002 --- WMO STATION NO. ------------   10
         ¦
         ¦0 02 011----------------- RADIOSONDE TYPE ------------    8
         ¦0 02 012----------------- RADIOSONDE COMP METHOD------    4
         ¦
3 01 038-¦           + 0 04 001 --- YEAR -----------------------   12
         ¦3 01 011---¦ 0 04 002 --- MONTH ----------------------    4
         ¦           + 0 04 003 --- DAY ------------------------    6
         ¦
         ¦           + 0 04 004 --- HOUR -----------------------    5
         ¦3 01 012---+ 0 04 005 --- MINUTE ---------------------    6
         ¦         
         ¦           + 0 05 002 --- LATITUDE (COARSE ACCURACY) -   15
         +3 01 024---¦ 0 06 002 --- LONGITUDE(COARSE ACCURACY) -   16
                     + 0 07 001 --- HEIGHT OF STATION ----------   15
                   
         +0 20 010----------------- CLOUD COVER (TOTAL) --------    7
         ¦0 08 002----------------- VERTICAL SIGNIFICANCE ------    6
         ¦0 20 011----------------- CLOUD AMOUNT ---------------    4
3 02 004-¦0 20 013----------------- HEIGHT OF BASE OF CLOUD ----   11
         ¦0 20 012----------------- CLOUD TYPE Cl --------------    6
         ¦0 20 012----------------- CLOUD TYPE Cm --------------    6
         +0 20 012----------------- CLOUD TYPE Ch --------------    6
                                   
1 13 000 -------------------------- DELAYED REP. 13 DESCRIPTORS     0
0 31 001 -------------------------- REPLICATION FACTOR ---------    8

2 04 007 -------------------------- ADD ASSOCIATED FIELD -------    0
0 31 021 -------------------------- ASSOCIATED FIELD SIG. ------    6
                                    ASSOCIATED FIELD -----------    7
0 07 004--------------------------- PRESSURE -------------------   14
2 04 000--------------------------- CANCEL ADD ASSOCIATED FIELD-    0

0 08 001--------------------------- VERTICAL SOUNDING SIG ------    7

2 04 007--------------------------- ADD ASSOCIATED FIELD -------    0
0 31 021--------------------------- ASSOCIATED FIELD SIG. ------    6
                                    ASSOCIATED FIELD -----------    7
0 10 003--------------------------- GEOPOTENTIAL ---------------   17
2 04 000--------------------------- CANCEL ADD ASSOCIATED FIELD-    0

0 12 001--------------------------- TEMPERATURE ----------------   12
0 12 003--------------------------- DEW POINT ------------------   12
0 11 001--------------------------- WIND DIRECTION -------------    9
0 11 002--------------------------- WIND SPEED -----------------   12
                                                                  ---
                                                      TOTAL BITS  271

Figure 5-2. Example of TEMP observations sequence using delayed replication and quality control information
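Because 1 13 000 is a delayed replication, the 13 descriptors that follow 0 31 001 are repeated once for every level reported, so the bit count of a subset depends on the replication factor found in Section 4. A minimal sketch of that dependence, using the widths listed in Figure 5-2 (the names FIXED_PART, PER_LEVEL and subset_bits are illustrative only):

```python
# Widths in bits, in the order they appear in Figure 5-2.
FIXED_PART = [7, 10, 8, 4, 12, 4, 6, 5, 6, 15, 16, 15,   # 3 01 038
              7, 6, 4, 11, 6, 6, 6,                       # 3 02 004
              8]                                          # 0 31 001
PER_LEVEL = [6, 7 + 14,       # 0 31 021, associated field + pressure
             7,               # vertical sounding significance
             6, 7 + 17,       # 0 31 021, associated field + geopotential
             12, 12, 9, 12]   # temperature, dew point, wind direction, wind speed


def subset_bits(levels: int) -> int:
    """Section 4 bits for one subset reporting the given number of levels."""
    return sum(FIXED_PART) + levels * sum(PER_LEVEL)


for levels in (1, 20):
    print(levels, subset_bits(levels))
```

The operators 2 04 007 and 2 04 000 contribute nothing to either list, since they occupy no bits in Section 4.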

5.4 Encoding Character Data. There may be occasions when it is necessary to encode character data into BUFR. The character code FM 13-IX Ext. SHIP, for example, allows the optional inclusion of plain language. If this character information were carried over for encoding into BUFR, the Data Description operator 2 05 Y would be used in Section 3 to indicate the inclusion of character data in Section 4 of the BUFR message. The Y operand of the operator indicates the number of characters, encoded in CCITT International Alphabet No. 5, inserted as a data field in Section 4.

The following parameters from the FM 13-IX Ext. SHIP code form:

                     +  6IsEsEsRs      +
                     ¦                 ¦
                  (  ¦  or ICING +     ¦  )
                     ¦                 ¦
                     +  plain language +

described by BUFR descriptors would be:

0 20 033 cause of ice accretion
0 20 031 ice deposit (thickness)
0 20 032 rate of ice accretion

It would have to be determined in advance how many characters would be allowed for the plain language. If only the word ICING were to be placed in Section 4, the Data Descriptor 2 05 005 would be used. If it were determined that ICING plus 25 additional characters, including spaces, were to be described then the descriptor would be 2 05 030. The data descriptors and data width in Section 4 would then be:

                                                  data width
                                                     in bits
              0 20 033   cause of ice accretion          4
              0 20 031   ice deposit (thickness)         7
              0 20 032   rate of ice accretion           3
              2 05 030   character information         240

Since an observation in FM 13-IX Ext. SHIP code would have either the parameters for ice reported, or ICING + plain language, but not both, the unused alternative must still be filled. If there were no plain language, the character information would be set to spaces. If ICING + plain language were reported, the data for descriptors 0 20 033, 0 20 031 and 0 20 032 would be set to missing (all bits set). Since Section 3 indicates a count of how many subsets (observations) are included in Section 4, the above descriptors apply to all subsets, even if an individual observation does not contain any icing information. In that case the entire set of icing data for that observation would be set to missing and spaces.
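The padding and missing-value conventions just described can be sketched as follows. The function names are illustrative, the plain-language text is made up for the example, and Python's ASCII codec is used here as a stand-in for CCITT International Alphabet No. 5.

```python
def encode_characters(text: str, nchars: int) -> bytes:
    """Pad with spaces to nchars and encode one octet per character."""
    if len(text) > nchars:
        raise ValueError("text longer than the field declared by 2 05 YYY")
    return text.ljust(nchars).encode("ascii")


def missing(width_bits: int) -> int:
    """A missing value has every bit of the field set to 1."""
    return (1 << width_bits) - 1


# ICING + plain language reported: the three icing elements become missing.
chars = encode_characters("ICING MODERATE ON DECK", 30)
print(len(chars) * 8)                        # 240
print(missing(4), missing(7), missing(3))    # 15 127 7

# Icing elements reported instead: the character field is all spaces.
print(encode_characters("", 30) == b" " * 30)   # True
```

Either way the subset occupies the same 254 bits for these four descriptors, which is what allows every subset to share one Section 3 description.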

5.5 Signifying Length of Local Descriptors. Local descriptors were provided in BUFR to give a data processing center the capability of describing information of any type within BUFR for the center's internal use (Figure 2-4). There does exist, however, the possibility that once data is described in BUFR it may be necessary to transmit a BUFR message containing local information to another center. Since the receiver of the BUFR message may or may not know the meaning of the local descriptor, it could be impossible to decode the message, as the receiver would not know the data width in Section 4 of the local information (Figure 2-5). While it could be argued that BUFR messages containing local information should never be transmitted to another center, removing the local information before transmission may require a separate set of software. To overcome this situation, the Data Description operator 2 06 Y was developed to allow local information to be contained within a transmitted message and to tell the receiver the length in bits of the local data. The meaning of the Data Description operator 2 06 Y is that the following local descriptor describes Y bits of data in Section 4 (Figure 5-3). Knowing the width in bits of the data in Section 4 allows the receiver of the message to bypass that number of bits and decode the rest of Section 4 properly.

The operator 2 06 Y can only be used when it precedes a local descriptor with F = 0. While it is within the rules of BUFR to create local descriptors with F = 3 (sequence descriptor), the Data Description operator 2 06 Y cannot be used to bypass whatever number of bits are being described by a sequence descriptor. Since a sequence descriptor expands to other descriptors and in the expansion process other local descriptors or delayed replication may be encountered, there is no way of knowing in advance how many total bits are covered by a sequence descriptor.

                                                                 SECTION 4
                                                             WIDTH IN BITS
2 06 003 -------------------------------- 3 BITS ARE DESCRIBED BY THE 
                                          FOLLOWING LOCAL DESCRIPTOR--    0

0 54 192 -------------------------------- LOCAL DESCRIPTOR -----------    3
 
                           + 0 01 001 --- WMO BLOCK NO.---------------    7
                 +3 01 001-+ 0 01 002 --- WMO STATION NO.-------------   10
                 ¦
                 ¦0 02 001--------------- TYPE OF STATION ------------    2
                 ¦
        +3 01 023¦         + 0 04 001 --- YEAR -----------------------   12
        ¦        ¦3 01 011-¦ 0 04 002 --- MONTH ----------------------    4
        ¦        ¦         + 0 04 003 --- DAY ------------------------    6
        ¦        ¦
        ¦        ¦         + 0 04 004 --- HOUR -----------------------    5
        ¦        ¦3 01 012-+ 0 04 005 --- MINUTE ---------------------    6
        ¦        ¦         
        ¦        ¦         + 0 05 002 --- LATITUDE (coarse accuracy) -   15
        ¦        +3 01 024-¦ 0 06 002 --- LONGITUDE(coarse accuracy) -   16
        ¦                  + 0 07 001 --- HEIGHT OF STATION ----------   15
        ¦          
        ¦                  + 0 10 004 --- PRESSURE -------------------   14
3 07 002¦        +3 02 001-¦ 0 10 051 --- PRESSURE REDUCED TO MSL ----   14
        ¦        ¦         ¦ 0 10 061 --- 3 HR PRESSURE CHANGE -------   10
        ¦        ¦         + 0 10 063 --- CHARACTERISTIC OF PRESSURE -    4
        ¦        ¦
        ¦        ¦         + 0 11 011 --- WIND DIRECTION -------------    9
        ¦        ¦         ¦ 0 11 012 --- WIND SPEED AT 10m ----------   12
        ¦        ¦         ¦ 0 12 004 --- DRY BULB TEMP AT 2m --------   12
        ¦        ¦         ¦ 0 12 006 --- DEW POINT TEMP AT 2m -------   12
        ¦        ¦3 02 003-¦ 0 13 003 --- RELATIVE HUMIDITY ----------    7
        ¦        ¦         ¦ 0 20 001 --- HORIZONTAL VISIBILITY ------   13
        ¦        ¦         ¦ 0 20 003 --- PRESENT WEATHER ------------    8
        ¦        ¦         ¦ 0 20 004 --- PAST WEATHER (1) -----------    4
        ¦        ¦         + 0 20 005 --- PAST WEATHER (2) -----------    4
        ¦        ¦
        +3 02 011¦         + 0 20 010 --- CLOUD COVER (TOTAL) --------    7
                 ¦         ¦ 0 08 002 --- VERTICAL SIGNIFICANCE
                 ¦         ¦              SURFACE OBS ----------------    6
                 ¦         ¦ 0 20 011 --- CLOUD AMOUNT ---------------    4
                 +3 02 004-¦ 0 20 013 --- HEIGHT OF BASE OF CLOUD ----   11
                           ¦ 0 20 012 --- CLOUD TYPE Cl --------------    6
                           ¦ 0 20 012 --- CLOUD TYPE Cm --------------    6
                           + 0 20 012 --- CLOUD TYPE Ch --------------    6
                                                                       ----
                                                           TOTAL BITS   270

Figure 5-3. Example of surface observations with local descriptor and data descriptor operator 2 06 Y
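How a receiving center might step over local data announced by 2 06 Y can be sketched as follows. This is a minimal illustration, not a full decoder: the BitReader class, the decode function and the two-entry width table are hypothetical, the data values (local bits 101, block 72, station 403) are made up, and the sketch always skips the local data even though a receiver that knows the local descriptor could decode it instead.

```python
class BitReader:
    """Reads unsigned integers from a bytes object, most significant bit first."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.total = len(data) * 8
        self.pos = 0

    def read(self, nbits: int) -> int:
        if self.pos + nbits > self.total:
            raise EOFError("ran past the end of the Section 4 data")
        shift = self.total - self.pos - nbits
        self.pos += nbits
        return (self.value >> shift) & ((1 << nbits) - 1)

    def skip(self, nbits: int) -> None:
        self.read(nbits)


KNOWN_WIDTHS = {"0 01 001": 7, "0 01 002": 10}   # receiver's Table B (excerpt)


def decode(descriptors, reader):
    """Decode known F=0 descriptors; skip local data announced by 2 06 YYY."""
    values = {}
    skip_bits = None
    for d in descriptors:
        f, x, y = (int(part) for part in d.split())
        if f == 2 and x == 6:
            skip_bits = y                    # applies to the next descriptor only
        elif skip_bits is not None:
            reader.skip(skip_bits)           # local descriptor: step over its data
            skip_bits = None
        else:
            values[d] = reader.read(KNOWN_WIDTHS[d])
    return values


# 3 bits of local data, then block number (7 bits) and station number (10 bits),
# padded with zero bits to a whole number of octets.
bits = "101" + format(72, "07b") + format(403, "010b")
data = int(bits.ljust(24, "0"), 2).to_bytes(3, "big")
print(decode(["2 06 003", "0 54 192", "0 01 001", "0 01 002"], BitReader(data)))
# {'0 01 001': 72, '0 01 002': 403}
```

Without the 2 06 003 operator the receiver would have no width for 0 54 192, and every value after it would be read from the wrong bit position.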

CHAPTER 6

Quirks, Advanced Features, and Special Uses of BUFR

J.D. Stackpole

6.1 Introduction. This chapter is a slightly disparate collection of odds and ends about BUFR: it discusses some of the advanced features that are sometimes overlooked in a casual reading of the WMO Manual, some of the special uses to which data represented in BUFR has been (or can be) put, and offers a fuller explanation of some of the rather obscure portions of the WMO description of the data representation system.

It also details some of the conventions adopted on an ad hoc basis in those (few) cases where the current specifications of BUFR are a little bit ambiguous. It is expected that what is described in this context will find its way into the published specifications all in good time.

In part, this chapter is necessary because it is turning out, with experience, that BUFR is indeed a very powerful data representation system. As people work with the system, they recognize new possibilities that were not thought of in the original design. Sometimes these new possibilities fit right in to the existing system, as though they were implicitly present from the beginning, other times they require a slight (or not so slight) augmentation of the BUFR rules and/or descriptors to implement the ideas. The latter must be done with care, of course, so as not to build any (violent) inconsistencies into BUFR. Some of the more promising proposals for change are discussed in this chapter, but are clearly indicated as such.

Also, this chapter is (unfortunately) necessary because some of the features (advanced or not) of BUFR are none too clearly spelled out in the necessarily limited confines of the WMO Manual. Experience has shown that some of the rules and regulations get overlooked and/or misinterpreted in their application. It is hoped that this chapter, and this Guide in general, will help to alleviate these sorts of problems.

BUFR sets out to do a lot; this, in turn, does lead to complexity. There is no free lunch.

As an organizing structure, the Sections of a BUFR message/record will be dealt with in their regular order.

6.2 Section 0 - Indicator Section.

6.2.1 Edition Number Changes. There hasn't been any particular difficulty with this section except perhaps for the "Edition Number", currently 2, of the BUFR system. The Edition Number will change only if there is a structural change to the data representation system such that an existing and functioning BUFR decoder would fail to work properly if given a "new" record to decode. A change or augmentation to Tables A, B, D, or the code and flag tables would not involve defining a new Edition for BUFR; one would, of course, be required to change corresponding tables in a computer program but the logic of the program would not have to be changed. Changing tables is easy; changing program logic is not so easy. The former is, indeed, what BUFR is all about.
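In practice a decoder can guard against structural changes by checking the Edition Number before doing anything else. The following is a minimal sketch, assuming the Edition 2 layout of Section 0 (octets 1-4 contain "BUFR", octets 5-7 the total message length, octet 8 the Edition Number); the function names and the set of supported editions are hypothetical.

```python
SUPPORTED_EDITIONS = {2}   # editions this hypothetical decoder understands


def bufr_edition(message: bytes) -> int:
    """Return the Edition Number from Section 0 of a BUFR message."""
    if len(message) < 8 or message[:4] != b"BUFR":
        raise ValueError("not a BUFR message")
    return message[7]      # octet 8


def check_edition(message: bytes) -> None:
    """Refuse messages whose structure this decoder was not written for."""
    edition = bufr_edition(message)
    if edition not in SUPPORTED_EDITIONS:
        raise NotImplementedError(
            f"BUFR edition {edition} needs new decoder logic")


# A made-up Section 0: "BUFR", a total length of 100 octets, Edition 2.
section0 = b"BUFR" + (100).to_bytes(3, "big") + bytes([2])
print(bufr_edition(section0))   # 2
check_edition(section0)         # passes silently
```

A table update, by contrast, would leave this check untouched; only the decoder's table files would change.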

Edition changes can come about in three main ways. For one, if the basic bit or octet structure of the BUFR record were changed, by the addition of something new in one of the "fixed format" portions of the record, say, this would obviously require computer program changes to work properly. The change from Edition 1 to 2 involved just such a change - see the remarks in Section 1.2.1. These changes are expected to be kept to a bare minimum by the WMO community.

A second way that an edition change can come about is if the data description operators, in Table C, are augmented. These operator descriptors are qualitatively different from simple data descriptors: where the data descriptors just passively describe the data in the record, the operator descriptors are, in effect, instructions to the decoding program to undertake some particular action - just what actions are possible are those defined by Table C. Descriptors of type 1 (F=1), the replication operators, are also in this category - they tell the computer program to do something - but there is little room for change as they are currently defined. Clearly, if some new (and presumably useful) "operation" is defined, by inclusion of an operator in Table C, any decoding programs will have to be modified to respond properly. The descriptor 2 06 YYY (the "skip local descriptor" operator) was one such addition made in the conversion from Edition 1 to Edition 2.

Unfortunately, not all of the "operator" descriptors are collected in Table C. Some of the nominal data descriptors, in particular the "increment" descriptors found in Table B, Classes 4, 5, 6, and 7, take on the character of operators in conjunction with data replication (Regulation 94.5.4) and the operator qualifiers in Table B, Class 31. This will be expanded on further below. However, it is clear that changes or augmentations to the general process of replication, including increments, would involve defining a new Edition of BUFR.

A third change that would require a new Edition would be a change of the Regulations and/or many of the various notes scattered through the documentation. (The "notes", by the way, are as important as the "Regulations" in formally defining BUFR - they contain many of the details that flesh out the rather sparse regulations. Ignore them at your peril.) This is not particularly likely to happen - more likely will be clarifications to the Regulations or notes that will serve to make the rules more precise in (currently) possibly ambiguous cases. This may result in a tightening of a rule (or an interpretation) that may require a current "inappropriate" practice to be eliminated; whether this should be considered as requiring an Edition number change is a matter of some judgment. The WMO will be the final arbiter.

6.2.2 Maximum Size of BUFR Records. As noted elsewhere, there is no theoretical limit to the size of a BUFR message. The largest that can be accommodated by Octets 5-7 would be almost 17 mega-octets (megabytes) but a single bulletin of that size would be a bit much for the WMO Global Telecommunications System (GTS). By general international agreement, as specified in the Manual on the GTS, WMO Publication 386, single messages should be kept to less than 15,000 octets (15 kilobytes); 10,000 octets is a good safe number to use to be assured that GTS switching centers won't inadvertently truncate the bulletins as they pass them on. A new GTS specification for breaking up very large bulletins, using the new BBB parameter in the WMO Abbreviated Heading, has recently been promulgated. It is better, however, that such large records not be generated in the first place.
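The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the check for the leading characters "BUFR" assumes the Section 0 layout described in Chapter 1. The two GTS figures are simply the ones quoted above.

```python
# Octets 5-7 of Section 0 hold the total message length as a 24-bit unsigned integer.
MAX_BUFR_LENGTH = 2**24 - 1   # 16,777,215 octets, i.e. "almost 17 mega-octets"
GTS_HARD_LIMIT = 15_000       # limit quoted above from the Manual on the GTS
GTS_SAFE_LIMIT = 10_000       # the "good safe number" quoted above


def message_length(section0: bytes) -> int:
    """Return the total message length stored in octets 5-7 of Section 0."""
    if section0[:4] != b"BUFR":
        raise ValueError("Section 0 must start with the characters 'BUFR'")
    return int.from_bytes(section0[4:7], "big")


def gts_status(length: int) -> str:
    """Classify a message length against the two GTS figures quoted above."""
    if length <= GTS_SAFE_LIMIT:
        return "safe"
    if length < GTS_HARD_LIMIT:
        return "allowed"
    return "too large"
```

An encoder could call gts_status before releasing a bulletin and split the data into several messages whenever the answer is not "safe".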

6.3 Section 1 - Identification Section.

6.3.1 Master Tables, Version Numbers, and Local Tables. At present there are no (known) Master Tables for BUFR other than the meteorological set published in the WMO Manual On Codes. That is not to say that such could not exist. That is one of the major strengths of BUFR: any scientific discipline interested in transmitting, storing, or even data basing information unique to it can define its own set of Tables and take advantage of meteorological experience in using the BUFR system.

As is noted elsewhere in this document, only the upper left portion of the (Class by Entry) matrix of descriptors has been defined in the current Master Table B: Classes 00 through 31, with a variable number of entries in each class. Classes 48 through 63 are for local use - this means that any group may define anything they please for those classes; the same is true for Entries 192 through 255 in any Class. The other classes, and whatever unused entries are not spoken for in each class, are set aside for future international usage. Some of the Classes, Class 2 - Instrumentation in particular, are getting alarmingly crowded.

Elements can be added to the international portion of the tables on rather short notice by eliciting the coordinating cooperation of the WMO Working Group on Data Management (WGDM), Sub-Group on Data Representation and Codes (SGDR&C). International notification of such additions is accomplished by the World Weather Watch (WWW) Operational Newsletter. The WMO body that is parent to the WGDM, the Commission on Basic Systems (CBS), meets every two years or so and, upon CBS approval, the additions to the tables will be published by the WMO. This relatively informal method of adding to the tables is possible because the BUFR community is, at present, rather small. It is also possible because of the agreed upon convention that ONLY additions will be made to Tables B or D by this method; descriptors will neither be deleted nor changed, and thus existing messages and decoding tables will not be affected as long as they have no need to make use of the new data descriptors. Changes to the Tables which involve only additions do not require that the Version number of the Tables be changed. Also, changes which are in the nature of "trivial" corrections (typographical errors, more precise definitions of terms, etc.) do not engender new Version numbers. The SGDR&C gets to define what is "trivial" and what is not. At present, the Tables stand at Version 2.

The SGDR&C meets from time to time to study and recommend changes that may involve the structure of BUFR or more substantial changes to the Tables, such as the addition of new operator descriptors, wholesale reorganization of the Tables, or the possible elimination of old and unused descriptors. These latter two steps will be taken with great care, however, so as to not make old archives of BUFR data inaccessible. Such recommendations will wend their way through the WMO system, eventually appearing as new Editions of BUFR, or Versions of the Tables, upon approval of the CBS. Because both the BUFR Edition number and the Version number of the Tables are part of the BUFR message, it is only a programming task for a decoding program to note the BUFR Edition number of a message and the Version number for the Tables and then extract the appropriate Table version from some computer files. The WMO publications will always contain the latest Version of the Tables; it is up to the various meteorological computer centers to maintain their own files of previous versions as well as their own local tables, of course.

The Local portions of the Tables can be updated, changed, augmented, etc. at will by the local group concerned. No international notice is required or expected. It is presumed that bulletins containing local descriptors will not be sent out internationally (but see the discussion of descriptor 2 06 YYY for an exception).

"Local", although not defined in the BUFR documentation, is generally taken to mean "within the processing center that is generating the BUFR messages", and not necessarily one country. The U. S. has a number of processing centers (the civilian weather service, Air Force, Navy, and other groups as well, each potentially identified by a unique processing center number and sub-number) each one of which is free to use the "local" portions of the BUFR tables as they see fit.

6.3.2 Originating Center (or Centre). The method of specifying the number of the originating center has been changed from what is described in the (current) Manual on Codes (Supplement 3, 1991). Here is a little historical background as to how things have evolved. GRIB (FM 92) was developed first and adopted a pre-existing WMO table of meteorological centers for "originating centers". It is a list of mainly large world and regional meteorological centers that could be expected to have the computer facilities required to generate GRIB bulletins if they had occasion to do so. When BUFR was developed it was realized that observational data could originate from far more locations than the GRIB table could accommodate. Thus, in BUFR, two octets were set aside for numerical specification of those locations, where GRIB used but one. A proposal was developed to enumerate those additional locations based upon International Civil Aviation Organization (ICAO) Location Indicators and this was published in the 1991 supplement as part of the BUFR specifications. Since then, however, it was realized that confusion and inconsistencies could result from separate GRIB and BUFR originating center tables and a recent proposal was accepted to construct Tables that were common to GRIB, BUFR and any other WMO code. To do this it was, in turn, necessary to drop the ICAO numbering system from BUFR. Fortunately, the two tables had not, up to now, developed any inconsistencies and the "ICAO numbers" were in very limited use. It was concluded that this change could be done without requiring a new Edition for BUFR.

The resulting system is simply that octet 6 of the Identification Section is used to identify the national (or international) originating centers, using the same common table as is in use for GRIB. This table will be coordinated and maintained by the WMO and published as part of the codes Manual. Any national sub-center numbers that may be required are to be generated by the national center in question and that number is to be placed in octet 5. The WMO has expressed a willingness to publish sub-center identification tables as supplied by the National centers.

6.3.3 Update Sequence Number. This feature does not seem to have wide use, as yet, but it is a powerful one. Note that the rule does require one to re-send an entire message if even only one element in the message is a correction of a previous message element. The "associated field" (see more on this later) is used to indicate which element(s) is(are) the corrected one(s) within the total message.

6.3.4 Optional Section 2. This section is not usually sent in international messages but it is put to use in some computer centers that use BUFR, frequently in a data base context. Some samples are given below. If it is present, the flag in octet 8 must be set, of course.

6.3.5 BUFR Message Sub-Type. This is purely a local option. As an example, here are the sub-types currently in use at the National Meteorological Center, Washington. This sort of information is useful in processing the observational data after it has been decoded from BUFR. Knowing ahead of time, so to speak, in considerable detail just what sort of data is in a BUFR message makes the choice of subsequent processors that much easier. It also makes it possible to search through a collection of various data types, encoded in BUFR, and select out only those for which there is a special interest. This has obvious applications in a data base context.

BUFR Data Category 0: Surface data - land
Data Sub-type	Description
0	Unassigned
1	Synoptic - manual
2	Synoptic - automatic
3	Aviation - manual
4	Aviation - AMOS
5	Aviation - RAMOS
6	Aviation - AUTOB
7	Aviation - ASOS
8	Aviation - METAR
9	Aviation - AWOS
BUFR Data Category 1: Surface data - sea
Data Sub-type	Description
0	Unassigned
1	Ship - manual
2	Ship - automatic
3	Drifting buoy
4	Moored buoy
5	Land based C-MAN station
6	Oil rig or platform
7	Sea level pressure bogus
8	Moisture bogus
9	SSMI
BUFR Data Category 2: Vertical soundings (other than satellite)
Data Sub-type	Description
0	Unassigned
1	Rawinsonde - fixed land
2	Rawinsonde - mobile land
3	Rawinsonde - fixed ship
4	Rawinsonde - mobile ship
5	Dropwinsonde
6	Pibal
7	Profiler
BUFR Data Category 3: Vertical soundings (satellite)
Data Sub-type	Description
0	Unassigned
1	Geostationary
2	Polar orbiting
3	Sun synchronous
BUFR Data Category 4: Single level upper-air (other than satellite)
Data Sub-type	Description
0	Unassigned
1	Aircraft - manual
2	Aircraft - reconnaissance
3	Aircraft - automatic (ASDAR)
4	Aircraft - automatic (ACARS)
5	Aircraft - automatic (AMDAR)
BUFR Data Category 5: Single level upper-air (satellite)
Data Sub-type	Description
0	Unassigned
1	Cloud-tracked winds
2	Water-vapor-tracked winds

6.3.6 Date/Time. The Manual suggests placing the date/time "most typical for the BUFR message contents", whatever that may mean, in the appropriate octets. Obviously for synoptic observations the nominal synoptic time is appropriate. But note that the exact time of the observation can be placed in the body of the message if this is of interest or value to the users of the data. Not only that, but a collection of observation times (and exact locations) could be incorporated into one observation to indicate, for example, the times (and places) that a radiosonde balloon reached particular levels in the atmosphere. This possibility is getting serious attention as very fine mesh numerical models with frequent analysis update cycles are coming into operations. A RAOB can take an hour or more to complete its flight, and travel 40 or 50 km (or more) downwind in that time. That is clearly enough to place the high level parts of the observation into both the next analysis update cycle and at a neighboring gridpoint. Reporting this level of detail would require a major revision to the character based TEMP Code (FM 35) but BUFR can accommodate this additional information with no change whatsoever. [End of commercial for BUFR!]

Collections of satellite observations, which are inherently asynoptic, by convention will have the time of the first observation of the collection in the date/time octets. The exact times for each observation will, of course, be in the body of the message.

6.3.7 "Reserved for use ...". Here again is a playground for the local center. It is not expected that international BUFR messages will contain anything past octet 18 (and that octet will be all zeros per the rule that all Sections have an even number of octets) but there is no real damage if Section 1 is "extended" past octet 18. That is because the "Length of Section" in octets 1-3 will (should) indicate the full size of the section. Any operational decoding program worthy of the name will check the number in octets 1-3 and respond accordingly, presumably by skipping the extra material.
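A decoder that behaves this way might look like the following Python sketch. The function name and the returned dictionary are inventions for illustration. The octet positions are the ones given in this section (length in octets 1-3, sub-center in octet 5, originating center in octet 6, Section 2 flag in octet 8); that the flag is the high-order bit of octet 8 is an assumption to be checked against the Manual.

```python
def read_section1(buf: bytes, start: int) -> dict:
    """Read a few Section 1 fields; `start` is the index of octet 1 of Section 1."""
    length = int.from_bytes(buf[start:start + 3], "big")   # octets 1-3
    return {
        "length": length,
        "sub_center": buf[start + 4],                      # octet 5
        "originating_center": buf[start + 5],              # octet 6
        # Assumption: the Section 2 flag is the high-order bit of octet 8.
        "has_section2": bool(buf[start + 7] & 0x80),
        # Anything beyond octet 18 is local material; it is kept aside, not parsed.
        "local_extension": buf[start + 18:start + length],
        # The next section is located from the stated length, never from a fixed size.
        "next_section": start + length,
    }
```

Because the position of the next section is computed from the length in octets 1-3, a locally extended Section 1 is stepped over without any special handling.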

6.4 Section 2 - Optional Section - Examples of Data Base Keys.

6.4.1 U. S. National Meteorological Center Usage. At the U.S. National Meteorological Center (NMC) the Optional Section is being used, internally, as a very simple data base key. The actual data are stored in data subsets (see below), i.e., individual observations. For each observation/subset there is a short collection of information in Section 2, which looks like this:

Content	Element Size
Displacement from start of BUFR message to start of subset (in units of octets)	2 octets
Latitude	2 octets
Longitude	2 octets
Day & hour	2 octets
Identification	6 octets

The first of these 14 octet packets starts in octet 5 of Section 2, with the others following without any break. This rather minimal set of information is enough to select out individual observations using location and/or time criteria. It is not necessary to decode any of the observations to find the desired ones - the displacement count tells you where to go to get each observation.
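Reading such keys is a simple matter of fixed-size unpacking, as in this Python sketch. It is not NMC's code: the names are invented, the fields are assumed to be big-endian unsigned integers, and because the text does not say how latitude, longitude, and day & hour are scaled, the raw values are returned untouched.

```python
import struct

# One key packet: displacement, latitude, longitude, day & hour (2 octets each),
# followed by a 6-octet identification. Big-endian unsigned is an assumption.
KEY_FORMAT = ">HHHH6s"
KEY_SIZE = struct.calcsize(KEY_FORMAT)   # 14 octets


def read_keys(section2: bytes) -> list:
    """Return the key packets of Section 2 as tuples of raw field values."""
    length = int.from_bytes(section2[0:3], "big")   # octets 1-3: length of section
    body = section2[4:length]                       # packets start in octet 5
    usable = len(body) - len(body) % KEY_SIZE       # ignore any trailing pad octets
    return [struct.unpack_from(KEY_FORMAT, body, i)
            for i in range(0, usable, KEY_SIZE)]
```

Selecting observations by location or time then amounts to filtering this list and jumping to the displacement stored in the first field of each surviving tuple.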

The alert reader will have noted a difficulty with the above scheme: in the BUFR system there is no requirement that data subsets each start on an exact octet or word boundary; indeed it is rather unlikely that they would, given the essentially random nature of the bit lengths used to store data elements. Yet the "displacement" is specified in terms of octets. Some sort of padding is clearly necessary, so that as the BUFR record is constructed each subset will start on a word (or half-word, or octet) boundary in whatever machine is in use. The actual padding is easy: one simply invents a local descriptor (NMC uses 0 63 255) which is specified to describe 1 bit of padding in the data section without assigning any other "meaning" to the bit. Then one places a delayed replication descriptor (1 01 000, with its associated 0 31 001 count descriptor) in front of the pad descriptor, with the delayed count giving the number of bits inserted to generate a pad of the proper length. This works but leaves one with local descriptors imbedded in the message - a problem if the message is to be sent out non-locally at some future time. It could be expensive to go through the record, remove the padding, and reconstruct a "pure" BUFR record for all the data.

But this can be resolved with the use of the "skip local descriptor" descriptor, 2 06 YYY. Just place it before the local "pad" descriptor, change the XX of the delayed replication descriptor to a value of 2, and the padded record can then be sent out without causing any problems for recipients. The whole thing would look like this:

Here is a fragment from an uncompressed BUFR record:

Comment	Descriptor	Value
	ddd1	vvv1
	ddd2	vvv2
	ddd3	vvv3
End of "real" data subset	ddd4	vvv4
Delayed replication of two descriptors, n times	1 02 000	-
Replication count: n is the number of bits in the pad, which follows the 8 bits containing the n value	0 31 001	n
Skip local descriptor	2 06 001	-
Local pad descriptor	0 63 255	(one bit)

And that does it.
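The only arithmetic involved is finding n. A small Python sketch, with an invented function name, under the assumption shown in the table above that the 0 31 001 count occupies 8 bits and sits between the real data and the pad:

```python
COUNT_BITS = 8   # width of the 0 31 001 replication count, per the table above


def pad_bits(subset_bits: int, boundary_bits: int = 8) -> int:
    """Number of 1-bit pad elements needed after `subset_bits` bits of real data.

    The 8-bit count is written before the pad, so it is included in the
    total that must land on the boundary.
    """
    n = -(subset_bits + COUNT_BITS) % boundary_bits
    if n >= 2**COUNT_BITS:
        raise ValueError("pad does not fit in the 8-bit count")
    return n
```

For example, a subset with 101 bits of real data needs 3 pad bits to end on an octet boundary (101 + 8 + 3 = 112) and 19 pad bits to end on a 32-bit word boundary (101 + 8 + 19 = 128).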

Another solution, of course, to the padding problem is to create a new international padding descriptor. But since "padding" is machine dependent it seems better to leave the padding up to the local center and not make a regular practice of exchanging padded BUFR messages.

6.4.1.1 BUFR as a Data Base Storage System. Once the observations/subsets are lined up on octet (or word) boundaries it becomes quite feasible to use BUFR records as a (simple) data base storage format. One restriction applies: all the data subsets must be the same size (i.e., no delayed replications - see below) and not be compressed. A common use of a data base system is to extract one particular data element, temperature, say, from all the available observations, for specific time and geographic ranges. To do so with "lined up" BUFR records all that is necessary is to decode the first subset and take note of the relative location of the temperature data in that subset. Then one simply extracts the temperature information from the relative location in the other subsets without having to (expensively) unpack the entire record.
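A minimal Python sketch of this kind of extraction follows. The function names and the calling convention are assumptions made for illustration; the bit offset and width of the element are taken to be known already from decoding the first subset.

```python
def get_bits(buf: bytes, bit_offset: int, width: int) -> int:
    """Return `width` bits starting `bit_offset` bits into `buf`, as an unsigned integer."""
    first = bit_offset // 8
    last = (bit_offset + width + 7) // 8
    if last > len(buf):
        raise ValueError("read past end of buffer")
    chunk = int.from_bytes(buf[first:last], "big")
    shift = (last - first) * 8 - bit_offset % 8 - width
    return (chunk >> shift) & ((1 << width) - 1)


def extract_element(data: bytes, subset_octets: int, n_subsets: int,
                    element_bit_offset: int, width: int) -> list:
    """Pull one element from every subset without unpacking anything else.

    `data` is assumed to begin at the first subset, with every subset padded
    to exactly `subset_octets` octets.
    """
    return [get_bits(data, i * subset_octets * 8 + element_bit_offset, width)
            for i in range(n_subsets)]
```

The values returned are still the raw packed integers; the scale and reference value from Table B have to be applied to turn them into temperatures.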

Of course, this does not allow for all the features of a full relational data base management system. But it may well be sufficient for some more limited uses. It does have the advantage that data can be shared from center to center, and used in similar data base systems, without the necessity of decoding the data (or extracting it from an RDBMS) and re-encoding the data to transmit it in a reasonably efficient format. It already is in a reasonably efficient transmission format. It may be necessary to redefine the "pad" on a different machine, but that can be done without unpacking or repacking the entire record.

6.5 Section 3 - Data Description Section.

6.5.1 Data Subsets. "Data subsets" are variously defined in the current BUFR documentation. Conceptually, one subset is a collection of "related meteorological data", quoting from the Manual. Continuing: "For observational data, each subset usually corresponds to one observation", where "observation", in this context, could mean one surface synoptic observation of a number of specific elements, one radiosonde ascent, one profiler sounding, one satellite derived sounding with radiances perhaps, or the like. No examples of non-observational data subsets are given, but a typical one would be a message consisting of a collection of numerical model forecasts of "soundings" at grid-points or other specific locations. Each forecast sounding (pressure, temperature, wind, relative humidity, whatever, at the many levels of the model) would then be one data subset.

A more precise (if slightly tautological) "operational" definition shows up later on in Regulation 94.5.2: "A data subset shall be defined as the subset of data described by one single application of this collection of descriptors." In this context, the "collection of descriptors" means ALL the descriptors included in Section 3 of the BUFR message. In other words, one pass through the complete collection of descriptors will allow one to decode one data subset from Section 4. One then loops back in the descriptor list for as many times as the data subsets count calls for. All of the data, in Section 4, are properly described by repeated use of the same set of descriptors.
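The looping can be sketched in Python for the simplest case: uncompressed data with no replication or operators. Everything named here (TABLE_B, BitReader, decode_subsets) is hypothetical, the three table entries are illustrative stand-ins for real Table B rows, and the input is assumed to be the data bits of Section 4 with the section's leading octets already removed.

```python
# Illustrative Table B subset: descriptor -> (bit width, scale, reference value).
# These entries are assumptions for the sketch; take real values from Table B.
TABLE_B = {
    "0 01 002": (10, 0, 0),    # station number
    "0 10 004": (14, -1, 0),   # pressure
    "0 12 001": (12, 1, 0),    # temperature
}


class BitReader:
    """Hands out successive bit fields from a block of octets."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, width: int) -> int:
        if width > self.remaining:
            raise ValueError("ran out of data")
        self.remaining -= width
        return (self.value >> self.remaining) & ((1 << width) - 1)


def unscale(raw: int, scale: int, reference: int):
    """Undo the Table B packing: value = (raw + reference) / 10**scale."""
    total = raw + reference
    return total / 10**scale if scale > 0 else total * 10**(-scale)


def decode_subsets(data: bytes, descriptors: list, n_subsets: int) -> list:
    reader = BitReader(data)
    subsets = []
    for _ in range(n_subsets):             # loop back once per data subset
        subset = {}
        for d in descriptors:              # one full pass = one data subset
            width, scale, reference = TABLE_B[d]
            raw = reader.read(width)
            missing = raw == (1 << width) - 1    # all bits set means "missing"
            subset[d] = None if missing else unscale(raw, scale, reference)
        subsets.append(subset)
    return subsets
```

The same descriptor list is applied from the top for every subset; nothing is carried over from one pass to the next.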

This does not imply that the data subsets are themselves identical in format. The use of delayed replication, as in a collection of RAOBs with varying numbers of significant levels, could cause variations in format (octet count) among data subsets. But they are still considered "subsets" in that the same set of descriptors will properly describe each individual set. The use of the delayed replication descriptor is what makes this possible, and is what delayed replication was designed for.

As noted in Chapter 5, certain descriptor operators, from Table C, can be used to redefine reference values, data lengths, scale factors, and add associated fields. There is also a group of descriptors which "remain in effect until superseded by redefinition" (more on them below). By common practice, ALL of these redefinitions or "remain in effect" properties are canceled when one cycles back to reuse a set of descriptors for a new data subset. You wipe the slate clean and start as though it was the first time. This rule is NOT specifically stated in the Manual at present, but presumably will be in the next update.

Of course, data subsets can be identical in format, i.e., have the same number of octets in each subset. This will always be the case if delayed replication is avoided. In this case one can compress the data, as described in Chapter 4, and gain considerable efficiency. Chapter 4, in the interest of avoiding overwhelming detail, doesn't mention that it is perfectly possible to compress data elements that have associated fields attached to them. The catch is that every data element has to have an associated field attached to it for the systematic compression to be possible. This may cut into the efficiency of the compression and should be considered before undertaking such a project.

Even though data subsets may be compressed and, as a result, the individual elements in each data subset are all reordered, the data subset concept still holds. The data subset count must be included in the correct location, and must be correct, of course. It is impossible to decompress a message without that information; and even if the data are not compressed the count is necessary to retrieve all the data subsets in a given message.

A final note about subsets: It is possible, within the BUFR framework, to account for many subsets by the device of placing a replication operator just in front of the set of descriptors that define one subset and have that replication include the count of all the subsets. This in effect reduces the data down to just one subset in that one would no longer cycle back and reuse the complete set of descriptors (now including the replication descriptor). This is NOT a recommended procedure. It is far better to have the subset count "up front", so to speak, in octets 5-6 of Section 3, if for no other reason than that it gives the user an indication of how much data he will have to contend with before the decoding gets under way.

6.5.2 Observed or "other data". A brief note: the "other data" flagged in octet 7 has been taken to mean forecast information, such as a collection, from a numerical model, of forecast "soundings" of wind, temperature, humidity, whatever, at the various internal layers or levels of the model, at a collection of grid points or interpolated locations. The time significance qualifier (0 08 021) is used to indicate that the hours associated with each sounding are indeed forecast hours. The initial time of the forecast is given as an unqualified date/time group, and it is in the message prior to the 0 08 021 descriptor.

"Other data" need not be limited to forecasts, of course. Statistical, climatological, quality control information, etc. would all fall under the general category of "not observations". This lack of specificity is not of very great concern as the descriptors in the body of the message take care of the precise definition of just what information is in the BUFR record.

6.5.3 Data Descriptors. Here is where we shall discuss some of the advanced, tricky, quirky, or special features about descriptors. Perforce, there will be collateral discussions of the data which those descriptors set out to describe. Much of what is discussed here is in the nature of meta-rules about descriptors, in that it deals with the proper interpretation of some special descriptors and interpretation of special combinations of descriptors.

Descriptors, in isolation, are rather straight-forward: one descriptor describes one piece of data, one to one (or in the case of Class D descriptors, one to many). The special rules discussed here go beyond that - some are, in effect, the rules that an application program needs to "know", given that a set of (presumably decoded) data, with associated descriptors, is presented to it. The application program has to "know" the "meaning" of these special descriptors, or patterns of descriptors, to handle the data properly and deliver to the end user what the constructor of the BUFR message intended. Some of the meta-rules are also in the nature of operator descriptors that the BUFR decoding program itself has to "know" in order to reconstruct the original data. Of course, the creator of such BUFR messages has to know and follow the rules as well.

Perhaps all this generalization will come clearer when we deal with specific examples.

6.5.3.1 Descriptors for "Coordinates". The descriptors in Classes 00 through 09 (with 03 and 09 at present reserved for future use) have a special meaning added to them over and above the specific data elements that they describe. They (or the data they represent) "remain in effect until superseded by redefinition". By this is meant that the data in these classes serve as coordinates (in a general sense) for all the following observations. Once you encounter an 0 04 004 (which describes the "hour") you must assume that the hour (a time coordinate) applies to all the following observations, until either another 0 04 004 descriptor is encountered or you reach the end of the data subset.

Obviously the familiar coordinates (two horizontal dimensions - Classes 05 and 06 - a vertical dimension - 07 - and time - 04) are in this sub-category of descriptors, but so are some features that one might not think of as "coordinates", other than in a general sense. Forms of "identification" of the observing platform (block and station number, aircraft tail number, etc.) are "coordinates" in this sense, in that they most certainly apply to all the observations taken from that platform and they "remain in effect until superseded by redefinition". The instrumentation that is used to take the measurements (Class 02) also falls in the same category - it applies to all the actual observations because all the observations were made with that particular instrument. (A lot of the instrumentation class deals with details of radar - there seems a lot more to say about such equipment than, say, a thermometer. But if reporting details about the thermometer [mercury vs. alcohol vs. bimetallic strips, say] became important this information could be added to Class 2 without difficulty.)

A source of confusion can arise by noting that some parameters (height and pressure, for example) appear twice in the Tables: in Class 07 and again in Class 10. Which table descriptor is appropriate depends on the nature of the measurement that involves these parameters. A radiosonde, which measures wind, temperature, and humidity (and geopotential height by calculation) as a function of pressure, would report the pressure values using Class 07 (the vertical coordinate or independent variable) and the other parameters from the non-coordinate classes (10 for geopotential, 11, 12, and 13 for the others). An aircraft radar altimeter, on the other hand, might measure pressure (and use Class 10 to report the value) as a function of height (Class 07).

Yet another kind of "coordinate" is imbedded in Class 8 - Significance Qualifiers. These are a way of reporting various qualitative pieces of information about the (following) data elements, beyond their numeric values, that can be important to the user of the data. A problem of how to "cancel" significance has come up - there are cases where it makes no sense to have a particular kind of significance "remain in effect" for the rest of the message (or to the end of the data subset) but there is no explicit way to cancel it. A convention has been more or less agreed to that sending a "missing" from the appropriate table has the effect of canceling whatever significance was previously established from that table. Presumably, this convention will become a rule (or footnote) in a future printing of the BUFR manual.

There is an exception to the "remain in effect until redefined" rule: when two identical descriptors, from Classes 04 to 07, are placed back to back, that is to be interpreted as defining a range of coordinates. In this way an area, a volume, a span of time, or all three together, can be defined as needed. If the same descriptor shows up later on in the message, then that appearance does indeed redefine that particular coordinate value even if the original coordinates defined a range. The others still remain in effect.
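An applications program might apply these conventions along the lines of the following Python sketch. The function name and the data layout (one decoded subset as a list of descriptor and value pairs, with None for "missing") are assumptions for illustration. The sketch covers Classes 00 through 09 only, treats one back to back pair as a range, and uses the "missing cancels significance" convention for Class 08 described above.

```python
def attach_coordinates(decoded: list) -> list:
    """Pair each non-coordinate value with the coordinates in effect for it.

    `decoded` is one data subset as a list of ((F, X, Y), value) tuples, with
    None standing for "missing".
    """
    in_effect = {}      # starts empty for every data subset
    observations = []
    previous = None
    for descriptor, value in decoded:
        f, x, y = descriptor
        if f == 0 and x <= 9:                      # Classes 00-09: "coordinates"
            if x == 8 and value is None:
                in_effect.pop(descriptor, None)    # "missing" cancels a significance
            elif descriptor == previous and 4 <= x <= 7:
                # identical descriptors back to back define a range
                in_effect[descriptor] = (in_effect[descriptor], value)
            else:
                in_effect[descriptor] = value      # first definition or redefinition
        else:
            observations.append((descriptor, value, dict(in_effect)))
        previous = descriptor
    return observations
```

Each observation comes back with its own copy of the coordinates that applied when it was encountered, so a later redefinition does not disturb earlier values.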

Unfortunately some coordinate-like information has appeared in a Table outside the Class 00-09 range - it escaped somehow. Class 25 - Processing information, largely dealing (again!) with radar information, contains information that by its nature "remains in effect until superseded". It should be considered as a "coordinate" class and most likely will get such an official designation in the future. This will not involve any changes to the structure of BUFR or the tables, only a change in interpretation, or "meaning", of the data elements.

There is not much a general BUFR decoder program can do with this "coordinate" information, other than decode it and pass the information on to some follow-on applications program. As noted in the introduction to this sub-section, it is up to the applications program (or the human reading a decoded message) to supply the interpretation and the meaning of what is there, and then to act accordingly. Some of the interpretation is straightforward, almost second nature. "Obviously" the station identification applies to the following observations made at that station; "obviously" this pressure level is where the RAOB measured the wind and temperature; perhaps not so obvious is the fact that two consecutive azimuth values define a sector in which a hurricane is located. Making the "obvious" explicit with rules, regulations, and footnotes is part of what BUFR is all about. The developers of BUFR made every effort to EXCLUDE as much "self-evident" information as possible and instead require that "meaning" be specified by definite rules - that is, in part, what makes the system so powerful. [End of second commercial!]

6.5.3.2 Replication, Increments and "Run-length encoding". As described in Chapter 3, replication (a descriptor with F=1) is pretty straightforward. Even delayed replication is no real problem (except to someone writing a program to do it correctly). In either case, you just replicate the following X descriptors Y times ("Y" can be either part of the descriptor or found in the data section) and that is it. This allows you to encode and describe a potentially very large amount of data with relatively few descriptors. This is a very powerful feature.

The only slightly tricky matter is to keep in mind that the 0 31 YYY descriptor that follows the delayed (YYY=0) replication descriptor (1 XX 000) is not included in the count of descriptors to be replicated, the XX part of 1 XX YYY. Indeed the descriptors of Class 31 hold a unique position in BUFR. With one (partial) exception, they are never used in isolation, but always in conjunction with some other descriptor in order to "complete" the latter's function. The exception is 0 31 021 - it can be used alone to redefine the meaning of a previously established associated field. Class 31 descriptors are not included in the replication counts for replication descriptors (nor are they replicated), and their characteristics are not altered by any of the operator descriptors in Table C, even those that change a characteristic of every (other) Table B descriptor. They are "Teflon" descriptors: they stick to other descriptors but nothing sticks to them.

A rather ingenious "extension" to the delayed replication concept has come into use recently. This is one of those "unrecognized possibilities" of BUFR mentioned previously. The idea is simple: set up delayed replication but have the replication count (in the data section) be equal to zero. By a simple extension of the rules, this clearly means that the "following X descriptors shall be replicated zero times", that is, they don't get used at all, they should be skipped over - there is nothing in the data section corresponding to them. This is quite useful in that it allows one to set up a standard or all-inclusive set of descriptors for a variety of observation types but then tailor the use of the descriptors, by setting the replication count to 1 or 0, to fit the actual data in hand. It is considerably more efficient than filling in "missing" data (all bits set to 1) in the locations in the data section where there is no real observation. A particular example of this is in "vertical soundings", whether generated by RAOBs, satellites, profilers, dropsondes, etc. They all share a basic common structure but some lack whole classes of data - satellite soundings have no winds, for example. The use of "zero count replication" allows one to set up a single set of descriptors for all of these observations with a net saving of space over either setting a lot of "missings" in the data or maintaining a library of different sounding descriptor sets.
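The replication rules above, including the zero-count case, can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not a decoder: descriptors are modeled as (F, X, Y) tuples, the function name expand_replication is hypothetical, the delayed replication counts are handed in as a plain list instead of being read from the data section, and nested replication is not handled.

```python
def expand_replication(descriptors, delayed_counts):
    """Expand replication operators (F=1) into a flat descriptor list.

    descriptors    -- list of (F, X, Y) tuples
    delayed_counts -- replication counts for the delayed (Y=0) operators,
                      in the order they would be met in the data section
                      (an assumption made to keep the sketch small)
    """
    counts = iter(delayed_counts)
    expanded = []
    i = 0
    while i < len(descriptors):
        f, x, y = descriptors[i]
        if f != 1:
            expanded.append((f, x, y))
            i += 1
            continue
        i += 1  # step past the replication operator 1 XX YYY
        if y == 0:
            # Delayed replication: the Class 31 descriptor that follows
            # carries the count and is NOT part of the XX span.
            cf, cx, _ = descriptors[i]
            if (cf, cx) != (0, 31):
                raise ValueError("delayed replication needs a 0 31 YYY descriptor")
            n = next(counts)
            i += 1
        else:
            n = y
        span = descriptors[i:i + x]
        expanded.extend(span * n)  # n == 0 simply skips the span
        i += x
    return expanded


# Illustrative sequence: an identifier, an optional wind group, a temperature.
sequence = [(0, 1, 1), (1, 2, 0), (0, 31, 1), (0, 11, 1), (0, 11, 2), (0, 12, 1)]
with_winds = expand_replication(sequence, [1])
without_winds = expand_replication(sequence, [0])
```

With a count of 0 the two wind descriptors vanish from the expanded list, so nothing in the data section corresponds to them; with a count of 1 they appear exactly once.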

The current descriptors allow zero count replication without any changes in current tables. However, to save a little more space, the NMC (Washington) people have defined a 0 31 000 descriptor with a 1-bit data length. This allows a replication count of 1 or 0, all that is needed. This is not yet officially recognized (even though it is within the international portion of the table), but there seems little reason to doubt that it soon will be. It is a very useful idea.

When we turn to the few descriptors that define increments, and in particular discuss the use of increments in conjunction with replication, things get a little complex. The rules get quite precise and have to be adhered to closely.

Increments by themselves are not so bad. One first establishes the value of a coordinate that is capable of being incremented. Normally, that coordinate value would "remain in effect until superseded" by the appearance of the same descriptor with a new data value. But the appearance of a descriptor for an increment associated with that coordinate will also change the value of the coordinate by the amount found in the data section. The increment descriptor must be in the same class as the data to be incremented and must have the same units. In the current BUFR tables there is no built-in way to associate an increment uniquely with the descriptor/value that is capable of being incremented. This is unfortunate as it means the decoder program must have special rules encoded for each increment descriptor; it would be better to devise a general rule to associate increments with the thing (or things) to be incremented. This is a project for the future.

A sample is the best way to indicate the descriptor sequence when increments and replication are combined:

Descriptor	Interpretation
0 04 004	Sets the value of the hour at one increment LESS than the "starting" value.
dddd	assorted data may be placed here
dddd	without influencing the replication to come
0 04 014	sets the value of the increment in hours and increments the hour
1 XX 000	set up (delayed) replication of "next" XX descriptors
0 31 001	replication count (not included in the span of replication XX)
	XX descriptors to be replicated

Regulation 94.5.4.3 says that when the increment descriptor immediately precedes the replication operator, as in this example, the incrementing action takes place right along with the replication. Every time the descriptors are replicated the hour (in the example) gets incremented, too. Note also that the hour gets incremented right away, before the first pass through the XX descriptors. That's why the initial hour value (0 04 004) was given a value one increment's worth less than the hour value needed for the first iteration.
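The arithmetic of this rule can be checked with a few lines of Python. This is a minimal sketch of the behavior as described in this section only; the function name hours_during_replication and its parameters are hypothetical, and no bit-level decoding is attempted.

```python
def hours_during_replication(initial_hour, increment, count):
    """Return the hour in effect on each pass through the replicated span.

    initial_hour -- value given with 0 04 004 (one increment LESS than
                    the hour wanted for the first pass)
    increment    -- value given with 0 04 014
    count        -- replication count
    """
    hour = initial_hour
    hours = []
    for _ in range(count):
        hour += increment  # incremented before every pass, the first included
        hours.append(hour)
    return hours


# To obtain data at 06, 07, 08 and 09 hours the encoder sets the hour to 05.
hourly = hours_during_replication(5, 1, 4)
three_hourly = hours_during_replication(9, 3, 3)
```

Here hourly comes out as 6, 7, 8, 9 and three_hourly as 12, 15, 18, which shows why the starting value must be set back by one increment.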

There is a refinement to this: it is legitimate to place Table C Operator Descriptors between the increment descriptor and the associated replication operator without altering the rule that the incrementing is associated with the replication. This is to allow for (temporary) redefinition of the data width, scale, whatever, of the descriptors within the XX span of replication (and following unless the changes are canceled), if necessary. The Table C descriptors cannot be placed after the replication count descriptor as they would then be subject to the replication, which might not work very well, nor can the Table C descriptors be placed prior to the increment descriptor itself as that means the increment descriptor would have its characteristics changed, also not a good thing. Hence the refinement to the rule. (Don't forget the other rule, that Class 31 descriptors are not subject to change by Table C descriptors.)

Another feature of replication is "run length encoding". This is enabled by replication followed by the 0 31 011 (or 0 31 012) descriptor. Basically all it says is that in addition to replicating the descriptors a number of times, the data elements present in the data (as described by the set of descriptors to be replicated) should be replicated as well. This is useful, of course, when the original data, as it exists prior to BUFR encoding, contains long runs of identical values, or long runs of identical sets of data elements. This is a familiar and very straightforward form of data compression that can greatly increase the efficiency of data representation in special cases. Of course, the run length encoding replication can be coupled with incrementing of a coordinate; indeed it most likely would be as there is commonly a need to specify the locations of the string of replicated values.
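The idea of run-length encoding can be shown in a few lines of Python. This is a sketch of the general technique only, assuming the data have already been decoded into (run length, value) pairs; the function name expand_runs is hypothetical and the sketch does not read the 0 31 011 or 0 31 012 counts from a real data section.

```python
def expand_runs(runs):
    """Expand (run_length, value) pairs into the full list of values."""
    values = []
    for run_length, value in runs:
        values.extend([value] * run_length)
    return values


# Six original values carried as three (count, value) pairs.
row = expand_runs([(3, 0), (2, 7), (1, 0)])
```

The saving grows with the length of the runs: a run of several hundred identical values still costs only one count and one value.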

6.5.3.3 The Associated Field. Associated fields are generally for the purpose of "saying something" extra about the particular data element with which they are associated. The most common use is in the arena of "quality control", where some sort of "confidence" indication is given. Other applications are possible and can be established by additions to Code Table 0 31 021.

Creating (or dealing with) an associated field in a message is a two-step process. The first is to establish the field and set the number of bits that will precede all the data elements following the appearance of the associated field operator (2 04 YYY). YYY is that number. If 255 bits is not enough (good grief, why?) you can keep adding more bits by repeating the operator. You can also generate compound associated fields by repeating the operator if what you have to "say" about the data elements is complicated.

The second step is to define the meaning of those bits, i.e., how they are to be interpreted by a user of the data. This is done by immediately following each 2 04 YYY descriptor with the usual Class 31 descriptor, 0 31 021, which, by reference to the Code table 0 31 021, establishes that meaning. A little care is required here. Code Table 0 31 021 gives a (small) number of significance code figures (all taking up 6 bits in the data) for different size associated fields; obviously one must be consistent in setting an associated field length and identifying the meaning of the bits in the field.

Once an associated field is established, those extra bits must be (are assumed to be) prefixed to every following data element, until the associated field is canceled. If the quality information has no meaning for some of those following elements, but the field is still there, there is at present no explicit way to indicate "no meaning" within the currently defined meanings. One must either redefine the meaning of the associated field in its entirety (by including 0 31 021 in the message with a data value of 63 - "missing value") or remove the associated field bits by the "cancel" operator: 2 04 000. If multiple or compound associated fields have been defined, each must be canceled separately.
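The bookkeeping described in this sub-section can be sketched in Python as follows. It is an illustration under assumptions, not the regulation itself: descriptors are modeled as (F, X, Y) tuples, the function name associated_bits is hypothetical, repeated 2 04 YYY operators are assumed to add their widths together, and 2 04 000 is assumed to cancel the most recently defined field.

```python
def associated_bits(descriptors):
    """Return the associated-field width that prefixes each data element.

    The result is a list of ((F, X, Y), bits) pairs, one per element
    descriptor; the 2 04 YYY operators themselves produce no entry.
    """
    active = []  # widths of the associated fields currently in effect
    result = []
    for f, x, y in descriptors:
        if (f, x) == (2, 4):
            if y == 0:
                if active:
                    active.pop()  # assumed: cancels the latest field only
            else:
                active.append(y)
        elif (f, x) == (0, 31):
            # Class 31 descriptors are not altered by Table C operators.
            result.append(((f, x, y), 0))
        else:
            result.append(((f, x, y), sum(active)))
    return result


message = [
    (0, 10, 4),
    (2, 4, 2), (0, 31, 21), (0, 12, 1),
    (2, 4, 4), (0, 31, 21), (0, 11, 2),
    (2, 4, 0), (0, 11, 1),
    (2, 4, 0), (0, 10, 4),
]
widths = [bits for _, bits in associated_bits(message)]
```

In this example the element after the second operator carries 2 + 4 = 6 extra bits, and two separate cancel operators are needed before the prefix disappears altogether.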

6.5.3.4 Changing Descriptors "On the Fly". A set of descriptors is defined in Class 00 for the purpose of describing descriptors. These have not had much international (or non-local) use to the best of my knowledge but their purpose, of course, is to send new international (or local) descriptors to interested parties for use prior to some official publication. But another "new possibility" has been suggested, one that would seem to have considerable potential value. This "new possibility" is not defined in the current BUFR specifications and, as will be obvious, would require a new Edition number for BUFR as it would require changes in the logic of a decoding program.

The suggestion is simple: it should be considered legitimate to send any descriptor, or collection of descriptors (new or currently defined, international or local), embedded in a message which otherwise contains data. The new descriptor(s), or the redefined old one(s), may then be actually used in the remainder of that message/record. This affords a method of introducing new data on the fly, so to speak, or of changing specific descriptor characteristics more selectively than can be done at present with Table C (operator) descriptors. Implementing this would, perforce, require that the decoding program recognize the new descriptor and then either add it to some internal table or use it to alter portions of existing tables. Either option would require new rules to be promulgated and old decoders to be altered. It doesn't seem to be a very complicated modification. This temporary change to a descriptor would only hold for the one record in which the change is introduced. The next BUFR record would be assumed to contain only "standard" (i.e., published) descriptors until such time as more new ones are introduced.

6.5.3.5 BUFR Records in Archives. A simple extension of the "new possibility" rule in the previous section makes it possible to alleviate a big concern about using BUFR records in long-term archives, that is, the necessity to retain BUFR Tables through a number of possible versions for an indefinite time span. The suggestion again is simple and rather obvious. In any file of (presumably many) BUFR records, the first such BUFR record should contain nothing but a collection of all the descriptors that will be used in all the other records in the file. Such a record would have a Table A data category value of 11. The "new rule", then, would be that the descriptors in the first record should be used for decoding all the many records in the file. Individual records could also have redefinitions of descriptors, as above, but they would hold for only the one record. This is really not a rule about the structure of BUFR per se, but is more of a suggestion for good data management where BUFR records and files are involved. Presumably such BUFR archive files would remain intact and only be exchanged in toto.

This archive suggestion would not involve any changes to BUFR itself (and hence no change to the Edition number) if the construction of Tables B, C and D, based on what is found in the first Table A = 11 record, were done externally to the decoding process. If the temporary change or addition to a descriptor within a record were allowed, that would introduce a new Edition of BUFR.

PART I APPENDIX

REFERENCES

Soderman, D. and Gibson, J.K. "The Specification for FM 94 BUFR". FM 94 BUFR Collected Papers and Specification. ECMWF, February 1988.

Stackpole, J. "Binary Universal Form for Data Representation (WMO Code FM 94 BUFR)". FM 94 BUFR Collected Papers and Specification. ECMWF, February 1988.

World Meteorological Organization. Manual on Codes, Volume 1, International Codes, Part A - Alphanumeric Codes. 1988 Edition, Suppl. No. 2 (VII.1991)

World Meteorological Organization. Manual on Codes, Volume 1, International Codes, Part B - Binary Codes. 1988 Edition, Suppl. No. 3 (VIII.1991)