ADAPS Big/Little Endian Compatibility
Design Document

Dave Lloyd
Computer Services Branch
Software Engineering Services

February 20, 2002



Signatures

Prepared by: Dave Lloyd
  Software Engineer,
  Raytheon, ITSS
Concurred by: Tim Beckmann
  LAS Project Lead 
  Raytheon, ITSS

Document History

Number  Date and Sections   Notes
1       February 20, 2002   Document Created
2
3
4
5
6

Contents

Introduction

Functional Summary

Ensure that all data words larger than one byte in length are in the correct byte sequence for the hardware. The byte ordering will be done as soon as practical after reading and prior to writing buffers to disk. All data on disk is expected to have big endian byte ordering regardless of the hardware on which it was written.

Comments

None at this time.

Background

For reasons beyond the scope of this document there are two ways that hardware systems store data types with sizes greater than one byte. These systems are termed big endian and little endian. The term big endian is used because the big end, or most significant half, of the value is stored first in memory or on disk. Conversely, the little end is stored first on little endian systems. For ADAPS to be compatible with both systems, the acquisition, archive and level 1b I/O libraries must intercept and rearrange the byte ordering before use or storage. All data read by the I/O libraries is expected to be stored using big endian format with the exception of a few data formats received from external cooperators. All data written by the ADAPS I/O libraries will be stored in big endian order.
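As a minimal illustration of the two byte orders (not part of ADAPS; both function names are hypothetical), the sketch below detects the host byte order at run time and stores a 32-bit value in big endian, that is on-disk, order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little endian host, 0 on big endian. */
int host_is_little_endian_sketch(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* inspect the byte stored first in memory */
    return first == 1;          /* little end first means little endian */
}

/* Hypothetical helper: store a 32-bit value with the big end first. */
void store_be32_sketch(unsigned char out[4], uint32_t value)
{
    assert(out != NULL);
    out[0] = (unsigned char)(value >> 24);  /* most significant byte */
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);        /* least significant byte */
}
```

For the value 0x0A0B0C0D, store_be32_sketch() always produces the bytes 0A 0B 0C 0D, while the raw in-memory layout of the same value begins with 0D on a little endian host.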

The LAS I/O libraries have been modified for this purpose by using macros to efficiently swap entire buffers of data. This approach will not work for the ADAPS acquisition and archive formats nor the NOAA level 1b formats. These formats have headers with mixed data types making efficient byte-swapping difficult. Nine formats with mixed data types are defined by NOAA: HRPT minor frame, TIP minor frame, TBM header, ARS header, Data Set Header (DSH) for pre-KLM series satellites, DSH for KLM series satellites, data record header for pre-KLM series satellites, data record header for KLM series satellites and one data format that packs three 10-bit data words into a 32-bit word. In addition to the NOAA defined headers, there are twelve headers that are prepended to data sent from other sources. There is also one additional minor frame format. The archive format is a combination of the standard HRPT minor frame, TIP minor frame and the packed data formats.
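To make the packed format concrete, here is a rough sketch of packing three 10-bit words into one 32-bit word. It is not the c_pack32()/c_unpck32() code; the function names are hypothetical, and the bit layout (two unused high bits, first sample in the most significant position) is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout: bits 31-30 zero, sample 0 in bits 29-20,
   sample 1 in bits 19-10, sample 2 in bits 9-0. */
uint32_t pack3x10_sketch(uint16_t a, uint16_t b, uint16_t c)
{
    return ((uint32_t)(a & 0x3FF) << 20) |
           ((uint32_t)(b & 0x3FF) << 10) |
            (uint32_t)(c & 0x3FF);
}

/* Inverse of pack3x10_sketch(): recover the three 10-bit samples. */
void unpack3x10_sketch(uint32_t word, uint16_t out[3])
{
    assert(out != NULL);
    out[0] = (uint16_t)((word >> 20) & 0x3FF);
    out[1] = (uint16_t)((word >> 10) & 0x3FF);
    out[2] = (uint16_t)(word & 0x3FF);
}
```

Because the three samples share one 32-bit word, a packed word read from disk has to be byte-swapped as a single 4-byte unit on a little endian host before the shifts and masks are applied.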

To keep the modifications to a minimum, the byte-swapping software should intercept and modify buffers at the lowest possible level. Ideally, byte-swapping should occur immediately before writing data to and after reading data from disk.

A proof-of-concept prototype macro was developed to swap bytes in a buffer of mixed data types. Two issues came to light in testing:

  1. Preprocessor definitions were used for both the LAS and ADAPS macros, and the ADAPS macro embedded the LAS macro. Both macros declared an indexing variable named 'i'. The inner declaration hid the outer scope variable, so the outer variable was invisible in the inner scope and the macro didn't work. Therefore care must be taken to ensure there are no variable name conflicts.

  2. Since there are no guarantees that individual data words will start on word boundaries, a simple looping macro using pointers within a buffer may not work. The word 'may' was used because it appears that little endian platforms don't care about word boundaries. On a big endian platform, such as sg1, a bus error and core dump can occur when trying to access a word at an address that is not on a word boundary.
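The first issue can be reproduced with a small, self-contained sketch. The macros below are hypothetical stand-ins, not the LAS or ADAPS macros: the outer macro passes an expression containing 'i' into an inner macro that declares its own 'i', so the expression silently refers to the wrong variable. The second pair of macros avoids the problem with macro-specific variable names.

```c
/* Inner macro (stand-in for the LAS macro): declares its own 'i'. */
#define SUM_ROW_BAD(row, n, total) \
    { int i; (total) = 0; for (i = 0; i < (n); i++) (total) += (row)[i]; }

/* Outer macro (stand-in for the ADAPS macro): also uses 'i'. The argument
   (table)[i] is expanded inside the inner block, where 'i' is the inner one. */
#define SUM_TABLE_BAD(table, rows, cols, grand) \
    { int i; int t; (grand) = 0; \
      for (i = 0; i < (rows); i++) { SUM_ROW_BAD((table)[i], (cols), t); (grand) += t; } }

/* Same logic with names that cannot collide with each other. */
#define SUM_ROW_OK(row, n, total) \
    { int sum_row_k_; (total) = 0; \
      for (sum_row_k_ = 0; sum_row_k_ < (n); sum_row_k_++) (total) += (row)[sum_row_k_]; }

#define SUM_TABLE_OK(table, rows, cols, grand) \
    { int sum_table_r_; int sum_table_t_; (grand) = 0; \
      for (sum_table_r_ = 0; sum_table_r_ < (rows); sum_table_r_++) { \
          SUM_ROW_OK((table)[sum_table_r_], (cols), sum_table_t_); \
          (grand) += sum_table_t_; } }

/* Sums the diagonal twice (1 + 5, then 1 + 5) instead of all elements. */
int sum_table_bad(void)
{
    int table[2][2] = { {1, 2}, {3, 5} };
    int g;
    SUM_TABLE_BAD(table, 2, 2, g);
    return g;
}

/* Sums every element as intended: 1 + 2 + 3 + 5. */
int sum_table_ok(void)
{
    int table[2][2] = { {1, 2}, {3, 5} };
    int g;
    SUM_TABLE_OK(table, 2, 2, g);
    return g;
}
```

The code compiles without error in both cases, which is what makes the conflict easy to miss.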

Overall Design

At each point within the acquisition I/O (acqio), the archive I/O (archio), and the level 1b I/O (l1bio) libraries where a buffer is read or about to be written, include a macro call to swap the bytes. The macro will be conditionally expanded to actual code when compiled on little endian platforms or to nothing on big endian platforms. Input to the macro will be a pointer to the buffer needing manipulation and an array defining the swap operations. The swap operation array will contain a set of three values for each block of data that must be byte-swapped and a count of the sets. Each set will consist of a zero-based index that is the starting location of the data block, the number of words and the word size. When the macro finishes the original buffer will have been modified. Packing and unpacking of data will be handled by c_pack32() and c_unpck32() functions. No changes need to be made to the pack/unpack routines except to byte-swap the data by calling these routines with the swap flag set as appropriate.
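The conditional expansion could be arranged along the following lines. This is only a sketch: the symbol SKETCH_HOST_LITTLE_ENDIAN, the helper function and the macro name are all hypothetical, and the symbol is defined in the example itself purely so that the fragment compiles on its own. In practice it would come from the build.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: normally supplied by the build on little endian hosts.
   Defined here only so the sketch is self-contained. */
#define SKETCH_HOST_LITTLE_ENDIAN 1

/* Hypothetical helper: swap the two bytes of each 16-bit word in place. */
void swap16_words_sketch(unsigned char *buf, size_t nwords)
{
    size_t k;

    assert(buf != NULL);
    for (k = 0; k < nwords; k++) {
        unsigned char t = buf[2 * k];
        buf[2 * k] = buf[2 * k + 1];
        buf[2 * k + 1] = t;
    }
}

#if SKETCH_HOST_LITTLE_ENDIAN
/* Little endian host: the call site expands to real swapping code. */
#define SWAP16_IF_NEEDED_SKETCH(buf, nwords) swap16_words_sketch((buf), (nwords))
#else
/* Big endian host: the call site expands to a no-op. */
#define SWAP16_IF_NEEDED_SKETCH(buf, nwords) ((void)0)
#endif
```

With this arrangement the I/O library source contains the same macro call on every platform, and the cost on big endian hosts is zero.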

Detail Design

  1. Definition of swap code

    1.1 The operation arrays and swap macro will reside in the file $ADAPSINCLUDE/adapsendian.h.

    1.2 Operation array. The operation array is an integer array comprised of 3-element operation sets and a count of the operation sets. The first element in the array will be the count followed by the operation sets. Each operation set will contain three elements in the following order:

The definition of a block of words is: one or more words that have the same size and are located in continuous memory locations. Words may be of differing data types as long as the word size is equal.

Example:

    /* Definition of NOAA KLM DSH record
    ------------------------------------*/
    #define KLMDATASETHEADER  \
	{ \
	17,          /* count of operation sets */ \
	72,  2, 9,   /* operation set  1        */ \
	163, 4, 101, /* operation set  2        */ \
	...
	424, 2, 132  /* operation set 17        */ \
	}

    1.3 Byte-swapping macro.

Inputs:

Output:

Care should be taken when adding the macro to code before a write to disk. If the data buffer will be used after the write, a temporary buffer should be allocated, byte-swapped, written and deallocated so that the original buffer keeps its native byte order.
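The allocate, swap, write and deallocate sequence might look like the sketch below. The function names are hypothetical, and the simple 16-bit swapper stands in for the byte-swapping macro.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the swap macro: swaps each complete pair of bytes. */
void swap16_bytes_sketch(unsigned char *buf, size_t len)
{
    size_t k;

    for (k = 0; k + 1 < len; k += 2) {
        unsigned char t = buf[k];
        buf[k] = buf[k + 1];
        buf[k + 1] = t;
    }
}

/* Write a byte-swapped copy of buf, leaving buf itself untouched.
   Returns 0 on success, -1 on allocation or write failure. */
int write_swapped_copy_sketch(FILE *fp, const unsigned char *buf, size_t len)
{
    unsigned char *tmp;
    int status = 0;

    assert(fp != NULL && buf != NULL);
    if (len == 0)
        return 0;                    /* nothing to write */

    tmp = malloc(len);               /* temporary buffer */
    if (tmp == NULL)
        return -1;

    memcpy(tmp, buf, len);           /* copy, then swap only the copy */
    swap16_bytes_sketch(tmp, len);
    if (fwrite(tmp, 1, len, fp) != len)
        status = -1;

    free(tmp);                       /* deallocate on every path */
    return status;
}
```

The caller can keep using the original buffer after the call because only the temporary copy is modified.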

Two options are available based on the findings during prototype evaluation. Opinions as to which one to use will be solicited during the Design Document peer review. Option 2 was deemed the safest procedure to implement.

Option 1 - ignore word boundary issue:

Algorithm:

Example:

    #define swapbytes_in_mixed_buffer( buf, arr )                         \
	{                                                                 \
	int j;                                                            \
	for ( j = 1; j < (arr)[0] * 3; j += 3 )                           \
	    swapbytes_in_buffer((buf)[(arr)[j]], (arr)[j+1], (arr)[j+2]); \
	}

Note: the indexing variable in the example is not a good choice based on the findings during prototyping. The LAS macros have been modified to add '_macro' to the variable names in their namespace. With this in mind, j_macro is also not a good choice.

Option 2 - don't ignore the boundary issue: same as Option 1 except replace the LAS endian macros with ADAPS macros that don't use integer, long or real variables to make assignments to the data buffer. Instead, do actual byte swapping using unsigned char variables. There would probably be some efficiency loss with this option.
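A minimal sketch of the Option 2 approach follows. It is not the ADAPS macro: it is written as functions with hypothetical names, and it assumes each operation set is ordered as described in the Overall Design (zero-based start index, number of words, word size). All buffer accesses go through unsigned char, so no word is ever read or written at an unaligned address.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the bytes of one word in place, one byte at a time. */
void reverse_word_sketch(unsigned char *word, int size)
{
    int lo = 0;
    int hi = size - 1;

    while (lo < hi) {
        unsigned char t = word[lo];
        word[lo++] = word[hi];
        word[hi--] = t;
    }
}

/* ops[0] is the count of operation sets. Each set is ASSUMED to be
   (start index, number of words, word size in bytes). */
void swap_mixed_buffer_sketch(unsigned char *buf, const int *ops)
{
    int set;
    int w;

    assert(buf != NULL && ops != NULL);
    for (set = 0; set < ops[0]; set++) {
        const int *op = &ops[1 + 3 * set];
        unsigned char *p = buf + op[0];      /* start of this block */

        for (w = 0; w < op[1]; w++, p += op[2])
            reverse_word_sketch(p, op[2]);   /* swap one word */
    }
}
```

For example, with the operation array { 2, 1, 2, 2, 5, 1, 4 }, two 2-byte words starting at offset 1 and one 4-byte word starting at offset 5 are reversed, even though neither offset is word aligned.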

  2. Implementation of swap code

    2.1 The operation arrays, defined in $ADAPSINCLUDE/adapsendian.h, will be conditionally compiled by enclosing the definitions with:

    2.2 The macros will be inserted in the I/O libraries as needed at the lowest level possible. Ideally the macros will be utilized immediately after a read or just prior to a write. This will not be possible in all cases since the data type is not always known at such low levels. To avoid passing the operation arrays to low level functions, macros will be inserted where practical. Example macro call:

    2.3 Within sections of code that don't have unique data types, a conditional block of code will be added to make the data type determination and select the proper operation array.

    2.4 Packed data records must be handled using a conditional statement prior to calling the macro. There are a few data formats that are received from external cooperators and are not stored in big endian format. The file $ADAPSTABLES/station.ids indicates these formats in column S. A swap flag is set in the ACQDESC structure based on that column. The structure is passed to the pack/unpack functions. The pack/unpack functions were developed to swap the bytes when run on a big endian system. For these functions to operate properly on a little endian system, the meaning of the swap flag must be reversed in order to maintain the correct byte sequence. Example:

    where acq->swap_flg is the ACQDESC structure swap flag.
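The flag reversal described in 2.4 can be sketched as follows. This is not the original example: AcqDescSketch is a hypothetical stand-in for the ACQDESC structure that models only the swap flag, the function name is invented, and it assumes a nonzero swap_flg marks data that is not stored in big endian order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ACQDESC; only the swap flag is modeled. */
typedef struct {
    int swap_flg;   /* ASSUMED: nonzero means data is not big endian */
} AcqDescSketch;

/* Return the flag value to hand to the pack/unpack routines.
   On a big endian host the stored flag is used as is; on a little
   endian host its meaning is reversed. */
int effective_swap_flag_sketch(const AcqDescSketch *acq, int host_is_little_endian)
{
    int stored;

    assert(acq != NULL);
    stored = acq->swap_flg ? 1 : 0;
    return host_is_little_endian ? !stored : stored;
}
```

Under these assumptions, big endian data is swapped only on little endian hosts, and the cooperator formats flagged in station.ids are swapped only on big endian hosts.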

  3. Special cases

    3.1 There are places in the code where defining an operation array and calling the swap macro may be unnecessary (see acqio/acqhead.c for an example). In cases where only a few values are obtained from a buffer just read and the buffer is subsequently thrown away, simple byte manipulation can be used. Determination of whether or not to use the macro will be at the discretion of the developer.

    3.2 Situations that were not discovered during initial analysis will be handled on a case-by-case basis at the discretion of the developer. This is also meant to cover modules that don't use the I/O libraries. If such modules are encountered, the developer should try to modify the module to use the I/O libraries.
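For the simple byte manipulation mentioned in 3.1, one possible approach is to assemble each value from individual bytes, which gives the same result on either kind of host and does not modify the buffer. The helper names below are hypothetical and are not taken from acqio/acqhead.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big endian 16-bit value starting at any byte offset. */
uint16_t get_be16_sketch(const unsigned char *buf, size_t offset)
{
    assert(buf != NULL);
    return (uint16_t)(((unsigned)buf[offset] << 8) | buf[offset + 1]);
}

/* Read a big endian 32-bit value starting at any byte offset. */
uint32_t get_be32_sketch(const unsigned char *buf, size_t offset)
{
    assert(buf != NULL);
    return ((uint32_t)buf[offset]     << 24) |
           ((uint32_t)buf[offset + 1] << 16) |
           ((uint32_t)buf[offset + 2] << 8)  |
            (uint32_t)buf[offset + 3];
}
```

Because only single bytes are read, the offsets do not need to fall on word boundaries.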


     

ACRONYMS

Acronym  Description
ADAPS    AVHRR Data Acquisition and Processing System
ARS      Archive Retrieval System
AVHRR    Advanced Very High Resolution Radiometer
DSH      Data Set Header
HRPT     High Resolution Picture Transmission
I/O      Input/Output
LAS      Land Analysis System
NOAA     National Oceanic and Atmospheric Administration
TBM      Terabit Memory
TIP      TIROS Information Processor
TIROS    Television Infrared Observation Satellite