GISS Global Temperature Anomalies : 1880 - 2006

The data sets used here are available by anonymous FTP from ftp://data.giss.nasa.gov/pub/gistemp/download/. For the animation on this page, I used the 1200 km smoothed surface temperature data set, SBBX.Tsurf1200 and the sea surface temperatures from SBBX.SSTHadR2.

∆T>5 5≥∆T>4 4≥∆T>3 3≥∆T>2 2≥∆T>1
1≥∆T>0 0≥∆T>-1 -1≥∆T>-2 -2≥∆T>-3 -4≥∆T

Decoding the Data Files

File header

The data in the files above are saved in binary. This makes it tricky for mere mortals such as myself to play with the data. Integers in the data set are saved in big endian format and real numbers follow IEEE754 single precision semantics.

The documentation of the contents of the data files is far from complete. There are still items in the data whose purpose I do not know and it is entirely possible that I misunderstood something. Please note that the procedure outline here is not endorsed by NASA and any errors are mine.

For a comparison what a difference good documentation makes, you might want to take a look at the Health and Retirement Study web site and see how much information is available to track individuals, families and households over time. Then, head on over to the GISS temperature web site and compare what hard documentation is readily available in terms of what actually goes into creating their pretty pictures. Given both the financial and opportunity costs created worldwide by policies which use such temperature information, one would hope both the data and the documentation would be much more readily accessible.

Each binary file contains 8001 records. At the beginning and end of each record is a 32 bit integer (stored in big endian format) that gives the length of that record in bytes. For example, looking at SBBX.Tsurf1200, we see that the file starts with:

0000000: 0000 0070 0000 0600 0000 0001 0000 0006  ...p............
0000010: 0000 0600 0000 0608 0000 0758 0000 270f  ...........X..'.
0000020: ffff d8f1 4748 434e 2056 3220 5465 6d70  ....GHCN V2 Temp
0000030: 6572 6174 7572 6541 4e4f 4d20 2843 2920  eratureANOM (C) 
0000040: 2043 5220 3132 3030 4b4d 2020 2020 2020   CR 1200KM      
0000050: 2031 3838 302d 7072 6573 656e 7420 2020   1880-present   
0000060: 2020 2020 2020 2020 2020 2020 2020 2020                  
0000070: 2020 2020 0000 0070 0000 1820 0000 0600      ...p... ....

The first four bytes tells me that the record is 112 bytes long. The record ends with the same sequence of bytes. According to sbbx2nc.f the next 32 bytes correspond to eight 32-bit integers. Here is what I have been able to discern:

Value Interpretation
0x00006000 (1536 decimal): Number of months in the next record
0x00000001 (1 decimal): Unknown
0x00000006 (6 decimal): Unknown
0x00000600 (1536 decimal): Number of months in the data set
0x00000608 (1544 decimal): Unknown
0x00000758 (1880 decimal): First month in the data set is January of 1880
0x0000270f (9999 decimal): Missing values are coded as 9999
0xffffd8f1 (4294957297 decimal, quiet NaN floating point): Unknown

The next 80 bytes correspond to the string GHCN V2 TemperatureANOM (C) CR 1200KM 1880-present padded with spaces at the end to make it 80 characters long. After that, 0x00000070 is repeated, and the first observation record starts.

Observations

The first four bytes of an observation record again is its length in bytes. This length excludes the eight bytes that come from the length information before and after the block. So, for TSurf1200, the first observation record is 6176 (0x00001820) bytes. The first 28 of these correspond to seven integers. Here is what I have been able to discern:

Value Interpretation
0x00000600 (1536 in decimal): Number of months in the next record
0x00001910 (6416 in decimal): Southern latitude of gridbox in 0.01 degrees
0x00001997 (6551 in decimal): Northern latitude of gridbox in 0.01 degrees
0xffffb9b0 (-17999 in decimal): Western longitude of gridbox in 0.01 degrees
0xffffbd34 (-17099 in decimal): Eastern longitude of gridbox in 0.01 degrees
0x0000001b (27 in decimal): Unknown
0x000054a0 (21644 in decimal): Unknown
0x42eec619 (119.38691 in floating point): Unknown

Then comes a 1536 element array of 32-bit IEEE754 floating point numbers. I am using Perl 5.8.8 on an Intel laptop and the following is the platform dependent way of converting these sequences of bytes to corresponding floating point numbers:

sub little_endian_int_to_ieee_single_float {
    unpack f => pack H8 => sprintf '%8.8x', unpack V => $_[0];
}

For a while, I used a more portable method outlined in my post on comp.lang.perl.misc but gave up on it when Bob Walton suggested the method above.

There are 8000 such records. Some contain very few months. Some contain 1536. Note that 1536 months from January 1880 spans to December 2007 so it is impossible for any record to actually have 1536 non-missing values.

The following program decodes a file in this format and saves the output in a text file with pipe separated columns. The columns are year, month, south, north, west, east and temperature anomaly in degrees celsius. The temperature anomaly is, I think, deviation of gridbox temperature from its 1950-1980 average.

Script : giss-text.pl
#!/usr/local/bin/perl
use strict;
use warnings;

use File::Slurp;
use File::Spec::Functions qw( catfile );

*int_to_float = \&little_endian_int_to_ieee_single_float;

my @EDGE_LABELS = qw( south north west east );
my @MONTHS      = qw( jan feb mar apr may jun jul aug sep oct nov dec );

my ($input_file) = @ARGV;
die "Missing filename\n" unless defined $input_file;

my ($output_file) = ($input_file =~ /^SBBX\.([^.]+)/ );
$output_file .= '.txt';

open my $input, '<', $input_file
	or die "Cannot open input: '$input_file': $!";
    
binmode $input;

open my $output, '>', $output_file
    or die "Cannot open output: '$output_file': $!";

my $header = read_record( $input );

my @header_keys = qw( start_year max_months months_in_next_record missing_val );
my %time_series = map { $_ => undef } @header_keys;
@time_series{ @header_keys } = (unpack 'N*', substr $header, 0, 32)[5, 3, 0, 6];

print "$_ = $time_series{ $_ }\n" for @header_keys;

my $grid_num = 0;

while ( my $record = read_record( $input ) ) {
    ++ $grid_num;
    write_grid_data( $grid_num => \$record );
    print STDERR "$grid_num." unless $grid_num % 500;
    last if $grid_num == 8000;
}

close $output or die "Cannot close output: '$output_file': $!";

close $input or die "Cannot close input: '$input_file': $!";

print STDERR "\nDone\n";

sub write_grid_data {
    my ($n, $record_ref) = @_;
    my $header_bin = substr($$record_ref, 0, 28);
    my @header = unpack 'N*', $header_bin;

    $time_series{ months_in_this_record } 
        = $time_series{ months_in_next_record };

    $time_series{ months_in_next_record } = $header[0];

    my %edges;
    @edges{ @EDGE_LABELS } = map { int_to_degrees( $_ ) } @header[1 .. 4];

    my $year = $time_series{ start_year };
    my $last_month = $time_series{ months_in_this_record } - 1;

    my @temperatures = map { int_to_float( $_ ) }
                       substr( $$record_ref, 32 ) =~ /(.{4})/gms;

    my $output_buffer;
    for my $month ( 0 .. $last_month ) {
        $output_buffer .= sprintf(
            "%4.4d|%s|%.2f|%.2f|%.2f|%.2f|%.2f\n",
            $year,
            $MONTHS[ $month % 12 ],
            @edges{ @EDGE_LABELS },
            $temperatures[ $month ],
        );
        if ( $month ) {
            ++ $year unless $month % 12;
        }
    }

    print $output $output_buffer;
    return;
}

sub read_record {
    my ($fh) = @_;
    
    my $chunk;
    my $record;
    
    eval {
    	# read record length at the beginning of record
        
    	my $r = read $fh, $chunk, 4;
        die "Read error: $!" unless defined $r;
        my $len1 = unpack 'N', $chunk;
        die "Done\n" unless $len1;
        
        # read record;
        $r = read $fh, $record, $len1;
        die "Read error: $!" unless defined $r;
        
        # read record length at the end of record
        
        $r = read $fh, $chunk, 4;
        die "Read error: $!" unless defined $r;
        my $len2 = unpack 'N', $chunk;
        
        unless ( $len1 == $len2 ) {
        	die "Record length mismatch";
        }
   };
   
   if ( [email protected] ) {
       warn "[email protected]\n";
       return;
   }
   return $record;
}

sub little_endian_int_to_ieee_single_float {
    unpack f => pack H8 => sprintf '%8.8x', unpack V => $_[0];
}

sub int_to_degrees {
    my ($x) = @_;
    return $x/100 unless $x & 0x80000000;
    return ($x - 0xffffffff)/100;
}
__END__

Output and Visualization

My original plan was to import the resulting text file to an embedded SQL database using SQLite but after hours of trying, I could not find a reasonably fast way of getting about 18 million rows of data. This is probably because I am dense, but I gave up after it failed after having run for three and a half hours. Even the generated database at 1.8Gb and containing only 3/4 of all observations was not very easy to work with.

So, instead of altering the original script, I wrote a new one that split the observations into separate text files for each year/month combination (and, in the process, got rid of observations with missing values. Here’s the script:

Script : giss-split-year-month.pl
#!/usr/local/bin/perl

use strict;
use warnings;

use File::Slurp;
use File::Spec::Functions qw( catfile );
use Readonly;

Readonly::Hash my %months => qw( 
    jan 1 feb 2 mar 3 apr  4 may  5 jun  6 
    jul 7 aug 8 sep 9 oct 10 nov 11 dec 12 
);

my @input_files = @ARGV;
die "No input files\n" unless @input_files;

mkdir 'by-date';

for my $input_file ( @input_files ) {
    warn "Processing '$input_file' ...\n";
    eval {
        split_year_month( $input_file );
    };

    warn "[email protected]\n" if [email protected];
}

sub split_year_month {
    my ($input_file) = @_;

    open my $input, '<', $input_file
        or die "Cannot open '$input_file': $!";

    my ($src) = ( $input_file =~ /\A([^.]+)/ );

    my $num;

    while ( my $line = <$input> ) {
        chomp $line;
        my ( $year, $month, $south, $north, $west, $east, $celsius )
            = split /\|/, $line;
        next if $celsius == 9999;
        my $filename = sprintf '%4.4d-%2.2d.txt', $year, $months{ $month };
        append_file ( (catfile 'by-date' => $filename), \ "$line|$src\n" );
        ++ $num;
        print STDERR "$num.." unless $num % 500;
    }

    close $input or die "Cannot close '$input_file': $!";
}
__END__

Now that I had these 1524 files (every month between January 1880 and December 2006) plotting them was easier. To this end, I looked for a equi-rectangular world map. On such a map every degree on the planet corresponds to the same number of pixels on the image independent of the coordinates. I chose to go this way because, frankly, I was exhausted, and I did not want to learn how to do any other, possibly more appropriate, projections. Shortly after posting the original video on Google , I found maps created by Dave Pape and used his etopo-landmask world map as the basis of the animation. I think the resulting image with the solid white background gives a better result both in terms of visualization of anomalies and grids with no data.

Here is the driver script I used for plotting:

Script : giss-timeline-graphs.pl
#!/usr/local/bin/perl

use strict;
use warnings;

use GD;
GD::Image->trueColor( 1 );

use FindBin qw( $Bin );
use File::Find;
use File::Slurp;
use File::Spec::Functions qw( catfile );

use Image::Size;
use Readonly;

use lib qw( C:/Home/asu1/Src/giss/lib );
use My::Plotter;

Readonly::Scalar my $MAP_FILE => catfile( $Bin => 'etopo-landmask.png' );

mkdir 'graph';
mkdir catfile( qw( graph timeline ) );

find( \&plot, 'by-date' );

print STDERR "\nDone\n";

my $files = 0;

sub plot {
    return unless /\A(\d{4})-(\d{2})\.txt\z/;
    my ($year, $month) = ($1, $2);

    my $plotter = My::Plotter->new( $MAP_FILE );

    open my $input, '<', catfile( $Bin, $File::Find::name )
        or die "Cannot open '$File::Find::name': $!";

    while( my $obs = <$input> ) {
        my ( undef, undef, $south, $north, $west, $east, $celsius )
            = split /\|/, $obs;
        $plotter->plot(
            {
                south   => $south,
                north   => $north,
                west    => $west,
                east    => $east,
                celsius => $celsius,
            }
        );
    }

    close $input or die "Cannot close '$File::Find::name': $!";

    my $output_file = catfile( 
        $Bin => graph => timeline => "$year-$month.png" 
    );

    write_file( 
        $output_file, 
        { binmode => ':raw' }, 
        $plotter->to_png($year, $month) 
    );

    ++ $files;

    print STDERR "$files .." unless $files % 25;

    return;
}
__END__

Finally, here is the module that actually does the plotting. Note that I used transparency to make the map outlines show through.

Module : My/Plotter.pm
package My::Plotter;

use strict;
use warnings;

use Carp;
use GD;
use Image::Size;

sub new {
    my $class = shift;

    my ($bg) = @_;
    croak "Can't find background image: '$bg'" unless -f $bg;

    my %self = ( bg => $bg );

    $self{gd} = GD::Image->new( $bg );
    croak "Can't load background image: '$bg'" unless defined $self{gd};

    @self{qw( bg_x bg_y)} = imgsize $self{bg};

    $self{x_factor} = $self{bg_x} / 360;
    $self{x_center} = $self{bg_x} / 2;
    $self{y_factor} = $self{bg_y} / 180;
    $self{y_center} = $self{bg_y} / 2;

    $self{colors} = {
         '5' => $self{gd}->colorAllocateAlpha( 0xff, 0x00, 0x00, 0x50 ) ,
         '4' => $self{gd}->colorAllocateAlpha( 0xff, 0x66, 0x00, 0x50 ) ,
         '3' => $self{gd}->colorAllocateAlpha( 0xff, 0x99, 0x00, 0x50 ) ,
         '2' => $self{gd}->colorAllocateAlpha( 0xff, 0xcc, 0x00, 0x50 ) ,
         '1' => $self{gd}->colorAllocateAlpha( 0xff, 0xff, 0x66, 0x50 ) ,
         '0' => $self{gd}->colorAllocateAlpha( 0xff, 0xff, 0xcc, 0x50 ) ,
        '-1' => $self{gd}->colorAllocateAlpha( 0xcc, 0xff, 0xff, 0x50 ) ,
        '-2' => $self{gd}->colorAllocateAlpha( 0x66, 0xff, 0xff, 0x50 ) ,
        '-3' => $self{gd}->colorAllocateAlpha( 0x00, 0xcc, 0xff, 0x50 ) ,
        '-4' => $self{gd}->colorAllocateAlpha( 0x00, 0x66, 0xff, 0x50 ) ,
        '-5' => $self{gd}->colorAllocateAlpha( 0x00, 0x00, 0xff, 0x50 ) ,
    };

    $self{color_limits} = [ sort { $b <=> $a } keys %{ $self{colors} } ];

    bless \%self => $class;
}

sub plot {
    my $self = shift;
    my %args = %{ $_[0] };
    my @coords = (
        $self->{x_center} + $args{west}  * $self->{x_factor},
        $self->{y_center} - $args{north} * $self->{y_factor},
        $self->{x_center} + $args{east}  * $self->{x_factor},
        $self->{y_center} - $args{south} * $self->{y_factor},
    );

    return $self->{gd}->filledRectangle( 
        @coords, 
        $self->get_color( $args{ celsius } )
    );
}

sub get_color {
    my $self = shift;

    my ($celsius) = @_;
    my @limits = @{ $self->{color_limits} };

    for my $i ( @limits ) {
        return $self->{colors}->{$i} if $celsius > $i;
    }

    return $self->{colors}->{ $limits[-1] };
}

sub to_png {
    my $self = shift;
    my ($year, $month) = @_;
    my $black = $self->{gd}->colorResolve( 0, 0, 0 );
    $self->{gd}->string( gdGiantFont, 600, 340, "$year/$month", $black );

    return $self->{gd}->png;
}

"My::Plotter"
__END__

Here is the the frame for February 1887 reduced in both dimensions by 50%:

[ World Temperature Anomalies, January 1881 ]

The Video

Once I had the sequence of 1524 frames (one frame per month, from January 1880 to December 2006), I used VirtualDub to convert it to an MPEG4 encoded AVI at 4 frames per second (that is, every second in the animation corresponds to four months).

FYI, it takes about 40 minutes to generate all the frames and about two minutes to encode them on a low end laptop. The resulting video file is about 60 Mb.