GSODR

Adam H. Sparks

Introduction

The Global Surface Summary of the Day (GSOD) data provided by the US National Centers for Environmental Information (NCEI) are a valuable source of weather data with global coverage. However, the data files are cumbersome and difficult to work with. {GSODR} aims to make it easy to find, transfer and format the data you need for use in analysis and provides six main functions for facilitating this, each covered in this document: get_GSOD(), reformat_GSOD(), nearest_stations(), get_updates(), get_inventory() and update_internal_isd_history().

When reformatting data with either get_GSOD() or reformat_GSOD(), all units are converted from United States Customary System (USCS) to International System of Units (SI), e.g., inches to millimetres and Fahrenheit to Celsius. The data returned to the R session summarise each year by station and include vapour pressure and relative humidity elements calculated from existing GSOD data.
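
The derived vapour pressure and relative humidity elements can be illustrated with a short sketch. It assumes the improved Magnus form of Alduchov and Eskridge (1996) with the coefficients shown below; this is an assumption about the calculation, not a copy of {GSODR}’s internal code, and both function names are hypothetical. Temperatures are in degrees Celsius and pressures in kPa.

```r
# Saturation vapour pressure (kPa) at a temperature in degrees Celsius,
# using the improved Magnus form (coefficients are an assumption, see above).
magnus_kpa <- function(temp_c) {
  0.61094 * exp((17.625 * temp_c) / (temp_c + 243.04))
}

# Hypothetical helper: derive EA, ES and RH from mean and dew point temperature.
derive_humidity <- function(temp_c, dewp_c) {
  es <- magnus_kpa(temp_c) # saturation vapour pressure at air temperature
  ea <- magnus_kpa(dewp_c) # actual vapour pressure, evaluated at dew point
  data.frame(
    EA = round(ea, 1),
    ES = round(es, 1),
    RH = round(ea / es * 100, 1)
  )
}

# TEMP and DEWP for the first two days of 2010 at station 955510-99999,
# taken from the get_GSOD() output shown in this document
humidity <- derive_humidity(temp_c = c(21.2, 23.2), dewp_c = c(17.9, 19.4))
print(humidity)
```

For these two days the sketch gives EA, ES and RH values that agree with those reported in the get_GSOD() output for station 955510-99999 in the next section, which suggests, but does not prove, that the same formula is used.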

For more information see the description of the data provided by NCEI, https://www.ncei.noaa.gov/data/global-summary-of-the-day/doc/readme.txt.

Using get_GSOD()

Find Stations in or near Toowoomba, Queensland, Australia

{GSODR} provides lists of weather station locations and elevation values. It’s easy to find all stations in Australia.

library("GSODR")

load(system.file("extdata", "isd_history.rda", package = "GSODR"))

# create data.frame for Australia only
Oz <- subset(isd_history, COUNTRY_NAME == "AUSTRALIA")

Oz
## Key: <STNID>
##             STNID             NAME     LAT     LON ELEV(M)   CTRY  STATE
##            <char>           <char>   <num>   <num>   <num> <char> <char>
##   1: 110010-99999         WOLFSEGG  48.100  13.667   615.6     AU       
##   2: 110030-99999 RIED IM INNKREIS  48.217  13.483   443.0     AU       
##   3: 110050-99999 SCHAERDING/SUBEN  48.400  13.433   330.0     AU       
##   4: 110080-99999         ROHRBACH  48.567  14.000   602.0     AU       
##   5: 110090-99999    WELS/FLUGFELD  48.183  14.033   318.0     AU       
##  ---                                                                    
## 272: 958070-99999   KINGSTONE AERO -35.717 137.517     6.0     AU       
## 273: 958150-99999          MUNKORA -36.100 140.317    28.0     AU       
## 274: 958230-99999  PADTHAWAY SOUTH -36.650 140.517    35.0     AU       
## 275: 958310-99999 WALPEUP RESEARCH -35.117 142.000   105.0     AU       
## 276: 958450-99999    MT GELLIBRAND -38.233 143.783   262.0     AU       
##         BEGIN      END COUNTRY_NAME  ISO2C  ISO3C
##         <int>    <int>       <char> <char> <char>
##   1: 19730715 20250726    AUSTRALIA     AU    AUS
##   2: 19520103 19971225    AUSTRALIA     AU    AUS
##   3: 20010807 20010807    AUSTRALIA     AU    AUS
##   4: 19761101 20250726    AUSTRALIA     AU    AUS
##   5: 19340502 20050814    AUSTRALIA     AU    AUS
##  ---                                             
## 272: 19970101 20250726    AUSTRALIA     AU    AUS
## 273: 20030401 20250726    AUSTRALIA     AU    AUS
## 274: 20030401 20250726    AUSTRALIA     AU    AUS
## 275: 20010915 20250726    AUSTRALIA     AU    AUS
## 276: 20010918 20250726    AUSTRALIA     AU    AUS
# Look for a specific town in Australia
subset(Oz, grepl("TOOWOOMBA", NAME))
## Empty data.table (0 rows and 12 cols): STNID,NAME,LAT,LON,ELEV(M),CTRY...
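
The empty result deserves a second look. The rows printed for Oz are Austrian stations, and the Toowoomba record shown later in this document carries the CTRY value "AS". A likely explanation, offered here as an assumption, is that CTRY holds FIPS country codes (in which AU is Austria and AS is Australia) while the country name was matched on a different coding. Searching station names in the full table, rather than in a country subset, sidesteps the problem. The sketch below uses a small hand-made stand-in for isd_history, built from rows shown in this document, so that it runs without the package.

```r
# Hand-made stand-in for isd_history, using rows shown in this document
stations <- data.frame(
  STNID = c("110010-99999", "945520-99999", "955510-99999"),
  NAME  = c("WOLFSEGG", "OAKEY", "TOOWOOMBA AIRPORT"),
  CTRY  = c("AU", "AS", "AS"),
  stringsAsFactors = FALSE
)

# Search on the station name across all countries, ignoring case
hits <- subset(stations, grepl("toowoomba", NAME, ignore.case = TRUE))
print(hits)
```

With the packaged data, the equivalent call would be subset(isd_history, grepl("TOOWOOMBA", NAME)).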

Download a Single Station and Year Using get_GSOD()

Now that we’ve seen where the reporting stations are located, we can download weather data from the station Toowoomba, Queensland, Australia for 2010 by using the STNID in the station parameter of get_GSOD().

tbar <- get_GSOD(years = 2010, station = "955510-99999")
str(tbar)
## Classes 'data.table' and 'data.frame':   365 obs. of  47 variables:
##  $ STNID           : chr  "955510-99999" "955510-99999" "955510-99999" "955510-99999" ...
##  $ NAME            : chr  "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" ...
##  $ CTRY            : chr  "AS" "AS" "AS" "AS" ...
##  $ COUNTRY_NAME    : chr  "AMERICAN SAMOA" "AMERICAN SAMOA" "AMERICAN SAMOA" "AMERICAN SAMOA" ...
##  $ ISO2C           : chr  "AS" "AS" "AS" "AS" ...
##  $ ISO3C           : chr  "ASM" "ASM" "ASM" "ASM" ...
##  $ STATE           : chr  "" "" "" "" ...
##  $ LATITUDE        : num  -27.6 -27.6 -27.6 -27.6 -27.6 ...
##  $ LONGITUDE       : num  152 152 152 152 152 ...
##  $ ELEVATION       : num  642 642 642 642 642 642 642 642 642 642 ...
##  $ BEGIN           : int  19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 ...
##  $ END             : int  20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 ...
##  $ YEARMODA        : Date, format: "2010-01-01" "2010-01-02" ...
##  $ YEAR            : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ MONTH           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ YDAY            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ TEMP            : num  21.2 23.2 21.4 18.9 20.5 21.9 21.3 20.9 21.9 22.3 ...
##  $ TEMP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ DEWP            : num  17.9 19.4 18.9 16.4 16.4 18.7 17.4 17.1 16.2 14.9 ...
##  $ DEWP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ SLP             : num  1013 1010 1012 1016 1016 ...
##  $ SLP_ATTRIBUTES  : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ STP             : num  942 939 941 944 944 ...
##  $ STP_ATTRIBUTES  : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ VISIB           : num  NA NA 14.3 23.3 NA NA NA NA NA NA ...
##  $ VISIB_ATTRIBUTES: int  0 0 6 4 0 0 0 0 0 0 ...
##  $ WDSP            : num  4.3 3.7 7.6 8.7 7.5 6.3 7.8 7.5 6.8 6.3 ...
##  $ WDSP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ MXSPD           : num  6.7 5.1 10.3 10.3 10.8 7.7 8.7 8.7 8.2 7.2 ...
##  $ GUST            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MAX             : num  25.8 26.5 28.7 24.1 24.6 26.8 26.1 26.5 27.4 28.7 ...
##  $ MAX_ATTRIBUTES  : chr  NA NA NA NA ...
##  $ MIN             : num  17.8 19.1 19.3 16.9 16.7 17.5 19.1 18.5 17.8 17.7 ...
##  $ MIN_ATTRIBUTES  : chr  NA NA "*" "*" ...
##  $ PRCP            : num  1.52 0.25 19.81 1.02 0.25 ...
##  $ PRCP_ATTRIBUTES : chr  "G" "G" "G" "G" ...
##  $ SNDP            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ I_FOG           : num  0 0 1 0 0 1 1 0 1 1 ...
##  $ I_RAIN_DRIZZLE  : num  0 0 1 0 0 0 0 0 0 0 ...
##  $ I_SNOW_ICE      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_HAIL          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_THUNDER       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_TORNADO_FUNNEL: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EA              : num  2 2.2 2.2 1.9 1.9 2.2 2 1.9 1.8 1.7 ...
##  $ ES              : num  2.5 2.8 2.5 2.2 2.4 2.6 2.5 2.5 2.6 2.7 ...
##  $ RH              : num  81.5 79.2 85.7 85.4 77.3 82.1 78.5 78.9 70.1 62.9 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Using nearest_stations() to Download Multiple Stations at Once

Using the nearest_stations() function, you can find stations closest to a given point specified by latitude and longitude in decimal degrees. This can be used to generate a vector to pass along to get_GSOD() and download the stations of interest.
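
To make the idea concrete, here is a minimal sketch of a radius search based on the haversine great-circle distance. It is not {GSODR}’s implementation: haversine_km() and stations_within() are hypothetical helpers, the Earth radius of 6371 km is an assumed mean value, and the two stations are taken from output shown elsewhere in this document.

```r
# Great-circle distance in km between one point and vectors of coordinates
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: STNIDs within `distance` km of a point, nearest first
stations_within <- function(stations, lat, lon, distance) {
  d <- haversine_km(lat, lon, stations$LAT, stations$LON)
  keep <- d <= distance
  stations$STNID[keep][order(d[keep])]
}

# WOLFSEGG and TOOWOOMBA AIRPORT, coordinates as printed in this document
stations <- data.frame(
  STNID = c("110010-99999", "955510-99999"),
  LAT = c(48.100, -27.55),
  LON = c(13.667, 151.917),
  stringsAsFactors = FALSE
)

nearby <- stations_within(stations, lat = -27.5598, lon = 151.9507,
                          distance = 50)
print(nearby)
```

Only the Toowoomba station falls inside the 50 km radius; the Austrian station is thousands of kilometres away and is dropped.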

Warning messages will be generated as not all stations have data for the requested year.

tbar_stations <- nearest_stations(LAT = -27.5598,
                                  LON = 151.9507,
                                  distance = 50)$STNID

tbar <- get_GSOD(years = 2010, station = tbar_stations)
## Warning: 
## This station, 945510-99999, only provides data for years 1956 to 1997.
## Please send a request that falls within these years.
## Warning: 
## This station, 949999-00170, only provides data for years 1971 to 1984.
## Please send a request that falls within these years.
## Warning: 
## This station, 949999-00183, only provides data for years 1983 to 1984.
## Please send a request that falls within these years.
str(tbar)
## Classes 'data.table' and 'data.frame':   1095 obs. of  47 variables:
##  $ STNID           : chr  "945520-99999" "945520-99999" "945520-99999" "945520-99999" ...
##  $ NAME            : chr  "OAKEY" "OAKEY" "OAKEY" "OAKEY" ...
##  $ CTRY            : chr  "AS" "AS" "AS" "AS" ...
##  $ COUNTRY_NAME    : chr  "AMERICAN SAMOA" "AMERICAN SAMOA" "AMERICAN SAMOA" "AMERICAN SAMOA" ...
##  $ ISO2C           : chr  "AS" "AS" "AS" "AS" ...
##  $ ISO3C           : chr  "ASM" "ASM" "ASM" "ASM" ...
##  $ STATE           : chr  "" "" "" "" ...
##  $ LATITUDE        : num  -27.4 -27.4 -27.4 -27.4 -27.4 ...
##  $ LONGITUDE       : num  152 152 152 152 152 ...
##  $ ELEVATION       : num  407 407 407 407 407 ...
##  $ BEGIN           : int  19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 ...
##  $ END             : int  20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 20250726 ...
##  $ YEARMODA        : Date, format: "2010-01-01" "2010-01-02" ...
##  $ YEAR            : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ MONTH           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ YDAY            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ TEMP            : num  23.4 26.2 24.5 21.6 22.6 24.7 24 23.3 24.4 25.1 ...
##  $ TEMP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ DEWP            : num  18.4 19.4 19.4 16.8 16.9 18.7 17.1 17.1 15.7 13.6 ...
##  $ DEWP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ SLP             : num  1012 1009 1011 1015 1015 ...
##  $ SLP_ATTRIBUTES  : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ STP             : num  967 964 966 969 969 ...
##  $ STP_ATTRIBUTES  : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ VISIB           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VISIB_ATTRIBUTES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ WDSP            : num  4.3 4.1 6.1 7.5 4.4 4.3 5.8 6.2 5.6 4.5 ...
##  $ WDSP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ MXSPD           : num  7.2 6.2 8.7 9.8 7.7 6.2 8.2 9.3 7.7 7.2 ...
##  $ GUST            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MAX             : num  28.5 31.2 33.6 27.1 27.8 30.4 30 30.5 31.9 33.2 ...
##  $ MAX_ATTRIBUTES  : chr  NA NA NA NA ...
##  $ MIN             : num  19.5 20.5 21.3 18.8 18.4 18.6 20.6 18.6 17.2 16.2 ...
##  $ MIN_ATTRIBUTES  : chr  NA NA "*" "*" ...
##  $ PRCP            : num  0.51 0 3.3 0 0 0 0 0.25 0 0 ...
##  $ PRCP_ATTRIBUTES : chr  "G" "G" "G" "G" ...
##  $ SNDP            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ I_FOG           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_RAIN_DRIZZLE  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_SNOW_ICE      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_HAIL          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_THUNDER       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_TORNADO_FUNNEL: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EA              : num  2.1 2.2 2.2 1.9 1.9 2.2 1.9 1.9 1.8 1.6 ...
##  $ ES              : num  2.9 3.4 3.1 2.6 2.7 3.1 3 2.9 3.1 3.2 ...
##  $ RH              : num  73.5 66.2 73.3 74.2 70.2 69.3 65.3 68.2 58.4 48.9 ...
##  - attr(*, ".internal.selfref")=<externalptr>
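
The warnings above arise because three of the stations returned by nearest_stations() have no data for 2010. One way to avoid them is to compare the requested year with each station’s BEGIN and END dates before downloading. The sketch below uses a hand-made table: the BEGIN and END values for OAKEY and TOOWOOMBA AIRPORT come from the output in this document, while the month and day parts for the other three stations are placeholders chosen only to match the years in the warnings. has_year() is a hypothetical helper.

```r
stations <- data.frame(
  STNID = c("945510-99999", "949999-00170", "949999-00183",
            "945520-99999", "955510-99999"),
  # yyyymmdd integers, as in isd_history; 0101 and 1231 are placeholders
  BEGIN = c(19560101L, 19710101L, 19830101L, 19730430L, 19980301L),
  END   = c(19971231L, 19841231L, 19841231L, 20250726L, 20250726L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where the station's record spans `year`
has_year <- function(begin, end, year) {
  begin %/% 10000L <= year & end %/% 10000L >= year
}

available <- stations$STNID[has_year(stations$BEGIN, stations$END, 2010L)]
print(available)
```

The filtered vector could then be passed to the station argument of get_GSOD().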

Plot Maximum and Minimum Temperature Values

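Using the data for a single station, 955510-99999, plot the temperature for 2010. Because tbar now holds several stations, the rows for that station are selected first.

library("ggplot2")
library("tidyr")

# Create a data frame of just the date and temperature values that we want to
# plot, keeping only the rows for station 955510-99999
tbar_temps <- tbar[tbar$STNID == "955510-99999",
                   c("YEARMODA", "TEMP", "MAX", "MIN")]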

# Gather the data from wide to long
tbar_temps <-
  pivot_longer(tbar_temps, cols = TEMP:MIN, names_to = "Measurement")

ggplot(data = tbar_temps, aes(x = YEARMODA,
                              y = value,
                              colour = Measurement)) +
  geom_line() +
  scale_color_brewer(type = "qual", na.value = "black") +
  scale_y_continuous(name = "Temperature") +
  scale_x_date(name = "Date") +
  ggtitle(label = "Max, min and mean temperatures for Toowoomba, Qld, AU",
          subtitle = "Data: U.S. NCEI GSOD") +
  theme_classic()

Figure: Max, min and mean temperatures for Toowoomba, Qld, AU, 2010 (plot of chunk Ex5).

Using reformat_GSOD()

You may have already downloaded GSOD data, or you may wish to use your browser to download the files from the server to your local disk rather than use get_GSOD(). In that case the reformat_GSOD() function is useful.

There are two ways to do this: you can either provide reformat_GSOD() with a list of specified station files, or you can supply it with a directory containing all of the “STATION.csv” station files or “YEAR.zip” annual files that you wish to reformat.

Note: Any .csv file provided to reformat_GSOD() will be imported; if it is not a GSOD data file, this will lead to an error. Make sure the directory and file lists are clean.

Reformat a List of Local Files

In this example two STATION.csv files are in subdirectories of the user’s home directory and are listed for reformatting as a character vector.

y <- c("~/GSOD/gsod_1960/20049099999.csv",
       "~/GSOD/gsod_1961/20049099999.csv")
x <- reformat_GSOD(file_list = y)

Reformat all Local Files Found in Directory

In this example all STATION.csv files in the sub-folder GSOD/gsod_1960 will be imported and reformatted.

x <- reformat_GSOD(dsn = "~/GSOD/gsod_1960")
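
Because every .csv file is imported, it can help to build the file list explicitly and keep only files whose names look like GSOD station files. The sketch below assumes that station files are named with 11 digits or upper-case letters followed by .csv, as in the example above; that pattern is an assumption, and a temporary directory stands in for a real download folder.

```r
# Temporary directory standing in for e.g. ~/GSOD/gsod_1960
gsod_dir <- file.path(tempdir(), "gsod_1960")
dir.create(gsod_dir, showWarnings = FALSE)
file.create(file.path(gsod_dir, c("20049099999.csv", "notes.csv")))

# Keep only files named like a station file: 11 characters, then ".csv"
station_files <- list.files(
  gsod_dir,
  pattern = "^[0-9A-Z]{11}\\.csv$",
  full.names = TRUE
)
print(basename(station_files))
```

The cleaned vector can then be passed on with reformat_GSOD(file_list = station_files).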

Using get_updates()

{GSODR} provides a function, get_updates(), to retrieve the changelog for the GSOD data and return it in order from newest to oldest changes to the data set.

Following is an example of how to use this function.

get_updates()

Using get_inventory()

{GSODR} provides a function, get_inventory(), to retrieve an inventory of the number of weather observations by station-year-month from the beginning of the record through to the present.

Following is an example of how to retrieve the inventory and check a station in Toowoomba, Queensland, Australia, which was used in an earlier example.

inventory <- get_inventory()

inventory
##   *** FEDERAL CLIMATE COMPLEX INTEGRATED SURFACE DATA INVENTORY ***  
##    This inventory provides the number of weather observations by  
##    STATION-YEAR-MONTH for beginning of record through July 2025  
## Key: <STNID>
##                STNID                NAME    LAT    LON ELEV(M)   CTRY  STATE
##               <char>              <char>  <num>  <num>   <num> <char> <char>
##      1: 008415-99999                <NA>     NA     NA      NA   <NA>   <NA>
##      2: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9     NO       
##      3: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9     NO       
##      4: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9     NO       
##      5: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9     NO       
##     ---                                                                     
## 154815:   A51256-451                <NA>     NA     NA      NA   <NA>   <NA>
## 154816:   A51256-451                <NA>     NA     NA      NA   <NA>   <NA>
## 154817:   A51256-451                <NA>     NA     NA      NA   <NA>   <NA>
## 154818:   A51256-451                <NA>     NA     NA      NA   <NA>   <NA>
## 154819:   A51256-451                <NA>     NA     NA      NA   <NA>   <NA>
##            BEGIN      END COUNTRY_NAME  ISO2C  ISO3C  YEAR   JAN   FEB   MAR
##            <int>    <int>       <char> <char> <char> <int> <int> <int> <int>
##      1:       NA       NA         <NA>   <NA>   <NA>  2020     0     0    14
##      2: 19310101 20250726       NORWAY     NO    NOR  2020   736   695   744
##      3: 19310101 20250726       NORWAY     NO    NOR  2021   686   562   729
##      4: 19310101 20250726       NORWAY     NO    NOR  2022   549   513   292
##      5: 19310101 20250726       NORWAY     NO    NOR  2023   738   657   715
##     ---                                                                     
## 154815:       NA       NA         <NA>   <NA>   <NA>  2021  2085  1992  2217
## 154816:       NA       NA         <NA>   <NA>   <NA>  2022  2203  1937  2204
## 154817:       NA       NA         <NA>   <NA>   <NA>  2023  2006  1988  2172
## 154818:       NA       NA         <NA>   <NA>   <NA>  2024  2223  1956  2215
## 154819:       NA       NA         <NA>   <NA>   <NA>  2025  2179  1986  2016
##           APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
##         <int> <int> <int> <int> <int> <int> <int> <int> <int>
##      1:     0     0     0     0     0     0     0     0     0
##      2:   717   744   718   743   742   718   694   708   740
##      3:   710   733   654   726   717   712   737   714   630
##      4:    98     0     0   137     0   292   709   708   724
##      5:   713   735   666   735   726   693   729   698   741
##     ---                                                      
## 154815:  1975  2146  2092  2227  2170  2080  2163  2120  2168
## 154816:  2144  2218  2119  2224  2209  2137  1743  2126  2201
## 154817:  1993  2063  2088  2189  2182  2147  2199  2120  2197
## 154818:  2152  2221  2004  2210  2124  1977  2104  1885  2165
## 154819:  2133  2213  2109  1833     0     0     0     0     0
subset(inventory, STNID %in% "955510-99999")
##   *** FEDERAL CLIMATE COMPLEX INTEGRATED SURFACE DATA INVENTORY ***  
##    This inventory provides the number of weather observations by  
##    STATION-YEAR-MONTH for beginning of record through July 2025  
## Key: <STNID>
##           STNID              NAME    LAT     LON ELEV(M)   CTRY  STATE    BEGIN
##          <char>            <char>  <num>   <num>   <num> <char> <char>    <int>
## 1: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
## 2: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
## 3: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
## 4: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
## 5: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
## 6: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642     AS        19980301
##         END   COUNTRY_NAME  ISO2C  ISO3C  YEAR   JAN   FEB   MAR   APR   MAY
##       <int>         <char> <char> <char> <int> <int> <int> <int> <int> <int>
## 1: 20250726 AMERICAN SAMOA     AS    ASM  2020   246   232   248   238   248
## 2: 20250726 AMERICAN SAMOA     AS    ASM  2021   485   483   742   720   743
## 3: 20250726 AMERICAN SAMOA     AS    ASM  2022   743   672   739   716   739
## 4: 20250726 AMERICAN SAMOA     AS    ASM  2023   738   663   730   715   737
## 5: 20250726 AMERICAN SAMOA     AS    ASM  2024   741   691   626   662   714
## 6: 20250726 AMERICAN SAMOA     AS    ASM  2025   737   650   741   711   732
##      JUN   JUL   AUG   SEP   OCT   NOV   DEC
##    <int> <int> <int> <int> <int> <int> <int>
## 1:   348   493   492   480   496   475   496
## 2:   716   744   737   719   744   720   726
## 3:   716   728   742   716   726   713   726
## 4:   701   733   729   700   730   710   744
## 5:   703   719   707   708   743   619   711
## 6:   706   619     0     0     0     0     0
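
Once a station’s rows are extracted, the monthly counts are easy to summarise. As a sketch, the following rebuilds two of the Toowoomba rows shown above by hand, so that it runs without a download, and computes the total number of observations per year and the number of months with no observations.

```r
# Two rows copied from the inventory output above (station 955510-99999)
toowoomba <- data.frame(
  YEAR = c(2020L, 2025L),
  JAN = c(246L, 737L), FEB = c(232L, 650L), MAR = c(248L, 741L),
  APR = c(238L, 711L), MAY = c(248L, 732L), JUN = c(348L, 706L),
  JUL = c(493L, 619L), AUG = c(492L, 0L),   SEP = c(480L, 0L),
  OCT = c(496L, 0L),   NOV = c(475L, 0L),   DEC = c(496L, 0L)
)

month_cols <- toupper(month.abb) # "JAN", "FEB", ..., "DEC"

summary_by_year <- data.frame(
  YEAR = toowoomba$YEAR,
  TOTAL_OBS = rowSums(toowoomba[, month_cols]),
  EMPTY_MONTHS = rowSums(toowoomba[, month_cols] == 0)
)
print(summary_by_year)
```

The same two rowSums() calls should work on the result of subset(inventory, STNID %in% "955510-99999"), since the month columns carry the same names.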

Using update_internal_isd_history()

{GSODR} uses an internal database of station data from the NCEI to provide location and other metadata, e.g., elevation, station names and WMO codes, which makes querying for weather data faster. This database is created and packaged with {GSODR} for distribution and is updated with new releases. Users have the option of updating it after installing {GSODR}. While this lets users keep the database up to date and gives {GSODR}’s authors flexibility in maintaining it, it also means that reproducibility may be affected, since the same version of {GSODR} may have different databases on different machines. If reproducibility is necessary, take care to ensure that the version of the database is the same across machines.

The database file isd_history.rda can be located on your local system with the command paste0(.libPaths(), "/GSODR/extdata")[1]. If you have specified another location for library installations and installed {GSODR} there, the file is still in GSODR/extdata within that library. Alternatively, system.file("extdata", "isd_history.rda", package = "GSODR"), as used earlier in this document, returns the full path to the file.

To update {GSODR}’s internal database of station locations, simply use update_internal_isd_history(), which will update the internal station database according to the latest data available from the NCEI.

update_internal_isd_history()
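
Where reproducibility matters, one option is to record a checksum of the station database alongside an analysis and compare it across machines. The sketch below uses tools::md5sum() from the standard R distribution; db_fingerprint() is a hypothetical helper, and a temporary file stands in for isd_history.rda so that the example runs without {GSODR} installed.

```r
# Hypothetical helper: MD5 checksum of a file as a plain character string
db_fingerprint <- function(path) {
  stopifnot(file.exists(path))
  unname(tools::md5sum(path))
}

# Stand-in for the database file; with GSODR installed the path would be
# system.file("extdata", "isd_history.rda", package = "GSODR")
db_file <- tempfile(fileext = ".rda")
stations <- data.frame(STNID = "955510-99999", stringsAsFactors = FALSE)
save(stations, file = db_file)

fingerprint <- db_fingerprint(db_file)
print(fingerprint)

# Later, or on another machine: an unchanged file gives the same checksum
same_db <- identical(fingerprint, db_fingerprint(db_file))
print(same_db)
```

If the checksums differ between machines, the station metadata differ too, and results that depend on it may not be directly comparable.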

Notes

WMO Resolution 40. NOAA Policy

The data summaries provided here are based on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII). This allows WMO member countries to place restrictions on the use or re-export of their data for commercial purposes outside of the receiving country. Data for selected countries may, at times, not be available through this system. Those countries’ data summaries and products which are available here are intended for free and unrestricted use in research, education, and other non-commercial activities. However, for non-U.S. locations’ data, the data or any derived product shall not be provided to other users or be used for the re-export of commercial services.

Appendices

Appendix 1: GSODR Final Data Format, Contents and Units

{GSODR} formatted data include the following fields and units:

Appendix 2: Map of Current GSOD Station Locations

GSOD Station Locations. Data comes from US NCEI GSOD and CIA World DataBank II

References

Alduchov, Oleg A., and Robert E. Eskridge. 1996. “Improved Magnus Form Approximation of Saturation Vapor Pressure.” Journal of Applied Meteorology 35 (4): 601–9.