|
FAQ
|
|
The answers given below to the Frequently Asked Questions are intended to
be brief rather than comprehensive. For more details, see the
documentation provided under Project info.
|
|
|
|
Why the SACA&D project?
|
|
The objective of SACA&D is to combine collation of daily series of
observations at meteorological stations, quality control, analysis of
extremes and dissemination of both the daily data and the analysis
results. Integration of these activities in one project proves to be
essential for success. New versions of the daily dataset will be issued at
regular intervals.
|
|
What basic data, series and stations are used?
|
|
The SACA dataset consists of daily station series obtained from
climatological divisions of National Meteorological and Hydrological
Services and station series maintained by observatories and research
centres throughout Southeast Asia. For details of the individual data
providers see the participants list. A comprehensive overview of all
available data is provided in the data dictionary.
The series are quality controlled and a flag (“OK”, “suspect” or
“missing”) is attached to each individual value. Homogeneity testing has
resulted in a classification of the series as “useful”, “doubtful” or
“suspect”. Note that these categories only hold for the particular
time intervals for which the tests were applied. It is recommended to use
the results of the homogeneity tests for selecting appropriate series and
time intervals. The series have not been homogenized, in the sense that
no values have been changed.
|
|
Why more than one definition for min., mean & max. temperature, etc.?
|
|
Different countries estimate daily average temperatures using different
methods and formulae. Also, the time intervals for observing minimum and
maximum temperature differ and so does the time interval for 24h
accumulated rainfall. Each series is therefore labeled with the
appropriate element id.
|
|
What does blend and update mean?
|
|
The series collected from participating countries generally do not contain
data for the most recent years. This is partly due to the time that is
needed for data quality control and archiving at the home institutions of
the participants, and partly the result of the efforts required to include
the data in the SACA database. To make available for each station a time
series that is as complete as possible, we have included an automated
update procedure that relies on the daily data from SYNOP messages that
are distributed in near real time over the Global Telecommunication System
(GTS). In this procedure the gaps in a daily series are also infilled with
observations from nearby stations, provided that they are within 25 km
and that the height difference is less than 50 m.
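The two neighbour criteria can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance, a mean Earth radius of 6371 km, and that "within 25 km" includes exactly 25 km; none of these details is stated here, and the dictionary keys are hypothetical.

```python
import math

MAX_DISTANCE_KM = 25.0     # distance limit quoted in this FAQ
MAX_HEIGHT_DIFF_M = 50.0   # height-difference limit quoted in this FAQ
EARTH_RADIUS_KM = 6371.0   # assumption: mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def can_infill(target, neighbour):
    """True if the neighbour meets both criteria for infilling the target."""
    distance = great_circle_km(target["lat"], target["lon"],
                               neighbour["lat"], neighbour["lon"])
    height_diff = abs(target["height_m"] - neighbour["height_m"])
    # Assumption: the distance limit is inclusive, the height limit strict.
    return distance <= MAX_DISTANCE_KM and height_diff < MAX_HEIGHT_DIFF_M
```

A neighbour that fails either test would simply not be considered for infilling in this sketch.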
The download options under daily data allow you to select
Blend and update = Yes or No. If a blended series is chosen, information
on the underlying series that are used in the blending process is provided.
Note that only the blended series are further analysed in SACA&D.
|
|
Why doesn't SACA&D use the WMO station numbers as IDs?
|
|
WMO station numbers are not used as the unique identifier for the daily SACA
series, because not all stations with data have been assigned WMO numbers.
|
|
What quality control and homogeneity procedures are applied?
|
|
Series of the best possible quality are provided for SACA&D by the
participating institutions. In addition, common quality control procedures
are applied to all series using various algorithms (see Project info >
ATBD). These quality control procedures lead to flags (“OK”, “suspect”
or “missing”) assigned to individual data.
Although data validation has been careful, it can never be excluded that
some errors remain undetected. The risk for such errors is greatest in the
recent data that stem from synoptical messages, because these data did not
undergo the validation process in the participating institutions.
Apart from errors on individual days, changes in observation practices may
have introduced inhomogeneities of non-climatic origin in long time
series. These inhomogeneities may severely affect the assessment of
changes in extremes. To evaluate the homogeneity of the time series
in SACA&D, a two-step testing procedure was followed (see Project info >
ATBD). First, four common homogeneity tests were applied to evaluate the
daily series in fixed time periods using three testing variables: (1) the
annual mean of the diurnal temperature range DTR (= maximum temperature -
minimum temperature), (2) the annual mean of the absolute day-to-day
differences of the diurnal temperature range vDTR and (3) the wet day
count RR1 (threshold 1 mm). Second, the test results were condensed for
each series into three classes: useful, doubtful or suspect. The four
homogeneity tests are the Standard Normal Homogeneity test, the Buishand
range test, the Pettitt test and the von Neumann ratio test.
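As an illustration, the three testing variables for a single year can be computed as in the sketch below. It assumes a complete year of daily values (no missing-data handling) and counts a day as wet when rainfall is at least 1 mm; the text only gives the 1 mm threshold, so that boundary is an assumption. The von Neumann ratio is included in its textbook form as an example of one of the four tests, not as the project's exact implementation.

```python
def annual_test_variables(tmax, tmin, rain_mm):
    """Return (mean DTR, vDTR, RR1) for one year of daily values."""
    dtr = [tx - tn for tx, tn in zip(tmax, tmin)]      # daily temperature range
    mean_dtr = sum(dtr) / len(dtr)
    # Absolute day-to-day change in DTR, averaged over the year.
    changes = [abs(b - a) for a, b in zip(dtr, dtr[1:])]
    vdtr = sum(changes) / len(changes)
    # Wet day count; ">= 1.0" is an assumption about the threshold.
    rr1 = sum(1 for r in rain_mm if r >= 1.0)
    return mean_dtr, vdtr, rr1


def von_neumann_ratio(annual_values):
    """Mean-square successive difference divided by the variance.

    For a homogeneous series the expected value is 2; values well
    below 2 suggest a break or trend in the series.
    """
    n = len(annual_values)
    mean = sum(annual_values) / n
    successive = sum((a - b) ** 2 for a, b in zip(annual_values, annual_values[1:]))
    spread = sum((v - mean) ** 2 for v in annual_values)
    return successive / spread
```

The annual values returned by `annual_test_variables` for successive years would form the series passed to a test such as `von_neumann_ratio`.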
Note that the above homogeneity analysis is subject to further research,
as there is no well-established testing procedure for daily data. How to
apply the test results is also an open question, and the answer depends on
the particular application. For the indices of extremes analysed in SACA&D
we have chosen to present trend results only for the series that are useful
or doubtful, but in other cases other choices may be made (see e.g. the
publications section). There is a clear need for additional research on
techniques for homogenisation of daily data in order to create high
quality daily datasets for the assessment of extremes without abandoning
entire series or throwing out real extremes. This is of particular
importance in areas where the density of stations with long daily data
series is already low.
|
|
Why are values slightly different from the file that I downloaded earlier?
|
|
All the files on this website are frequently updated to include the latest
available observations. Updating includes not only adding the most recent
data, but also including any late reports for earlier dates. In
addition, the older series may have changed because of improved data
quality control or data archaeology by the data providing institutions.
|
|
How to obtain daily data that are not available for public download at
this website?
|
|
The SACA&D website makes available all daily series for which the
conditions of use allow publication. For some stations, we are only
allowed to use the daily series for the analysis of extremes within the
SACA&D project without releasing them. These stations do appear in the
data dictionary and the indices section of the website as well as in the
publications, but they are absent from the daily data section. Please
direct requests for these data to the NMHS of the respective country.
|
|
How are the smoothed lines in the indices plots calculated?
|
|
The red smoothed line in the plots is calculated using the lowess smoother
function with parameters f=1/5 and iter=3, using open-source Fortran code
from W. S. Cleveland (wsc@research.bell-labs.com), Bell Laboratories,
Murray Hill NJ 07974.
References:
Cleveland, W.S. (1979). Robust locally weighted regression and smoothing
scatterplots. J.Amer.Statist.Assoc., 74, 829-836.
Cleveland, W.S. (1981). LOWESS: A program for smoothing scatterplots by
robust locally weighted regression. The American Statistician, 35, 54.
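For readers who want to see what the two parameters do, the following is a simplified re-implementation sketch of the lowess idea, not the Fortran code used for the plots. Here `f` is the fraction of points used in each local fit and `iterations` (iter) is the number of robustness passes. The handling of ties and degenerate windows is an assumption of this sketch and may differ from Cleveland's code.

```python
import math
from statistics import median


def lowess(x, y, f=1 / 5, iterations=3):
    """Smooth y against x (x sorted ascending); returns the fitted values."""
    n = len(x)
    r = max(2, min(n, int(f * n + 1e-7)))    # points in each local window
    span = x[-1] - x[0]
    robust = [1.0] * n                       # robustness weights, all 1 at first
    fitted = list(y)
    for step in range(iterations + 1):       # initial fit + robustness passes
        for i in range(n):
            dist = [abs(xj - x[i]) for xj in x]
            h = sorted(dist)[r - 1]          # distance to the r-th nearest point
            # Tricube neighbourhood weight times the current robustness weight.
            w = [robust[j] * (1 - (d / h) ** 3) ** 3 if d < h else 0.0
                 for j, d in enumerate(dist)]
            sw = sum(w)
            if sw <= 0.0:
                continue                     # no usable neighbours: keep old value
            xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
            ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
            sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
            if sxx > 1e-9 * sw * span ** 2:
                sxy = sum(wj * (xj - xbar) * (yj - ybar)
                          for wj, xj, yj in zip(w, x, y))
                fitted[i] = ybar + (sxy / sxx) * (x[i] - xbar)
            else:
                fitted[i] = ybar             # degenerate window: weighted mean
        if step == iterations:
            break
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        s = median(abs(e) for e in resid)
        if s == 0.0:
            break                            # perfect fit: nothing to down-weight
        # Bisquare weights: points with large residuals count less next pass.
        robust = [(1 - (e / (6 * s)) ** 2) ** 2 if abs(e) < 6 * s else 0.0
                  for e in resid]
    return fitted
```

With f=1/5, each smoothed value in a 50-year index series is based on a weighted straight-line fit through the 10 nearest years.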
|
|
What procedure is used to calculate the trends?
|
|
Trends are computed as a least-squares optimal linear fit using NAG's
E02ADF routine.
References:
Numerical Algorithms Group website
references in the NAG Fortran Library Routine Document E02ADF
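For a straight line, the least-squares solution is the ordinary regression slope. The sketch below computes it directly in plain Python as an illustration; it does not call the NAG library, and the function name is hypothetical.

```python
def linear_trend(years, values):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx                       # index units per year
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Usage: a synthetic index series that rises by 0.3 units per year.
years = list(range(1961, 1971))
values = [10.0 + 0.3 * (year - 1961) for year in years]
slope, intercept = linear_trend(years, values)
trend_per_decade = 10 * slope
```

Multiplying the slope by 10 gives the trend per decade, a common way to report such results.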
|
|
Why do some stations not appear on the trend map, although a time series
plot is available?
|
|
For a trend value to be calculated, the station must hold valid index
data for at least 80% of the period for which the trend is calculated. For
example, for a trend period 1901-1999 (99 years), at least 80 years must
have valid data. Also, the homogeneity test result for the underlying
series must be 'useful' or 'doubtful' for this period. If the test result
is 'suspect' or less than 80% of the trend period holds valid index data,
the trend for that station is not calculated and therefore not plotted on
the trend map. A time series plot is produced if any valid index data are
available for the station in question, with the only restriction that
index values for an individual year are only calculated if no more than
3% of the days are missing.
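The rules above can be written as two small checks. This is a sketch under stated assumptions: the function and constant names are hypothetical, and integer arithmetic is used so that the 80% and 3% limits are applied exactly.

```python
MIN_VALID_YEARS_PERCENT = 80       # from the text above
MAX_MISSING_DAYS_PERCENT = 3       # from the text above
ACCEPTED_CLASSES = {"useful", "doubtful"}


def year_has_index(days_in_year, missing_days):
    """An annual index value exists if no more than 3% of days are missing."""
    return missing_days * 100 <= MAX_MISSING_DAYS_PERCENT * days_in_year


def trend_is_plotted(first_year, last_year, valid_years, homogeneity_class):
    """True if the station qualifies for the trend map for this period."""
    period_length = last_year - first_year + 1
    in_period = {y for y in valid_years if first_year <= y <= last_year}
    enough_data = len(in_period) * 100 >= MIN_VALID_YEARS_PERCENT * period_length
    return enough_data and homogeneity_class in ACCEPTED_CLASSES
```

For the period 1901-1999, 80 valid years pass the data check (80/99 is about 81%) and 79 do not (about 79.8%). In a 365-day year, up to 10 missing days still yield an index value.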
|
|
Why do some values differ from the values I obtained from the NMHSs?
|
|
SACA makes use of two kinds of data sources: data that are issued by the
national meteorological offices or other participants (the so-called
participant data) and data from synoptical messages. The difference
between these two kinds of data is that data from the participants are
generally validated, whereas synoptical data are not validated. In SACA&D
synoptical messages are temporarily used to extend data series, to keep
the series as up to date as possible. As soon as participant data become
available, the synoptical data are replaced.
Non-validated synoptical data can be distinguished from validated
participant data by the first digit of the source ID (SOUID) given in
each data file: a source starting with 9 represents non-validated
synoptical data, whereas a source starting with 1 indicates validated
participant data.
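A minimal sketch of this check is shown below. The function name and the example SOUID values are hypothetical, and any leading digit other than 9 or 1 is reported as unknown because this FAQ does not describe other cases.

```python
def source_kind(souid):
    """Classify a source ID (SOUID) by its first digit."""
    first_digit = str(souid).strip()[:1]
    if first_digit == "9":
        return "synoptical, not validated"
    if first_digit == "1":
        return "participant, validated"
    return "unknown"   # assumption: other leading digits are not covered here


# Usage with made-up SOUID values:
kinds = [source_kind(s) for s in ("900123", 100456)]
```

Filtering a downloaded series on this result separates the provisional synoptical values from the validated participant values.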
|