Frequently Asked Questions

	The answers given below to the Frequently Asked Questions are intended to be brief rather than comprehensive. For more details, see the documentation provided under Project info.
	Why the SACA&D project?
	What basic data, series and stations are used?
	Why more than one definition for min., mean & max. temperature, etc.?
	What do blend and update mean?
	Why doesn't SACA&D use the WMO station numbers as IDs?
	What quality control and homogeneity procedures are applied?
	Why are values slightly different from the file that I downloaded earlier?
	How can I obtain daily data that are not available for public download on this website?
	How are the smoothed lines in the indices plots calculated?
	What procedure is used to calculate the trends?
	Why do some stations not appear on the trend map, although a time series plot is available?
	Why do some values differ from the values I obtained from the NMHSs?
	Why the SACA&D project?
	The objective of SACA&D is to combine the collation of daily series of observations at meteorological stations, quality control, analysis of extremes and dissemination of both the daily data and the analysis results. Integrating these activities in one project has proven essential for success. New versions of the daily dataset will be issued at regular intervals.
	What basic data, series and stations are used?
	The SACA dataset consists of daily station series obtained from the climatological divisions of National Meteorological and Hydrological Services (NMHSs) and station series maintained by observatories and research centres throughout Southeast Asia. For details of the individual data providers, see the participants list. A comprehensive overview of all available data is provided in the data dictionary. The series are quality controlled, and flags (“OK”, “suspect” or “missing”) are attached to the individual values. Homogeneity testing has resulted in the classification of series into “useful”, “doubtful” or “suspect”. Note that these categories only hold for the particular time intervals to which the tests were applied. It is recommended to use the results of the homogeneity tests to select appropriate series and time intervals. The series have not been homogenized, in the sense that no values have been adjusted.
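	As an example of the recommended selection step, the sketch below keeps only values flagged “OK” that lie in a time interval with an accepted homogeneity class. The record type, field names and interval format are hypothetical and do not reflect the layout of the downloadable files.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyValue:
    # Hypothetical record layout; the flag labels mirror the FAQ text, but the
    # encoding used in the actual SACA&D files may differ.
    day: date
    value: float
    flag: str  # "OK", "suspect" or "missing"


def select_usable(values, intervals, accepted=("useful",)):
    """Keep values flagged "OK" that fall inside a tested interval whose
    homogeneity class is in `accepted`.

    intervals: list of (start_date, end_date, class_label) tuples.
    """
    periods = [(start, end) for start, end, label in intervals if label in accepted]
    return [
        v for v in values
        if v.flag == "OK" and any(start <= v.day <= end for start, end in periods)
    ]


values = [
    DailyValue(date(1990, 1, 1), 25.0, "OK"),
    DailyValue(date(1990, 1, 2), 41.0, "suspect"),
    DailyValue(date(2005, 1, 1), 26.0, "OK"),
]
intervals = [
    (date(1951, 1, 1), date(1999, 12, 31), "useful"),
    (date(2000, 1, 1), date(2010, 12, 31), "suspect"),
]
print([v.value for v in select_usable(values, intervals)])  # [25.0]
```

	Whether to accept “doubtful” intervals as well is left to the `accepted` parameter, since the appropriate choice depends on the application.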
	Why more than one definition for min., mean & max. temperature, etc.?
	Different countries estimate daily mean temperature using different methods and formulae. The time intervals over which minimum and maximum temperature are observed also differ, as does the interval for 24-hour accumulated rainfall. Each series is therefore labeled with the appropriate element ID.
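	As an illustration only: two formulae often used for daily mean temperature are the average of the maximum and minimum, and the average of readings taken at fixed hours. The sketch below, with hypothetical readings for one day, shows that they can give different values; which formula applies to a given SACA series is defined by its element ID, not by this code.

```python
def mean_from_extremes(tmax, tmin):
    """Daily mean estimated as the average of maximum and minimum temperature."""
    return (tmax + tmin) / 2


def mean_from_fixed_hours(readings):
    """Daily mean estimated as the average of readings at fixed hours
    (here, hypothetically, eight 3-hourly observations)."""
    return sum(readings) / len(readings)


tmax, tmin = 33.0, 24.0
readings = [25.0, 24.5, 27.0, 31.0, 32.5, 30.0, 27.5, 26.0]
print(mean_from_extremes(tmax, tmin))   # 28.5
print(mean_from_fixed_hours(readings))  # 27.9375
```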
	What do blend and update mean?
	The series collected from participating countries generally do not contain data for the most recent years. This is partly due to the time needed for quality control and archiving at the participants' home institutions, and partly due to the effort required to include the data in the SACA database. To make available for each station a time series that is as complete as possible, we have included an automated update procedure that relies on daily data from SYNOP messages distributed in near real time over the Global Telecommunication System (GTS). In this procedure, gaps in a daily series are also infilled with observations from nearby stations, provided that these lie within 25 km and that the height difference is less than 50 m. The download options under daily data allow you to select Blend and update = Yes or No. If a blended series is chosen, information is provided on the underlying series used in the blending process. Note that only the blended series are analysed further in SACA&D.
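	A minimal sketch of the neighbour criterion and gap infilling described above, assuming great-circle (haversine) distance, a hypothetical dictionary layout for stations and a simple "first eligible neighbour wins" rule. The actual SACA&D blending procedure may apply additional rules (station priorities, SYNOP handling) that are not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def can_infill(target, neighbour, max_km=25.0, max_dz_m=50.0):
    """Apply the distance (within 25 km) and height (< 50 m) criteria."""
    dist = haversine_km(target["lat"], target["lon"], neighbour["lat"], neighbour["lon"])
    dz = abs(target["elev_m"] - neighbour["elev_m"])
    return dist <= max_km and dz < max_dz_m


def blend(target, neighbours):
    """Fill missing (None) days of the target from the first eligible neighbour
    that has a value on that day. Stations are hypothetical dicts with keys
    'lat', 'lon', 'elev_m' and 'values' (date -> value or None)."""
    eligible = [n for n in neighbours if can_infill(target, n)]
    blended = {}
    for day, value in target["values"].items():
        if value is None:
            for n in eligible:
                candidate = n["values"].get(day)
                if candidate is not None:
                    value = candidate
                    break
        blended[day] = value
    return blended


target = {"lat": 1.35, "lon": 103.82, "elev_m": 15,
          "values": {"2020-01-01": 27.0, "2020-01-02": None}}
far = {"lat": 2.00, "lon": 103.82, "elev_m": 20, "values": {"2020-01-02": 99.0}}
near = {"lat": 1.40, "lon": 103.85, "elev_m": 40, "values": {"2020-01-02": 26.5}}
print(blend(target, [far, near]))  # {'2020-01-01': 27.0, '2020-01-02': 26.5}
```

	The distant station (about 72 km away) is skipped even though it is listed first; only the nearby station, about 6.5 km away and 25 m higher, is used to fill the gap.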
	Why doesn't SACA&D use the WMO station numbers as IDs?
	WMO station numbers are not used as unique identifiers for the daily SACA series, because not all stations with data have been assigned WMO numbers.
	What quality control and homogeneity procedures are applied?
	Series of the best possible quality are provided to SACA&D by the participating institutions. In addition, common quality control procedures are applied to all series using various algorithms (see Project info > ATBD). These procedures lead to flags (“OK”, “suspect” or “missing”) being assigned to individual values. Although data validation has been careful, it can never be excluded that some errors remain undetected. The risk of such errors is greatest in the recent data that stem from synoptic messages, because these data have not undergone the validation process at the participating institutions. Apart from errors on individual days, changes in observation practices may have introduced inhomogeneities of non-climatic origin into long time series. These inhomogeneities may severely affect the assessment of changes in extremes. To evaluate the homogeneity of the time series in SACA&D, a two-step testing procedure was followed (see Project info > ATBD). First, four common homogeneity tests were applied to the daily series in fixed time periods, using as testing variables (1) the annual mean of the diurnal temperature range DTR (= maximum temperature - minimum temperature), (2) the annual mean of the absolute day-to-day differences of the diurnal temperature range, vDTR, and (3) the wet day count RR1 (threshold 1 mm). Second, the test results were condensed for each series into three classes: useful, doubtful and suspect. The four homogeneity tests are the Standard Normal Homogeneity Test, the Buishand range test, the Pettitt test and the von Neumann ratio test. Note that this homogeneity analysis is subject to further research, as there is no well-established testing procedure for daily data. How to apply the test results is also an open question; the answer depends on the particular application.
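	To make the testing variables and two of the four tests concrete, the following sketch computes annual DTR, vDTR and RR1 from one year of gap-free daily data, plus the SNHT and von Neumann ratio statistics for an annual series. It omits the Buishand range and Pettitt tests and all critical values, and the mapping from the number of rejecting tests to a class in `classify` is an assumption for illustration; see the ATBD for the rule actually used.

```python
import statistics


def annual_testing_variables(tx, tn, rr):
    """DTR, vDTR and RR1 for one year of gap-free daily data.

    tx, tn: daily maximum / minimum temperature; rr: daily rainfall in mm.
    """
    dtr = [hi - lo for hi, lo in zip(tx, tn)]
    dtr_mean = statistics.fmean(dtr)
    # vDTR: mean absolute day-to-day change in DTR
    vdtr = statistics.fmean(abs(b - a) for a, b in zip(dtr, dtr[1:]))
    # RR1: wet-day count; ">= 1 mm" is an assumed reading of "threshold 1 mm"
    rr1 = sum(1 for r in rr if r >= 1.0)
    return dtr_mean, vdtr, rr1


def snht_statistic(y):
    """SNHT statistic T0 = max over k of k*mean(z[:k])**2 + (n-k)*mean(z[k:])**2,
    where z is the standardized series (non-constant series assumed)."""
    n = len(y)
    mean, sd = statistics.fmean(y), statistics.pstdev(y)
    z = [(v - mean) / sd for v in y]
    return max(
        k * statistics.fmean(z[:k]) ** 2 + (n - k) * statistics.fmean(z[k:]) ** 2
        for k in range(1, n)
    )


def von_neumann_ratio(y):
    """Squared successive differences over squared deviations from the mean;
    about 2 for a homogeneous series, lower values suggest a break."""
    mean = statistics.fmean(y)
    num = sum((a - b) ** 2 for a, b in zip(y, y[1:]))
    den = sum((v - mean) ** 2 for v in y)
    return num / den


def classify(n_rejections):
    """Assumed rule: 0-1 rejecting tests useful, 2 doubtful, 3-4 suspect."""
    if n_rejections <= 1:
        return "useful"
    if n_rejections == 2:
        return "doubtful"
    return "suspect"


step = [0, 0, 0, 0, 1, 1, 1, 1]  # annual series with an abrupt shift
print(snht_statistic(step), von_neumann_ratio(step))  # 8.0 0.5
```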
For the indices of extremes analysed in SACA&D, we have chosen to present trend results only for series that are useful or doubtful, but other applications may call for other choices (see e.g. the publications section). There is a clear need for further research on techniques for homogenising daily data, in order to create high-quality daily datasets for the assessment of extremes without abandoning entire series or discarding real extremes. This is particularly important in areas where the density of stations with long daily series is already low.
	Why are values slightly different from the file that I downloaded earlier?
	All files on this website are updated frequently to include the latest available observations. Updating not only adds the most recent data but also includes late reports for earlier dates. In addition, older series may have changed because of improved quality control or data archaeology by the data-providing institutions.
	How can I obtain daily data that are not available for public download on this website?
	The SACA&D website makes available all daily series whose conditions of use allow publication. For some stations, we are only allowed to use the daily series for the analysis of extremes within the SACA&D project, without releasing them. These stations do appear in the data dictionary, in the indices section of the website and in the publications, but they are absent from the daily data section. Please direct requests for these data to the NMHS of the respective country.
	How are the smoothed lines in the indices plots calculated?
	The red smoothed line in the plots is calculated with the LOWESS smoother using the parameters f = 1/5 and iter = 3, based on the open-source Fortran code by W. S. Cleveland, Bell Laboratories, Murray Hill NJ 07974 (wsc@research.bell-labs.com). References: Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829-836. Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician, 35, 54.
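	To show what f and iter control, here is a simplified pure-Python re-implementation of LOWESS, not the Bell Labs Fortran code: f sets the fraction of points in each local neighbourhood, and iter the number of robustness passes that down-weight points with large residuals. Neighbourhood and edge handling are simplified, so results may differ slightly from the plotted lines.

```python
import statistics


def lowess(x, y, f=1/5, iterations=3):
    """Simplified LOWESS after Cleveland (1979).

    x must be sorted ascending. Each point is fitted by weighted linear
    regression on its nearest neighbours (tricube weights); `iterations`
    robustness passes then down-weight large residuals (bisquare weights).
    """
    n = len(x)
    r = max(2, min(n, int(f * n + 1e-7)))  # neighbourhood size
    robustness = [1.0] * n
    fitted = list(y)
    scale = max(1.0, max(abs(v) for v in y))
    for it in range(iterations + 1):
        for i in range(n):
            # h: distance to the r-th nearest point (the point itself included)
            h = sorted(abs(xj - x[i]) for xj in x)[r - 1]
            w = []
            for j in range(n):
                d = abs(x[j] - x[i])
                u = d / h if h > 0 else (0.0 if d == 0 else 1.0)
                tricube = (1 - u ** 3) ** 3 if u < 1 else 0.0
                w.append(tricube * robustness[j])
            sw = sum(w)
            if sw == 0:
                fitted[i] = y[i]  # no usable neighbours: keep the raw value
                continue
            xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
            ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
            sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
            sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
            slope = sxy / sxx if sxx > 0 else 0.0
            fitted[i] = ym + slope * (x[i] - xm)
        if it == iterations:
            break
        residuals = [yj - fj for yj, fj in zip(y, fitted)]
        s = statistics.median(abs(e) for e in residuals)
        if s <= 1e-12 * scale:
            break  # residuals effectively zero: nothing to robustify
        robustness = [
            (1 - (e / (6 * s)) ** 2) ** 2 if abs(e) < 6 * s else 0.0
            for e in residuals
        ]
    return fitted
```

	Because each local fit is linear, a series that is already a straight line is returned unchanged, which is a convenient sanity check.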
	What procedure is used to calculate the trends?
	Trends are calculated as a least-squares linear fit, using NAG's E02ADF routine. Reference: NAG Fortran Library Routine Document E02ADF (Numerical Algorithms Group website).
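	For illustration, a straight-line least-squares fit can be written directly. This is a plain ordinary least-squares sketch, not NAG's E02ADF routine; for an unweighted degree-1 fit the resulting line should in principle be the same. The series below is synthetic, and expressing the trend per decade is our own choice.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept of values against years."""
    n = len(years)
    xm = sum(years) / n
    ym = sum(values) / n
    sxx = sum((x - xm) ** 2 for x in years)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, ym - slope * xm


years = list(range(1971, 2001))
values = [25.0 + 0.02 * (yr - 1971) for yr in years]  # synthetic index series
slope, intercept = linear_trend(years, values)
print(round(slope * 10, 3))  # trend per decade: 0.2
```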
	Why do some stations not appear on the trend map, although a time series plot is available?
	For a trend to be calculated, the station must have valid index data for at least 80% of the trend period. For example, for the trend period 1901-1999 (99 years), at least 80 years must have valid data. In addition, the homogeneity test result for the underlying series must be 'useful' or 'doubtful' for this period. If the test result is 'suspect', or if less than 80% of the trend period has valid index data, no trend is calculated for that station and it is therefore not plotted on the trend map. A time series plot is produced whenever any valid index data are available for the station, with the only restriction that an index value for an individual year is calculated only if no more than 3% of the days in that year are missing.
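	The eligibility rules above can be expressed as two small checks. The function names and the dictionary layout (year mapped to index value, or None when the year is invalid) are hypothetical; integer arithmetic is used so that the 80% and 3% thresholds are not affected by floating-point rounding.

```python
def year_index_valid(n_days, n_missing, max_missing_pct=3):
    """An annual index is computed only if at most 3% of the days are missing."""
    return 100 * n_missing <= max_missing_pct * n_days


def trend_allowed(index_by_year, start, end, homogeneity_class, min_pct=80):
    """Trend only if >= 80% of years are valid and the class is useful/doubtful."""
    n_years = end - start + 1
    required = -(-min_pct * n_years // 100)  # ceiling of min_pct% of n_years
    n_valid = sum(1 for yr in range(start, end + 1) if index_by_year.get(yr) is not None)
    return homogeneity_class in ("useful", "doubtful") and n_valid >= required


index = {year: 1.0 for year in range(1901, 1981)}  # 80 valid years
print(trend_allowed(index, 1901, 1999, "useful"))   # True
print(trend_allowed(index, 1901, 1999, "suspect"))  # False
print(year_index_valid(365, 10), year_index_valid(365, 11))  # True False
```

	For 1901-1999 the ceiling of 80% of 99 years is 80, matching the example in the text; for a 365-day year, 3% allows at most 10 missing days.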
	Why do some values differ from the values I obtained from the NMHSs?
	SACA uses two kinds of data sources: data issued by the national meteorological offices or other participants (the so-called participant data) and data from synoptic messages. The difference between the two is that participant data are generally validated, whereas synoptic data are not. In SACA&D, synoptic messages are used temporarily to extend data series and keep them as up to date as possible. As soon as participant data become available, the synoptic data are replaced. Non-validated synoptic data can be distinguished from validated participant data by the first digit of the source ID (SOUID) given in each data file: a source ID starting with 9 indicates non-validated synoptic data, whereas one starting with 1 indicates validated participant data.
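	A sketch of how the SOUID prefix could be interpreted and how participant data supersede synoptic data for the same date. The function names, the example IDs and the dictionary-based series layout are hypothetical; only the 1/9 prefix rule comes from the text above.

```python
def is_validated_source(souid):
    """Return True for validated participant data (SOUID starting with 1) and
    False for non-validated synoptic data (SOUID starting with 9)."""
    first = str(souid).strip()[0]
    if first == "1":
        return True
    if first == "9":
        return False
    raise ValueError(f"unknown SOUID prefix: {souid!r}")


def merge_sources(participant, synop):
    """Combine two date -> value mappings; participant values replace synoptic
    values for the same date, synoptic values only fill remaining dates."""
    merged = dict(synop)
    merged.update(participant)
    return merged


print(is_validated_source(100123), is_validated_source("900456"))  # True False
print(merge_sources({"2020-01-02": 26.1}, {"2020-01-02": 26.4, "2020-01-03": 27.0}))
# {'2020-01-02': 26.1, '2020-01-03': 27.0}
```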