Data required for quality control
=================================

Raw data observed at meteorological stations may contain erroneous values caused by instrument failure or introduced while the data are recorded or retrieved from storage. The types of erroneous data, and of data that require attention, are as follows.

.. image:: SRC/qc.jpg
   :width: 500

The most frequently occurring type is outliers. These include missing data and values that were output with misaligned digits when the data were retrieved from tape.

Remove outliers
---------------

**[Step 1]** An easy way to remove outliers is to define a valid data range and replace values outside it with the missing value. For example, we set the temperature range to -30 °C to +30 °C for beavers and -40 °C to +45 °C for other stations.

This method, however, does not remove outliers that fall within the range. Therefore, another (or additional) way to remove outliers is to identify data that lie more than 5 standard deviations (5 sigma) from the mean.

**[Step 2]** Calculate the 3-hour mean and the 10-day standard deviation, and remove data that deviate from the mean by more than +/- 5 sigma.

Before (left) and after (right) quality control in Step 2. The orange outliers that fell beyond the -5 sigma line (green line) were removed.

.. image:: SRC/TAIRall.5min.sample.pre.2015-10.png
   :width: 300

.. image:: SRC/TAIRall.5min.sample.2015-10.png
   :width: 300

Sample program
~~~~~~~~~~~~~~

This program calculates the mean and standard deviation and replaces outliers that fall outside the criterion (+/- 5 sigma from the mean) with the missing value.

.. literalinclude:: programs/DQC_std.sampleprogram.py
   :language: python
   :lines: 272-285
   :emphasize-lines: 1-2, 8

Source file: :download:`DQC_std.sampleprogram.py `

Remove spikes
-------------

**[Step 3]** Spikes are abrupt changes in the data over a short time, which can be identified by a large temporal tendency in the data. We remove the spikes where only one sensor is recorded and the other two show missing values.
Here, the criterion for the temporal tendency is a change of more than 8 °C in 5 minutes.

.. literalinclude:: programs/DQC_dTdt.sampleprogram.py
   :language: python
   :lines: 255-272
   :emphasize-lines: 4, 12, 15

Source file: :download:`DQC_dTdt.sampleprogram.py `

**[Step 4]** An additional method to eliminate spikes is to check whether nine data values (three sensors at the previous, current, and next time steps) exceed the threshold.

.. literalinclude:: programs/DQC_9variances.sampleprogram.py
   :language: python
   :lines: 257-298
   :emphasize-lines: 15, 26, 29

Source file: :download:`DQC_9variances.sampleprogram.py `

Before (left) and after (right) quality control in Step 4. Major noisy data were removed.

.. image:: SRC/TAIRall.5min.sample.v04.pre.2015-10.png
   :width: 300

.. image:: SRC/TAIRall.5min.9var.sample.2015-10.png
   :width: 300
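
The checks in Steps 1 to 3 can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the code in the sample programs above. The function names (``range_check``, ``sigma_check``, ``spike_check``), the trailing rolling windows, the one-sided time difference, the column names and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def range_check(temp, lower=-40.0, upper=45.0):
    """Step 1: replace values outside [lower, upper] with NaN."""
    return temp.where((temp >= lower) & (temp <= upper))


def sigma_check(temp, mean_window="180min", std_window="10D", n_sigma=5.0):
    """Step 2: mask values farther than n_sigma rolling std from the rolling mean.

    Requires a DatetimeIndex. Trailing windows are an assumption of this sketch.
    """
    mean = temp.rolling(mean_window, min_periods=1).mean()
    std = temp.rolling(std_window, min_periods=2).std()
    is_outlier = (temp - mean).abs() > n_sigma * std
    return temp.mask(is_outlier)


def spike_check(sensors, max_step=8.0):
    """Step 3: where only one sensor reports, mask jumps larger than max_step.

    `sensors` has one column per sensor and one row per 5-minute time step.
    """
    only_one = (sensors.notna().sum(axis=1) == 1).to_numpy()[:, None]
    jump = (sensors.diff().abs() > max_step).to_numpy()
    return sensors.mask(jump & only_one)


# Synthetic 5-minute series with a diurnal cycle (illustrative data only).
index = pd.date_range("2015-10-01", periods=2000, freq="5min")
temp = pd.Series(20.0 + 2.0 * np.sin(2 * np.pi * np.arange(2000) / 288), index=index)
temp.iloc[500] = 99.9   # outside the valid range: handled by Step 1
temp.iloc[1000] = 40.0  # inside the range but far from the mean: handled by Step 2

cleaned = sigma_check(range_check(temp))

# Three sensors; at one time step only sensor "a" reports, with a 10-degree jump.
sensors = pd.DataFrame({"a": temp, "b": temp, "c": temp}).iloc[:10].copy()
sensors.iloc[5] = [sensors.iloc[4, 0] + 10.0, np.nan, np.nan]
despiked = spike_check(sensors)
```

In this sketch the difference is taken against the previous time step only, so the first value after a spike would also be flagged if it were still reported by a single sensor. A two-sided test (previous and next time step) is one possible way to avoid that.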