Reading time-series station data

This page shows how to read hourly time-series station data and create timestamps from the day of the year using Python.

The sample Python program for reading the hourly data is as follows.

Set up the modules

Import the Python modules that the script uses.

import csv
import glob
import pandas as pd
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import calendar
import shutil
import os
from datetime import datetime, date

Read the CSV data

The first step is to read the CSV data.

       #print(' ----- Read CSV file ------')
       df = pd.read_csv(indir+filename \
          , delimiter = ',', na_values = ['NAN','"NAN"'],header=None)
       print(' ----------------------------')

‘indir’ is the directory where the data are stored, and ‘filename’ is the file name.

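The original dataset contains several representations of missing values, such as NAN, “NAN”, and “””NAN”””. Pass both NAN and “NAN” to the na_values argument of pd.read_csv so that they are read as missing values. (pandas seems to treat “NAN” and “””NAN””” as the same value, probably because the CSV parser reads a doubled quote inside a quoted field as a single literal quote.)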
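The missing-value handling described above can be checked on a small in-memory example. This is a minimal sketch: the sample rows are made up for illustration and are not taken from the station files.

```python
import io

import pandas as pd

# Made-up sample with three spellings of the missing-value marker:
# bare NAN, quoted "NAN", and triple-quoted """NAN""".
sample = 'NAN,1.5\n"NAN",2.5\n"""NAN""",3.5\n4.0,NAN\n'

df = pd.read_csv(
    io.StringIO(sample),
    delimiter=",",
    na_values=["NAN", '"NAN"'],  # same two markers as in the read step
    header=None,
)

# Count the entries that were recognized as missing in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

If any spelling were not recognized, the affected column would be read as text instead of numbers, which is an easy symptom to look for when checking a new file.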

Generate timestamps

Next, generate timestamps from the day-of-year and hour information.

       timestamp = dayofY_toTimeStamp(df)

def dayofY_toTimeStamp(df):

    # Pack year, day of year, and HHMM into one integer (e.g. 20200451300), then parse it as %Y%j%H%M
    newdate = pd.to_datetime(df.loc[:,'Year'].values*10000000+df.loc[:,'Day_of_Year'].values*10000+df.loc[:,'HrMin'], format='%Y%j%H%M')

    return newdate

dayofY_toTimeStamp is the function that converts the year, day of year, and hour-minute columns into timestamps (YYYY-MM-DD HH:MM:SS).
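To see what the conversion produces, the same idea can be run on a few made-up rows. This is a sketch under two assumptions: the three columns hold integers, and HrMin is encoded as HHMM (for example, 1330 for 13:30). The values are cast to strings before parsing so that the format string is applied to the digits directly.

```python
import pandas as pd


def dayofY_toTimeStamp(df):
    # Build one integer per row: YYYY, then a 3-digit day of year, then HHMM.
    stamp = df["Year"] * 10_000_000 + df["Day_of_Year"] * 10_000 + df["HrMin"]
    # %j is the zero-padded day of the year (001-366).
    return pd.to_datetime(stamp.astype(str), format="%Y%j%H%M")


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Year": [2020, 2020, 2021],
        "Day_of_Year": [1, 60, 365],
        "HrMin": [0, 1330, 2300],
    }
)

timestamps = dayofY_toTimeStamp(sample)
print(timestamps)
```

Day 60 of 2020 falls on 29 February because 2020 is a leap year, which makes it a useful row for checking that the day-of-year parsing is correct.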

Create new data frame

Create a new empty data frame with the timestamp index, add some data columns from the original data, and concatenate the other datasets along the column axis.

       # --- create a new dataframe
       newdf = pd.DataFrame(index=timestamp)
       newdf.index.names = ["TIMESTAMP"]

       newdf['RECORD']=range(0,len(timestamp))
       newdf['SiteNum']=stid.values

       # Extract the columns from "Year" to "XMTPWR" (copy to avoid modifying a view of df)
       tmp = df.loc[:,"Year":"XMTPWR"].copy()
       tmp["TIMESTAMP"] = timestamp
       tmp.set_index('TIMESTAMP', inplace=True)

       # Concatenate the two datasets along the column axis (axis=1)
       newdf = pd.concat([newdf,tmp],axis=1)
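The steps above can be tried end to end on a tiny data frame. This is a minimal sketch: the AirTemp column and all values are made up, and it stands in for the real range of columns from "Year" to "XMTPWR".

```python
import pandas as pd

# Made-up stand-in for the original station data.
df = pd.DataFrame(
    {
        "Year": [2020, 2020],
        "Day_of_Year": [1, 1],
        "HrMin": [0, 100],
        "AirTemp": [-5.2, -5.6],  # hypothetical data column
    }
)
timestamp = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"])

# Empty data frame that carries only the timestamp index.
newdf = pd.DataFrame(index=timestamp)
newdf.index.names = ["TIMESTAMP"]
newdf["RECORD"] = range(len(timestamp))

# Label-based column slice; .copy() avoids writing into a view of df.
tmp = df.loc[:, "Year":"AirTemp"].copy()
tmp["TIMESTAMP"] = timestamp
tmp = tmp.set_index("TIMESTAMP")

# axis=1 places the columns side by side, aligned on the shared index.
newdf = pd.concat([newdf, tmp], axis=1)
print(newdf)
```

Because the concatenation aligns rows on the index, both data frames need the same timestamps; rows present in only one of them would be filled with missing values.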

Save the data

Save the data to a CSV file, then add the required header information.

       #----
       # save the data
       #----
       newdf.to_csv(tmpdir+filename,index=True,na_rep='NAN')

       # add additional header information
       fla = tmpdir+os.path.splitext(filename)[0]
       addheaders(fla,fla+'.2',head[0],0)
       addheaders(fla+'.2',fla+'.3',head[2],2)
       addheaders(fla+'.3',outdir+'/'+newfile,head[3],3)

‘tmpdir’ is the temporary directory, and ‘newfile’ is the name of the final output file.
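The save step can be sketched as follows. The helper prepend_header_line is hypothetical and only illustrates the idea of adding a line above the CSV content; it is not the addheaders function from the source file, whose behavior is not shown on this page. The file names, the header text, and the AirTemp column are also made up.

```python
import os
import tempfile

import pandas as pd


def prepend_header_line(src, dst, line):
    """Hypothetical helper: copy src to dst with one extra line at the top."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(line + "\n")
        fout.write(fin.read())


# Made-up data with one missing value.
newdf = pd.DataFrame(
    {"AirTemp": [-5.2, None]},
    index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"]),
)
newdf.index.names = ["TIMESTAMP"]

with tempfile.TemporaryDirectory() as tmpdir:
    raw = os.path.join(tmpdir, "station.csv")
    final = os.path.join(tmpdir, "station_with_header.csv")

    # na_rep writes missing values back out as NAN.
    newdf.to_csv(raw, index=True, na_rep="NAN")
    prepend_header_line(raw, final, "made-up station header")

    with open(final) as f:
        lines = f.read().splitlines()

print(lines)
```

Writing to a temporary file first and producing the final file in a separate step keeps a partially written file from appearing in the output directory.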

Sample program for creating a new timestamped dataset (YYYY-MM-DD HH:MM:SS)

This program is designed to read all files under multiple directories.
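One way to collect files from several directories is with glob. The sketch below is an assumption about the layout (one subdirectory per site, CSV files inside) and is not taken from read_files.py; the function name list_station_files and the directory names are hypothetical.

```python
import glob
import os
import tempfile


def list_station_files(root, pattern="*.csv"):
    """Return sorted paths of files matching pattern in each subdirectory of root."""
    return sorted(glob.glob(os.path.join(root, "*", pattern)))


# Build a throwaway directory tree to demonstrate the search.
with tempfile.TemporaryDirectory() as root:
    for sub, name in [("siteA", "a.csv"), ("siteB", "b.csv"), ("siteB", "notes.txt")]:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "w").close()

    found = [os.path.relpath(p, root) for p in list_station_files(root)]

print(found)
```

Each returned path could then be passed to the read step in a loop.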

Source file: read_files.py