Reading time-series station data
This page shows how to read hourly time-series station data and create timestamps from the year, day of year, and hour-minute fields using Python.
The sample Python program for reading the hourly data is as follows.
Set up the modules
Import the Python modules that the script uses.
import csv
import glob
import pandas as pd
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import calendar
import shutil
import os
from datetime import datetime, date
Read the CSV data
The first step is to read the CSV data.
#print(' ----- Read CSV file ------')
df = pd.read_csv(indir+filename, delimiter=',',
                 na_values=['NAN', '"NAN"'], header=None)
print(' ----------------------------')
‘indir’ is the directory where the data are stored, and ‘filename’ is the file name.
The original dataset marks missing values in several ways, such as NAN, "NAN", and """NAN""". Pass both NAN and "NAN" to the na_values argument of read_csv so that they are read as missing. (The CSV reader appears to treat "NAN" and """NAN""" in the same way, so these two entries cover all three forms.)
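The missing-value handling can be checked on a small in-memory example. The sketch below is only an illustration: the three sample rows are made up and are not taken from the station files, and `df_check` and `n_missing` are hypothetical names. It counts how many of the three spellings of NAN end up as missing values with the same `na_values` setting as above.

```python
import io
import pandas as pd

# Made-up sample rows (assumption): one row per spelling of the missing value
sample = '1,NAN,3\n4,"NAN",6\n7,"""NAN""",9\n'

# Same na_values as in the read step above
df_check = pd.read_csv(io.StringIO(sample), delimiter=',',
                       na_values=['NAN', '"NAN"'], header=None)

# Count how many entries of the middle column were read as missing
n_missing = int(df_check[1].isna().sum())
print(n_missing)
```

If the count equals the number of rows, all three spellings are covered by the two entries.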
Generate timestamps
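Next, generate timestamps from the year, day-of-year, and hour-minute columns.
def dayofY_toTimeStamp(df):
    # Pack Year, Day_of_Year, and HrMin into one number of the form YYYYDDDHHMM,
    # then parse it: %Y = year, %j = day of year, %H%M = hour and minute.
    newdate = pd.to_datetime(df.loc[:,'Year'].values*10000000
                             + df.loc[:,'Day_of_Year'].values*10000
                             + df.loc[:,'HrMin'], format='%Y%j%H%M')
    return newdate

# Call the function after it has been defined
timestamp = dayofY_toTimeStamp(df)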
dayofY_toTimeStamp is a helper function that converts the year, day of year, and hour-minute values into timestamps (YYYY-MM-DD HH:MM:SS).
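The same conversion can also be written with date arithmetic instead of a packed format string. The following is a minimal sketch of that alternative, not the method used in the sample program: `doy_to_timestamp` is a hypothetical name, and it assumes that HrMin holds the hour and minute as an integer such as 1230. One consequence of this approach is that an HrMin of 2400, if the data contain it, rolls over to 00:00 of the next day.

```python
import pandas as pd

def doy_to_timestamp(year, day_of_year, hrmin):
    """Build timestamps from year, day of year, and HHMM integers (hypothetical helper)."""
    year = pd.Series(year).astype(int)
    doy = pd.Series(day_of_year).astype(int)
    hrmin = pd.Series(hrmin).astype(int)

    # January 1 of each year
    start = pd.to_datetime(year.astype(str), format="%Y")

    # Day of year 1 is January 1, so the day offset is doy - 1
    offset = (pd.to_timedelta(doy - 1, unit="D")
              + pd.to_timedelta(hrmin // 100, unit="h")
              + pd.to_timedelta(hrmin % 100, unit="min"))
    return pd.DatetimeIndex(start + offset)

# Made-up example values
stamps = doy_to_timestamp([2020, 2021], [32, 365], [1230, 2400])
print(stamps)
```

The packed-format version is shorter; the arithmetic version may be easier to adapt when the hour field does not stay within 00 to 23.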
Create new data frame
Create a new, empty data frame indexed by the timestamps, add some data columns from the original data, and concatenate the remaining columns along the column axis.
# --- create a new dataframe
newdf = pd.DataFrame(index=timestamp)
newdf.index.names = ["TIMESTAMP"]
newdf['RECORD']=range(0,len(timestamp))
newdf['SiteNum']=stid.values
# Extract the columns from "Year" through "XMTPWR" (copy so that df itself is not modified)
tmp = df.loc[:,"Year":"XMTPWR"].copy()
tmp["TIMESTAMP"] = timestamp
tmp.set_index('TIMESTAMP', inplace=True)
# Concatenate the two datasets column-wise (axis=1)
newdf = pd.concat([newdf,tmp],axis=1)
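The steps above can be tried on a toy table. This is a sketch under stated assumptions: the column values are made up, the site number is taken from a hypothetical SiteNum column (the sample program reads it from `stid`, which is not shown on this page), and the timestamps are written out by hand rather than computed.

```python
import pandas as pd

# Made-up stand-in for the station data (assumption)
df = pd.DataFrame({
    "SiteNum": [101, 101, 101],
    "Year": [2020, 2020, 2020],
    "Day_of_Year": [1, 1, 1],
    "HrMin": [0, 100, 200],
    "XMTPWR": [5.1, 5.0, 4.9],
})
timestamp = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00",
                            "2020-01-01 02:00"])

# Empty frame that carries only the timestamp index
newdf = pd.DataFrame(index=timestamp)
newdf.index.name = "TIMESTAMP"
newdf["RECORD"] = range(len(timestamp))
newdf["SiteNum"] = df["SiteNum"].values

# Give the selected columns the same index, then join column-wise
tmp = df.loc[:, "Year":"XMTPWR"].copy()
tmp.index = timestamp
tmp.index.name = "TIMESTAMP"
newdf = pd.concat([newdf, tmp], axis=1)
print(newdf)
```

Because both frames share the same index, the concatenation lines the rows up by timestamp rather than by position.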
Save the data
Save the data to a CSV file, then add the required header lines.
#----
# save the data
#----
newdf.to_csv(tmpdir+filename,index=True,na_rep='NAN')
# add additional header information
fla = tmpdir+os.path.splitext(filename)[0]
addheaders(fla,fla+'.2',head[0],0)
addheaders(fla+'.2',fla+'.3',head[2],2)
addheaders(fla+'.3',outdir+'/'+newfile,head[3],3)
‘tmpdir’ is the temporary directory, and ‘newfile’ is the name of the final output file.
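The addheaders function is not shown on this page. Judging only from how it is called (input file, output file, header text, row number), it may copy a file while inserting one header line at the given row. The sketch below illustrates that idea with a hypothetical stand-in named `insert_header_line`; it is an assumption about the behavior, not the original implementation, and the header text in the demo is made up.

```python
import os
import tempfile

def insert_header_line(src, dst, line, position):
    """Copy src to dst, inserting `line` as row `position` (0-based). Hypothetical helper."""
    with open(src, "r") as f:
        rows = f.read().splitlines()
    rows.insert(position, line)
    with open(dst, "w") as f:
        f.write("\n".join(rows) + "\n")

# Demo on a throwaway file in a temporary directory
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "demo.csv")
dst = os.path.join(workdir, "demo_with_header.csv")
with open(src, "w") as f:
    f.write("TIMESTAMP,RECORD\n2020-01-01 00:00:00,0\n")

insert_header_line(src, dst, '"hypothetical file header"', 0)

with open(dst) as f:
    result = f.read().splitlines()
print(result)
```

Chaining such calls through intermediate files, as the sample program does with the '.2' and '.3' suffixes, keeps each step's output separate from its input.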
Sample program for creating a new data set with timestamps (YYYY-MM-DD HH:MM:SS)
This program is designed to read all files under multiple directories.
Source file: read_files.py
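One way to collect files under multiple directories is a recursive glob. The following is a minimal sketch of that idea and is not taken from read_files.py: `find_csv_files` is a hypothetical name, and the directory layout and the .csv extension are assumptions made for the demo.

```python
import glob
import os
import tempfile

def find_csv_files(root):
    """Return all .csv files under root, including subdirectories (hypothetical helper)."""
    pattern = os.path.join(root, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))

# Build a throwaway directory tree with one file per made-up site
root = tempfile.mkdtemp()
for sub in ("site_a", "site_b"):
    os.makedirs(os.path.join(root, sub))
    with open(os.path.join(root, sub, "hourly.csv"), "w") as f:
        f.write("1,2,3\n")

found = find_csv_files(root)
for path in found:
    print(path)
```

Each returned path can then be passed through the read, timestamp, and save steps described above.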