EOF Analysis#
Empirical Orthogonal Function (EOF) Decomposition#
EOF analysis decomposes spatial-temporal datasets into orthogonal spatial patterns (EOFs) and their corresponding time series (principal components or PCs). This is useful for identifying dominant modes of variability like ENSO or the Pacific/North American pattern.
This method uses latitude-weighted anomalies, optional standardization, and the GeoCAT EOF routines which are compatible with xarray and masked data.
📌 Function: compute_eof_analysis()
#
def compute_eof_analysis(datA, latlonEOF=(30.0, 50.0, -125.0, -100.0), neval=5, normalize=False, min_coverage=0.9):
# Ensure longitude is 0-360
if datA.lon.min() < 0:
datA = datA.assign_coords(lon=((datA.lon + 360) % 360))
lon_min = latlonEOF[2] if latlonEOF[2] >= 0 else latlonEOF[2] + 360
lon_max = latlonEOF[3] if latlonEOF[3] >= 0 else latlonEOF[3] + 360
# Flip latitude if needed
if datA["lat"][0] > datA["lat"][-1]:
datA = datA.isel(lat=slice(None, None, -1))
# Subset data
datA_eof = datA.sel(lat=slice(latlonEOF[0], latlonEOF[1]), lon=slice(lon_min, lon_max))
# Latitude weighting
rad = np.pi / 180.0
clat = np.sqrt(np.cos(datA_eof["lat"] * rad))
clat = clat.broadcast_like(datA_eof)
datY = datA_eof * clat
if normalize:
dat_std = datY.std(dim="year")
datY = datY / dat_std
datY = datY.transpose("year", "lat", "lon")
# Mask grid points with insufficient coverage
valid_count = datY.notnull().sum(dim="year")
required_count = int(datY.sizes["year"] * min_coverage)
mask = valid_count < required_count
datY = datY.where(~mask)
# Ensure 'time' dimension for GeoCAT
datY = datY.rename({'year': 'time'})
# Perform EOF analysis
eofs = gccomp.eofunc_eofs(datY, neofs=neval, meta=True)
pcs = gccomp.eofunc_pcs(datY, npcs=neval, meta=True)
# Normalize PCs and restore year dim
pcs = pcs.rename({'time': 'year'})
pcs = pcs / pcs.std(dim='year')
pcs.attrs['varianceFraction'] = eofs.attrs.get('varianceFraction', None)
print(f"Percentage Variance Explained: {pcs.attrs['varianceFraction']}")
return pcs, eofs
⚙️ Parameters#
datA
: Input gridded anomalies (dimensions:year
,lat
,lon
)latlonEOF
: Latitude/longitude domain of analysisneval
: Number of EOF modes to returnnormalize
: If True, standardize spatial anomalies before decomposition (correlation matrix EOFs)min_coverage
: Minimum fraction of valid data required per grid point
📌 Example: Covariance and Correlation Matrix EOFs#
# Define EOF domain (Western US)
latlonEOF = (30.0, 50.0, -125.0, -100.0)
# EOF Based on Covariance Matrix
pcs_cov, eofs_cov = compute_eof_analysis(
dat_YR, latlonEOF=latlonEOF, neval=5,
normalize=False # Use raw anomalies
)
# EOF Based on Correlation Matrix
pcs_std, eofs_std = compute_eof_analysis(
dat_YR, latlonEOF=latlonEOF, neval=5,
normalize=True # Use standardized anomalies
)
🔍 Note: Covariance EOFs emphasize high-variance regions. Correlation EOFs treat all locations equally.
This method is useful for understanding large-scale coherent patterns and reducing dimensionality for further analysis (e.g., regression, clustering).