Statistical Methods for Discrete Response, Time-Series, and Panel Data

Master of Information and Data Science (MIDS)

Course Overview

Course Description

This course covers a range of statistical techniques for modeling cross-sectional data with unordered and ordered categorical responses, count responses, univariate time-series data, multivariate time-series data, longitudinal (or panel) data, and multi-level data, all from a data science perspective. It teaches how to choose among a set of statistical techniques for a given question and how to make tradeoffs between model complexity, ease of interpreting results, and implementation complexity in real-world applications.

Learning Objectives

In this class, you will study the following topics:

Discrete Response Models
- Bernoulli, Binomial, Multinomial, and Poisson probability distributions
- Maximum likelihood estimation
- Profile likelihood ratio test
- Inference for the probability of an event and the use of Wald, Wilson, Agresti-Coull, and Clopper-Pearson confidence intervals
- Odds, relative risks, and odds ratios
- Binary logistic regression model
- Multinomial logistic regression model
- Poisson regression model
- Hypothesis testing for regression parameters
- Log-odds of an event and its relationship to binary logistic regression models
- Probability of an event in the context of binary logistic regression models
- Variable (nonlinear) transformation and interactions
- Contingency tables and the associated inference procedures
- Test for independence
- Model specification
- Model evaluation
- Model selection

Time Series Models
- Common time series patterns
- Autocorrelation and partial autocorrelation
- Notions and measures of stationarity
- Exploratory time series data analysis
- Time series regression
- Akaike's Information Criterion (AIC), its bias-corrected version (AICc), and the Bayesian Information Criterion (BIC)
- Model selection based on out-of-sample forecast error
- Time series smoothing and filtering techniques
- Stationary and non-stationary time series processes
- Stationary Autoregressive (AR), Moving Average (MA), and Mixed Autoregressive Moving Average (ARMA) processes
- ARIMA model
- Seasonal ARIMA model
- Estimation, diagnostic checking of model residuals, assumption testing, statistical inference, and forecasting
- Regression with autocorrelated errors
- Autoregressive Integrated Moving Average (ARIMA) Model
- Unit roots, the augmented Dickey-Fuller (ADF) test, and the Phillips-Perron test
- Spurious regression and cointegration
- Vector Autoregressive (VAR) Models

Statistical Models for Panel (or Longitudinal) Data
- Exploratory panel data analysis
- Pooled OLS regression model
- First-differenced regression model
- Distributed lag model
- Fixed-effect regression model
- Random-effect regression model
- Linear mixed-effect model

Course Requirements

A grade of B+ or better in DataSci W203, along with a solid understanding of probability, mathematical statistics, and linear regression modeling
Hands-on experience in R
Working knowledge of calculus and linear algebra

SYLLABUS
