Statistical Methods for Discrete Response, Time-Series, and Panel Data

Master of Information and Data Science (MIDS)

Course Overview

Course Description

This course covers a range of statistical techniques for modeling cross-sectional data with unordered and ordered categorical responses, count responses, univariate time-series data, multivariate time-series data, longitudinal (or panel) data, and multi-level data from a data science perspective. It teaches how to choose among statistical techniques for a given question and how to make tradeoffs between model complexity, ease of interpreting results, and implementation complexity in real-world applications.

Learning Objectives

In this class, you will study the following topics:

  • Discrete Response Models

    • Bernoulli, Binomial, Multinomial, and Poisson probability distributions
    • Maximum likelihood estimation
    • Profile likelihood ratio test
    • Inference for the probability of an event and the use of Wald, Wilson, Agresti-Coull, and Clopper-Pearson confidence intervals
    • Odds, relative risks, and odds ratios
    • Binary logistic regression model
    • Multinomial logistic regression model
    • Poisson regression model
    • Hypothesis testing for regression parameters
    • Log-odds of an event and its relationship to binary logistic regression models
    • Probability of an event in the context of binary logistic regression models
    • Variable (nonlinear) transformation and interactions
    • Contingency tables and the associated inference procedures
    • Test for independence
    • Model specification
    • Model evaluation
    • Model selection
  • Time Series Models

    • Common time series patterns
    • Autocorrelation and partial autocorrelation
    • Notions and measures of stationarity
    • Exploratory time series data analysis
    • Time series regression
    • Akaike’s Information Criterion (AIC) and its bias-corrected version, and the Bayesian Information Criterion (BIC)
    • Model selection based on out-of-sample forecast error
    • Time series smoothing and filtering techniques
    • Stationary and non-stationary time series processes
    • Stationary Autoregressive (AR), Moving Average (MA), and Mixed Autoregressive Moving Average (ARMA) processes
    • Autoregressive Integrated Moving Average (ARIMA) model
    • Seasonal ARIMA model
    • Estimation, diagnostic checking of model residuals, assumption testing, statistical inference, and forecasting
    • Regression with autocorrelated errors
    • Unit roots, augmented Dickey-Fuller (ADF) test, and Phillips-Perron test
    • Spurious regression and cointegration
    • Vector Autoregressive (VAR) Models
  • Statistical Models for Panel (or Longitudinal) Data

    • Exploratory panel data analysis
    • Pooled OLS regression model
    • First-differenced regression model
    • Distributed lag model
    • Fixed-effect regression model
    • Random-effect regression model
    • Linear mixed-effect model

Course Requirements

  • A grade of B+ or better in DataSci W203, and a solid understanding of probability, mathematical statistics, and linear regression modeling

  • Hands-on experience in R

  • Working knowledge of calculus and linear algebra

SYLLABUS
