Empirical Bayes small area prediction

under a zero-inflated lognormal model with correlated random effects

Xiaodan Lyu
Joint work with Dr. Emily Berg
04 February 2020

Basic Setting

Predictions for small domains are our primary interest.
- Standard survey estimators are often unreliable given small sample sizes (Rao and Molina, 2015).
- Use a model to incorporate auxiliary information (known for the whole population).

Data motivation (CEAP RUSLE2)
- skewed data with heavy zeros
- potential correlation between the positive part and the binary part

Targets: area means, quantiles. Around 15% zeros. Similar examples: skewed income data, wine production.

Data Model

Suppose the response variable $y_{ij}^{*}$, $i = 1, \dots, D$; $j = 1, \dots, N_{i}$, satisfies $y_{ij}^{*} = y_{ij} δ_{ij}$.

Positive part: $\log(y_{ij}) = β_{0} + z_{1,ij}^{'} β_{1} + u_{i} + e_{ij}$, where $e_{ij} \overset{i.i.d.}{\sim} N(0, σ_{e}^{2})$.
Binary part: $Pr(δ_{ij} = 1) = p_{ij}$, $g(p_{ij}) = α_{0} + z_{2,ij}^{'} α_{1} + b_{i}$, where $g(\cdot)$ is a parametric link function to be specified.

Lognormal extension
- Mathematically tractable and computationally simple.
- SAE lognormal model - Berg and Chandra (2013).
- SAE zero-inflated lognormal - Lyu (2018).
Correlated random effects
- $\begin{pmatrix} u_{i} \\ b_{i} \end{pmatrix} \overset{i.i.d.}{\sim} BVN(0, Σ_{ub})$, $Σ_{ub} = \begin{pmatrix} σ_{u}^{2} & ρ σ_{u} σ_{b} \\ ρ σ_{u} σ_{b} & σ_{b}^{2} \end{pmatrix}$
- "plug-in" approach of Chandra and Chambers (2016) => implicitly assumes independence
- Bayesian approach of Pfeffermann et al. (2018) => requires specifying prior distributions

linear mixed model used for the positive part

Empirical Bayes general

The minimum mean squared error (MMSE) predictor of $ζ_{i} = \frac{1}{N_{i}} \sum_{j=1}^{N_{i}} y_{ij}^{*}$ is ${\hat{ζ}}_{i}^{MMSE} = \frac{1}{N_{i}} \left\{ \sum_{j \in s_{i}} y_{ij}^{*} + \sum_{j \in {\bar{s}}_{i}} {\hat{y}}_{ij}^{*MMSE} \right\}$, where ${\hat{y}}_{ij}^{*MMSE} = E_{θ}(y_{ij}^{*} | y_{s_{i}}^{*})$.

Let $\hat{θ}$ be a consistent estimator of the model parameter $θ$. The EB predictor is ${\hat{ζ}}_{i}^{EB} = \frac{1}{N_{i}} \left\{ \sum_{j \in s_{i}} y_{ij}^{*} + \sum_{j \in {\bar{s}}_{i}} {\hat{y}}_{ij}^{*EB} \right\}$, where ${\hat{y}}_{ij}^{*EB} = {\hat{y}}_{ij}^{*MMSE}(\hat{θ})$.

We replace the true $θ$ with an estimator to obtain the EB predictor. ${\bar{s}}_{i}$ denotes the nonsampled units in area $i$.
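The way the area-mean predictor is assembled can be sketched in a few lines of Python: observed values are used for sampled units and unit-level predictions for nonsampled units. The function name `eb_area_mean` and the toy numbers are hypothetical and only illustrate the formula; the unit-level predictions themselves would come from the conditional expectation described in the deck.

```python
import numpy as np

def eb_area_mean(y_sampled, y_pred_nonsampled):
    """Area-mean predictor: observed y* for sampled units,
    predicted y* for nonsampled units, divided by N_i."""
    y_sampled = np.asarray(y_sampled, dtype=float)
    y_pred_nonsampled = np.asarray(y_pred_nonsampled, dtype=float)
    N_i = y_sampled.size + y_pred_nonsampled.size
    return (y_sampled.sum() + y_pred_nonsampled.sum()) / N_i

# Hypothetical toy area: 3 sampled units (one of them a zero)
# and 2 nonsampled units with made-up predictions.
zeta_hat = eb_area_mean([0.0, 1.5, 2.5], [1.0, 1.0])
```

Plugging in predictions evaluated at the true $θ$ gives the MMSE version; plugging in predictions evaluated at $\hat{θ}$ gives the EB version.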

Conditional Distribution

Result 1:

$u_{i} | b_{i}, y_{s_{i}}^{*} \sim N({\tilde{μ}}_{u_{i}}, {\tilde{σ}}_{u_{i}}^{2})$, where

${\tilde{μ}}_{u_{i}} = γ_{i} {\bar{\tilde{r}}}_{i} + (1 - γ_{i}) \left[ρ \frac{σ_{u}}{σ_{b}} b_{i}\right]$, ${\tilde{σ}}_{u_{i}}^{2} = γ_{i} σ_{e}^{2} / {\tilde{n}}_{i}$,

$γ_{i} = (1 - ρ^{2}) σ_{u}^{2} [(1 - ρ^{2}) σ_{u}^{2} + σ_{e}^{2} / {\tilde{n}}_{i}]^{-1}$, ${\tilde{n}}_{i} = \sum_{j \in s_{i}} δ_{ij}$, and ${\bar{\tilde{r}}}_{i} = {\tilde{n}}_{i}^{-1} \sum_{j \in s_{i}} δ_{ij} \{\log(y_{ij}) - β_{0} - z_{1,ij}^{'} β_{1}\}$.

Remark: when $ρ = 0$, $γ_{i} = σ_{u}^{2} / (σ_{u}^{2} + σ_{e}^{2} / {\tilde{n}}_{i})$ and $u_{i} | y_{s_{i}}^{*} \sim N(γ_{i} {\bar{\tilde{r}}}_{i}, γ_{i} σ_{e}^{2} / {\tilde{n}}_{i})$.
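Result 1 is a standard normal-normal update, and a small Python sketch may make the roles of $γ_{i}$, ${\tilde{μ}}_{u_{i}}$ and ${\tilde{σ}}_{u_{i}}^{2}$ concrete. The function name `u_given_b` and the input values are hypothetical; the sketch assumes at least one positive sampled unit in the area.

```python
import numpy as np

def u_given_b(r_bar, n_pos, b, sigma2_u, sigma2_e, sigma2_b, rho):
    """Return (gamma_i, mu_tilde, sigma2_tilde) from Result 1.

    r_bar : mean residual of the positive sampled units in area i
    n_pos : number of positive sampled units (assumed >= 1)
    b     : value of the binary-part random effect b_i
    """
    s2_u_cond = (1.0 - rho**2) * sigma2_u                   # Var(u_i | b_i)
    gamma = s2_u_cond / (s2_u_cond + sigma2_e / n_pos)      # shrinkage weight
    mean_u_cond = rho * np.sqrt(sigma2_u / sigma2_b) * b    # E(u_i | b_i)
    mu_tilde = gamma * r_bar + (1.0 - gamma) * mean_u_cond
    sigma2_tilde = gamma * sigma2_e / n_pos
    return gamma, mu_tilde, sigma2_tilde

# Made-up inputs for illustration; rho = 0 reproduces the Remark.
gamma0, mu0, var0 = u_given_b(r_bar=0.4, n_pos=5, b=0.3,
                              sigma2_u=0.22, sigma2_e=1.23,
                              sigma2_b=0.52, rho=0.0)
```

With `rho=0.0` the value of `b` drops out of the conditional mean, which matches the Remark.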

Conditional Distribution (Cont'd)

Result 2:

$f(b_{i} | y_{s_{i}}^{*}) \propto π_{s_{i}}(b_{i}) ϕ\left(\frac{b_{i} - m_{i}}{\sqrt{v_{i}}}\right)$, where $ϕ(\cdot)$ is the probability density function of the standard normal distribution,

$π_{s_{i}}(b_{i}) = \prod_{j \in s_{i}} [p_{ij} δ_{ij} + (1 - p_{ij})(1 - δ_{ij})]$, $p_{ij}(b_{i}) = g^{-1}(α_{0} + z_{2,ij}^{'} α_{1} + b_{i})$, and

$(m_{i}, v_{i}) = \left(\frac{ρ σ_{u} σ_{b}}{σ_{u}^{2} + σ_{e}^{2} / {\tilde{n}}_{i}} {\bar{\tilde{r}}}_{i}, \frac{1 - ρ^{2}}{1 - (1 - γ_{i}) ρ^{2}} σ_{b}^{2}\right)$.

Remark: when $ρ = 0$, $(m_{i}, v_{i}) = (0, σ_{b}^{2})$.
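One way to read $(m_{i}, v_{i})$ is as the mean and variance of $b_{i}$ given the mean residual ${\bar{\tilde{r}}}_{i}$ under bivariate-normal conditioning. That reading is an interpretation rather than something stated on the slide, so the Python sketch below computes both forms and lets them be compared numerically. Function names and input values are hypothetical.

```python
import numpy as np

def b_normal_factor(r_bar, n_pos, sigma2_u, sigma2_e, sigma2_b, rho):
    """Return (m_i, v_i) written as in Result 2."""
    s2_u_cond = (1.0 - rho**2) * sigma2_u
    gamma = s2_u_cond / (s2_u_cond + sigma2_e / n_pos)
    m = rho * np.sqrt(sigma2_u * sigma2_b) / (sigma2_u + sigma2_e / n_pos) * r_bar
    v = (1.0 - rho**2) / (1.0 - (1.0 - gamma) * rho**2) * sigma2_b
    return m, v

def b_given_rbar(r_bar, n_pos, sigma2_u, sigma2_e, sigma2_b, rho):
    """Mean and variance of b_i given r_bar, using
    Var(r_bar) = sigma2_u + sigma2_e / n_pos and
    Cov(b_i, r_bar) = rho * sigma_u * sigma_b."""
    var_rbar = sigma2_u + sigma2_e / n_pos
    cov_b_rbar = rho * np.sqrt(sigma2_u * sigma2_b)
    return cov_b_rbar / var_rbar * r_bar, sigma2_b - cov_b_rbar**2 / var_rbar

# Made-up inputs for illustration.
args = dict(r_bar=0.4, n_pos=5, sigma2_u=0.22, sigma2_e=1.23, sigma2_b=0.52)
m_slide, v_slide = b_normal_factor(rho=0.6, **args)
m_cond, v_cond = b_given_rbar(rho=0.6, **args)
```

The two pairs agree up to floating-point error for these inputs, and setting `rho=0.0` returns `(0, sigma2_b)` as in the Remark.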

Conditional Distribution (Cont'd)

Result 3:

$E(y_{ij}^{*} | y_{s_{i}}^{*}, θ) = h_{ij}(θ) E(p_{ij}(b_{i}) η(b_{i}) | y_{s_{i}}^{*})$, where $h_{ij}(θ) = \exp(β_{0} + z_{1,ij}^{'} β_{1} + γ_{i} {\bar{\tilde{r}}}_{i} + {\tilde{σ}}_{u_{i}}^{2} / 2 + σ_{e}^{2} / 2)$ and $η(b_{i}) = \exp\{(1 - γ_{i}) ρ (σ_{u} / σ_{b}) b_{i}\}$.

👉 ${\hat{y}}_{i j}^{* E B} = E (y_{i j}^{*} | y_{s_{i}}^{*}, \hat{θ})$

Remark: The proposed predictor can accommodate any parametrized link (transformation) function for the positive part and the binary part.
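The expectation over $b_{i}$ in Result 3 is one-dimensional, so it can be approximated numerically. The Python sketch below uses Gauss-Hermite quadrature, which is a choice made here and not necessarily what the authors used. It also assumes a logit link and a scalar covariate in each part. The function name `eb_predict_unit`, the ordering of `theta`, and the toy inputs are hypothetical.

```python
import numpy as np

def eb_predict_unit(z1_new, z2_new, delta_s, z2_s, r_bar, n_pos, theta, n_nodes=40):
    """Approximate E(y*_ij | y_s, theta) for a nonsampled unit.

    theta = (beta0, beta1, alpha0, alpha1, sigma2_u, sigma2_e, sigma2_b, rho)
    delta_s, z2_s : indicators and binary-part covariates of the sampled units
    r_bar, n_pos  : mean residual and count of the positive sampled units
    """
    b0, b1, a0, a1, s2u, s2e, s2b, rho = theta
    delta_s, z2_s = np.asarray(delta_s), np.asarray(z2_s, dtype=float)
    s2u_c = (1.0 - rho**2) * s2u
    gamma = s2u_c / (s2u_c + s2e / n_pos)
    s2_tilde = gamma * s2e / n_pos
    m = rho * np.sqrt(s2u * s2b) / (s2u + s2e / n_pos) * r_bar
    v = (1.0 - rho**2) / (1.0 - (1.0 - gamma) * rho**2) * s2b

    # Gauss-Hermite nodes mapped to N(m, v); constants cancel on normalizing.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = m + np.sqrt(2.0 * v) * x
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    p_s = expit(a0 + a1 * z2_s[None, :] + b[:, None])
    pi_s = np.prod(np.where(delta_s[None, :] == 1, p_s, 1.0 - p_s), axis=1)
    post = w * pi_s
    post = post / post.sum()          # discrete approximation to f(b_i | y_s)

    p_new = expit(a0 + a1 * z2_new + b)
    eta = np.exp((1.0 - gamma) * rho * np.sqrt(s2u / s2b) * b)
    h = np.exp(b0 + b1 * z1_new + gamma * r_bar + s2_tilde / 2.0 + s2e / 2.0)
    return h * np.sum(post * p_new * eta)

# Toy area with 5 sampled units, 4 of them positive (made-up data).
theta = (-13.0, 2.0, -20.0, 5.0, 0.22, 1.23, 0.52, 0.6)
delta_s = np.array([1, 1, 0, 1, 1])
z2_s = np.array([4.3, 4.5, 4.4, 4.6, 4.2])
pred = eb_predict_unit(4.45, 4.45, delta_s, z2_s, r_bar=0.1, n_pos=4, theta=theta)
```

Evaluating the function at $\hat{θ}$ instead of $θ$ gives ${\hat{y}}_{ij}^{*EB}$. A different link would only change the `expit` line.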

Parameter Estimation

Given $ρ \neq 0$, how do we obtain a consistent estimator of the model parameter $θ = (β^{'}, α^{'}, σ_{u}^{2}, σ_{b}^{2}, σ_{e}^{2}, ρ)^{'}$? 😣

Full likelihood function:

$L_{i}(θ) = \frac{(1 - γ_{i})^{1/2} (v_{i} / σ_{b}^{2})^{1/2}}{(2 π σ_{e}^{2})^{{\tilde{n}}_{i}/2}} \exp\left(\frac{γ_{i} {\bar{\tilde{r}}}_{i}^{2}}{2 σ_{e}^{2} / {\tilde{n}}_{i}} + \frac{m_{i}^{2}}{2 v_{i}} - \frac{\sum_{j} {\tilde{r}}_{ij}^{2}}{2 σ_{e}^{2}}\right) \int π_{s_{i}}(b_{i}) \frac{1}{\sqrt{v_{i}}} ϕ\left(\frac{b_{i} - m_{i}}{\sqrt{v_{i}}}\right) d b_{i}$

Remark:

- A good starting value and the gradient vector $\partial L / \partial θ$ help optim find the optimizer much faster. ✌️
- Profiling out $ρ$ takes much longer to find the MLE.
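As an illustration of how one area's likelihood contribution could be evaluated, here is a Python sketch of $\log L_{i}(θ)$ that follows the displayed formula term by term. The integral is approximated with Gauss-Hermite quadrature and a logit link is assumed; both are choices made for this sketch. The function name `area_loglik` and its argument layout are hypothetical, and the area is assumed to contain at least one positive sampled unit.

```python
import numpy as np

def area_loglik(r, delta, lin2, s2u, s2e, s2b, rho, n_nodes=40):
    """log L_i(theta) for one area.

    r     : residuals log(y) - beta0 - z1'beta1 of the positive sampled units
    delta : 0/1 indicators for all sampled units
    lin2  : alpha0 + z2'alpha1 for all sampled units
    """
    r, delta, lin2 = (np.asarray(a, dtype=float) for a in (r, delta, lin2))
    n_pos, r_bar = r.size, r.mean()
    s2u_c = (1.0 - rho**2) * s2u
    gamma = s2u_c / (s2u_c + s2e / n_pos)
    m = rho * np.sqrt(s2u * s2b) / (s2u + s2e / n_pos) * r_bar
    v = (1.0 - rho**2) / (1.0 - (1.0 - gamma) * rho**2) * s2b

    # Leading constant and exponent of the displayed formula.
    log_const = (0.5 * np.log(1.0 - gamma) + 0.5 * np.log(v / s2b)
                 - 0.5 * n_pos * np.log(2.0 * np.pi * s2e))
    exponent = (gamma * r_bar**2 / (2.0 * s2e / n_pos)
                + m**2 / (2.0 * v) - np.sum(r**2) / (2.0 * s2e))

    # Integral of pi_s(b) against the N(m, v) density, by Gauss-Hermite.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = m + np.sqrt(2.0 * v) * x
    p = 1.0 / (1.0 + np.exp(-(lin2[None, :] + b[:, None])))
    pi_s = np.prod(np.where(delta[None, :] == 1, p, 1.0 - p), axis=1)
    integral = np.sum(w * pi_s) / np.sqrt(np.pi)
    return log_const + exponent + np.log(integral)

# Made-up data: 4 sampled units, 3 of them positive.
ll = area_loglik(r=[0.3, -0.5, 1.1], delta=[1, 1, 0, 1],
                 lin2=[2.0, 2.5, 1.5, 2.2],
                 s2u=0.22, s2e=1.23, s2b=0.52, rho=0.6)
```

Summing the negative of this function over areas gives an objective that a general-purpose optimizer can minimize, with the residuals and `lin2` recomputed from $β$ and $α$ at each step.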

The first approach also allows inference, such as simultaneous confidence intervals or hypothesis tests.

MSE Estimator

Formally, the MSE of an EB predictor is $MSE({\hat{ζ}}_{i}^{EB}) = M_{1i}(θ) + M_{2i}(θ)$, where

$M_{1i}(θ) = E({\hat{ζ}}_{i}^{MMSE} - ζ_{i})^{2} = O(1)$ as $D \to \infty$ and

$M_{2i}(θ) = E({\hat{ζ}}_{i}^{EB} - {\hat{ζ}}_{i}^{MMSE})^{2} = o(1)$ as $D \to \infty$.

Recall that ${\hat{ζ}}_{i}^{MMSE} = E[ζ_{i} | y_{s_{i}}^{*}, θ]$ and ${\hat{ζ}}_{i}^{EB} = E[ζ_{i} | y_{s_{i}}^{*}, \hat{θ}]$.

$M_{2 i} (θ)$ results from the parameter estimation process.

MSE Estimator (Cont'd)

So the MSE of the optimal predictor is $M_{1i}(θ) = E(E[ζ_{i} | y_{s_{i}}^{*}, θ] - ζ_{i})^{2} = E[V(ζ_{i} | y_{s_{i}}^{*}, θ)]$.

We propose a one-step MSE estimator defined by

$V(ζ_{i} | y_{s_{i}}^{*}, \hat{θ}) = \frac{1}{N_{i}^{2}} \sum_{j \in {\bar{s}}_{i}} \sum_{k \in {\bar{s}}_{i}} [E\{y_{ij}^{*} y_{ik}^{*} | y_{s_{i}}^{*}, \hat{θ}\} - E\{y_{ij}^{*} | y_{s_{i}}^{*}, \hat{θ}\} E\{y_{ik}^{*} | y_{s_{i}}^{*}, \hat{θ}\}]$,

where $E\{y_{ij}^{*} y_{ik}^{*} | y_{s_{i}}^{*}\} = h_{ij} h_{ik} \exp({\tilde{σ}}_{u_{i}}^{2}) E[p_{ij} p_{ik} η(2 b_{i}) | y_{s_{i}}^{*}]$ for $j \neq k$, and $E\{y_{ij}^{*} y_{ik}^{*} | y_{s_{i}}^{*}\} = h_{ij}^{2} \exp({\tilde{σ}}_{u_{i}}^{2} + σ_{e}^{2}) E[p_{ij} η(2 b_{i}) | y_{s_{i}}^{*}]$ for $j = k$.
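The $j \neq k$ cross-moment can be traced back to Results 1 and 3 in a few lines. The following is a sketch of that step, assuming (as the model implies) that, given $(u_{i}, b_{i})$, the $δ_{ij}$ and $e_{ij}$ are independent across units and of each other:

```latex
\begin{aligned}
E\{y_{ij}^{*} y_{ik}^{*} \mid u_i, b_i\}
  &= p_{ij}(b_i)\, p_{ik}(b_i)\,
     \exp\{2β_0 + (z_{1,ij} + z_{1,ik})' β_1 + 2u_i + σ_e^2\}, \\
E\{\exp(2u_i) \mid b_i, y_{s_i}^{*}\}
  &= \exp(2\tilde{μ}_{u_i} + 2\tilde{σ}_{u_i}^2)
   = \exp(2γ_i \bar{\tilde{r}}_i + 2\tilde{σ}_{u_i}^2)\, η(b_i)^2, \\
h_{ij} h_{ik}
  &= \exp\{2β_0 + (z_{1,ij} + z_{1,ik})' β_1
     + 2γ_i \bar{\tilde{r}}_i + \tilde{σ}_{u_i}^2 + σ_e^2\}, \\
E\{y_{ij}^{*} y_{ik}^{*} \mid y_{s_i}^{*}\}
  &= h_{ij} h_{ik} \exp(\tilde{σ}_{u_i}^2)\,
     E[p_{ij} p_{ik}\, η(b_i)^2 \mid y_{s_i}^{*}],
  \qquad η(b_i)^2 = η(2 b_i).
\end{aligned}
```

For $j = k$ the same steps apply with $δ_{ij}^{2} = δ_{ij}$ and $E\{\exp(2 e_{ij})\} = \exp(2 σ_{e}^{2})$, which gives a single $p_{ij}$ and the extra factor $\exp(σ_{e}^{2})$.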

Alternate MSE Estimators

Denote the original parameter estimate by $\hat{θ}$, the bootstrap population by $Y^{*(b)}$, the bootstrap sample by $y^{*(b)}$, and the bootstrap parameter estimate by ${\hat{θ}}^{(b)}$.
For $b = 1, \dots, B$, we obtain
- $Y^{*(b)} \longrightarrow {\bar{y}}_{N_{i}}^{*(b)}$
- $y^{*(b)}, \hat{θ} \longrightarrow {\hat{\bar{y}}}_{N_{i}}^{*(b)MMSE}$
- $y^{*(b)}, {\hat{θ}}^{(b)} \longrightarrow {\hat{\bar{y}}}_{N_{i}}^{*(b)EB}$

${\hat{MSE}}_{i}^{Boot} = B^{- 1} \sum_{b = 1}^{B} ({\hat{\bar{y}}}_{N_{i}}^{* (b) EB} - {\bar{y}}_{N_{i}}^{* (b)})^{2}$
${\hat{MSE}}_{i}^{Semi-Boot} = {\hat{M}}_{1 i} + {\hat{M}}_{2 i}^{Boot}$ where a bootstrap estimator of $M_{2 i}$ is defined by ${\hat{M}}_{2 i}^{Boot} = B^{- 1} \sum_{b = 1}^{B} ({\hat{\bar{y}}}_{N_{i}}^{* (b) EB} - {\hat{\bar{y}}}_{N_{i}}^{* (b) MMSE})^{2}$
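Once the bootstrap quantities are available, the two estimators are simple averages. The Python sketch below shows only that last step. It assumes the bootstrap area means have already been stored as arrays of shape `(B, D)`; the function name `bootstrap_mse` and the toy numbers are hypothetical.

```python
import numpy as np

def bootstrap_mse(truth_b, mmse_b, eb_b, m1_hat):
    """Return (MSE_boot, MSE_semi_boot), one value per area.

    truth_b : bootstrap population area means, shape (B, D)
    mmse_b  : predictors from bootstrap samples using the original estimate
    eb_b    : predictors from bootstrap samples using re-estimated parameters
    m1_hat  : direct estimate of the leading term M_1i, shape (D,)
    """
    truth_b, mmse_b, eb_b = (np.asarray(a, dtype=float)
                             for a in (truth_b, mmse_b, eb_b))
    mse_boot = np.mean((eb_b - truth_b) ** 2, axis=0)
    m2_boot = np.mean((eb_b - mmse_b) ** 2, axis=0)
    return mse_boot, np.asarray(m1_hat, dtype=float) + m2_boot

# Made-up values: B = 2 bootstrap replicates, D = 1 area.
mse_boot, mse_semi = bootstrap_mse(truth_b=[[1.0], [2.0]],
                                   mmse_b=[[1.5], [1.5]],
                                   eb_b=[[2.0], [1.0]],
                                   m1_hat=[0.3])
```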

Simulation Setting

Number of areas: $D = 60$
Population and sample sizes: $(N_{i}, n_{i}) = (71, 5), (143, 10), (286, 20)$, each for 20 areas, so that $(N, n) = (10000, 700)$
logit link for the binary part
One-dimensional covariates: $z_{i j} \sim N (4.45, 0.055)$
$β = (- 13, 2)^{'}, α = (- 20, 5)^{'}$ , $(σ_{u}^{2}, σ_{e}^{2}, σ_{b}^{2}) = (0.22, 1.23, 0.52)$
$ρ \in \{-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9\}$
Compare the proposed EB predictor with the EB(0) predictor ${\hat{ζ}}_{i}^{EB} |_{ρ = 0}$

Remark: the EB(0) predictor (Lyu 2018) is based on ${\hat{θ}}_{0}$ obtained from fitting the two parts separately
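A population with these settings could be generated along the following lines. This Python sketch makes two assumptions that the slide does not settle: the second argument of $N(4.45, 0.055)$ is read as a variance, and the same covariate $z_{ij}$ is used in both parts of the model. The function name `simulate_population` and the seed are hypothetical.

```python
import numpy as np

def simulate_population(rho, seed=0):
    """Generate one population from the zero-inflated lognormal model
    with correlated area effects, using the listed simulation settings."""
    rng = np.random.default_rng(seed)
    N_i = np.repeat([71, 143, 286], 20)    # 60 areas, N = 10000
    n_i = np.repeat([5, 10, 20], 20)       # sample sizes, n = 700
    beta0, beta1 = -13.0, 2.0
    alpha0, alpha1 = -20.0, 5.0
    s2u, s2e, s2b = 0.22, 1.23, 0.52

    # Correlated random effects (u_i, b_i) ~ BVN(0, Sigma_ub).
    cov_ub = rho * np.sqrt(s2u * s2b)
    ub = rng.multivariate_normal([0.0, 0.0],
                                 [[s2u, cov_ub], [cov_ub, s2b]],
                                 size=N_i.size)

    area = np.repeat(np.arange(N_i.size), N_i)
    # ASSUMPTION: 0.055 is the variance of z, not its standard deviation.
    z = rng.normal(4.45, np.sqrt(0.055), size=area.size)

    log_y = beta0 + beta1 * z + ub[area, 0] + rng.normal(0.0, np.sqrt(s2e), area.size)
    p = 1.0 / (1.0 + np.exp(-(alpha0 + alpha1 * z + ub[area, 1])))  # logit link
    delta = rng.binomial(1, p)
    y_star = delta * np.exp(log_y)
    return area, z, y_star, N_i, n_i

area, z, y_star, N_i, n_i = simulate_population(rho=0.9)
```

A sample would then be drawn within each area (for example, `n_i[i]` units without replacement); that step is omitted here.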

Simulation Results

${R D M S E}_{i} = \frac{M S E ({\hat{ζ}}_{i}^{E B (0)}) - M S E ({\hat{ζ}}_{i}^{E B})}{M S E ({\hat{ζ}}_{i}^{E B})}$

Table 1. Relative difference of MSE of the EB(0) predictor compared with the EB predictor based on 1000 simulations.

Simulation Results

${R B M S E}_{i} = \frac{{\hat{M S E}}_{i} - M S E ({\hat{ζ}}_{i})}{M S E ({\hat{ζ}}_{i})}, {C I}_{i} = {\hat{ζ}}_{i} \pm 1.96 \sqrt{{\hat{M S E}}_{i}}$

Table 2. Relative bias and coverage rate of nominal 95% confidence intervals when $ρ = 0.9$ for different MSE estimators based on 1000 simulations and 100 bootstrap samples.
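The two summaries reported in the table can be computed from stored simulation output roughly as follows. This Python sketch assumes the Monte Carlo version of the formulas: the true MSE is replaced by the empirical mean squared error over simulations, and ${\hat{MSE}}_{i}$ by its simulation average. The function name `rb_and_coverage` and the toy arrays are hypothetical.

```python
import numpy as np

def rb_and_coverage(zeta_hat, mse_hat, zeta_true):
    """Relative bias of an MSE estimator and coverage of the nominal 95%
    interval zeta_hat +/- 1.96 * sqrt(mse_hat), one value per area.
    All inputs have shape (n_sim, D)."""
    zeta_hat, mse_hat, zeta_true = (np.asarray(a, dtype=float)
                                    for a in (zeta_hat, mse_hat, zeta_true))
    err = zeta_hat - zeta_true
    mse_emp = np.mean(err ** 2, axis=0)             # Monte Carlo MSE
    rb = (np.mean(mse_hat, axis=0) - mse_emp) / mse_emp
    covered = np.abs(err) <= 1.96 * np.sqrt(mse_hat)
    return rb, np.mean(covered, axis=0)

# Made-up values: 2 simulations, 1 area.
rb, cover = rb_and_coverage(zeta_hat=[[1.0], [3.0]],
                            mse_hat=[[1.0], [0.16]],
                            zeta_true=[[2.0], [2.0]])
```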

Estimating the leading term directly seems to improve the quality of CIs based on "Boot" when B = 100

Application to CEAP data

Data Description

Response

RUSLE2: sheet and rill erosion from cropland (tons/yr)

Auxiliary information

On a cropland grid at a spatial resolution of 56 m

- logR 💦: log-scale county-level rainfall factor 👈 NRI
- logK 🗻: log-scale erosion factor 👈 Soil Survey
- logS 📐: log-scale slope gradient factor 👈 Soil Survey
- crop_type 🌱: corn, soybean, sprwht or wtrwht 👈 CDL

Parameter Estimates

Based on the observed information matrix, the 95% confidence interval for $ρ$ is $(0.67, 0.84)$.

EB Predictions

Figure 1. Cartogram of the EB predicted county means of cropland RUSLE2 in South Dakota. Smaller shrinkage indicates a smaller coefficient of variation.

Eastern South Dakota: more cropland, more samples, more positive responses. Two counties have no sample, many counties have fewer than 5 samples, and some counties have at most 30 samples. Mean SE is around 0.005.

Conclusion

A frequentist approach has been proposed to fit a zero-inflated lognormal model with correlated random effects in small area prediction.
When the true correlation deviates far from 0, the EB(0) predictor (Lyu 2018) has moderately larger MSE than the EB predictor (without model misspecification).
The data analysis of the cropland CEAP RUSLE2 measurements collected in South Dakota shows that the model assumptions are reasonable.

Appendix A: CEAP RUSLE2 Residual Analysis

Positive part fitted value: marginal 👉 $x_{i j}^{'} \hat{β}$ , conditional 👉 $x_{i j}^{'} \hat{β} + \hat{E} [u_{i} | y_{s_{i}}^{*}]$

marginal residual (top): $(\log(y_{ij}) - x_{ij}^{'} \hat{β}) / \sqrt{{\hat{σ}}_{u}^{2} + {\hat{σ}}_{e}^{2}}$

conditional residual (bottom): $(\log(y_{ij}) - x_{ij}^{'} \hat{β} - \hat{E}[u_{i} | y_{s_{i}}^{*}]) / \sqrt{{\hat{σ}}_{e}^{2}}$

Binary part: Hosmer-Lemeshow test with ${\hat{p}}_{i j} = E [p_{i j} (b_{i}) | y_{s_{i}}^{*}]$

Appendix B: Simulation results

Relative bias and coverage rate of nominal 95% confidence intervals for the EB and the EB(0) predictor based on 1000 simulations.
