TUTORIAL ON QUANTILE REGRESSION

Transcripción

TUTORIAL ON QUANTILE REGRESSION
TUTORIAL ON QUANTILE REGRESSION
Xuming He
Department of Statistics, University of Illinois at Urbana-Champaign
Ying Wei
Department of Biostatistics, Columbia University
ENAR 2005
TUTORIAL ON QUANTILE REGRESSION – p. 1
Motivation for quantile regression
• Regression — to obtain a summary of the relationship
between a response variable y and a set of covariates x.
• Least squares regression captures how the mean of y
changes with x.
• Sometimes, a single mean curve is not informative enough.
• Conditional quantile functions provide a more complete
view.
TUTORIAL ON QUANTILE REGRESSION – p. 2
10
8
12
Motivation for quantile regression
6
8
0.9
4
Y
6
Y
0.75
4
0.5
0.9
2
0.75
2
0.25
0.1
0.0
0.2
0.4
0.6
0.8
0
0.5
0.25
0.1
1.0
X
The distribution of y may be
asymmetric around the
mean.
0.6
0.8
1.0
1.2
1.4
1.6
X
Heteroscedasticity may exist
in the data.
TUTORIAL ON QUANTILE REGRESSION – p. 3
Example 1 – Risk factors for low birthweight
• Low birthweight is known to be associated with
◦ higher infant mortality (Abreveya, 2001).
◦ higher health-care cost (Lewit et al., 1995).
◦ a wide range of subsequent health problems (Hack et
al., 1995).
◦ long-term educational attainment and even labor market
outcomes (Corman and Chaikind, 1998, and Currie and
Hyson, 1999).
• Investigate the factors influencing birthweight, especially the
ones that may help reduce the incidence of low birthweight
infants.
TUTORIAL ON QUANTILE REGRESSION – p. 4
Example 1 – Risk factors for low birthweight
• The research question can be rephrased as exploring the
covariate effects on the lower quantiles of birthweight.
• Potential covariates include
◦ Mother’s education
◦ Mother’s prenatal care
◦ Mother’s age
◦ Mother’s weight gain
◦
...
• Covariate effects on lower quantiles may differ from those
on the mean or median birthweight.
• Reference: Abreveya (2001) and Koenker and Hallock
(2001).
TUTORIAL ON QUANTILE REGRESSION – p. 5
Example 2 – Growth charts
20
95%
75%
15
25%
10
5%
5
Weight (kg)
50%
0.0
0.5
1.0
1.5
• Display the distribution of a
certain measurement within a
certain range of time for a
certain population.
2.0
Age
TUTORIAL ON QUANTILE REGRESSION – p. 6
Example 2 – Growth charts
20
95%
75%
B
5%
measurements from an
individual subject in the
context of population values.
10
15
25%
• Focused on the entire
5
Weight (kg)
50%
• Used to screen the
A
0.0
0.5
1.0
1.5
2.0
distribution, especially the
extreme quantiles.
Age
TUTORIAL ON QUANTILE REGRESSION – p. 7
Example 2 – Growth charts
• Widely used in medical sciences:
Pediatrics:
to monitor an infant’s growth status by screening his/her
height, weight and head circumference. (Hamill et al.,
1979)
HIV research:
to monitor CD4 lymphocyte counts in uninfected children
born to HIV-1 infected women.
Genetics:
to help determine the gene frequency of the most
common mutations of the HFE gene causing hereditary
hemochromatosis.(Koziol et al., 2001)
TUTORIAL ON QUANTILE REGRESSION – p. 8
Outlines
• Quantile regression
◦
◦
◦
Estimation
Properties
Computational aspects
• Software in SAS and R
• Examples (continue)
• Inference in quantile regression
• Additional topics
◦ Beyond the linear models
◦
Quantile regression with censored data
TUTORIAL ON QUANTILE REGRESSION – p. 9
Univariate Sample Quantile via Optimization
A random sample {y1 , y2 , ..., yn },
Median = argminξ
The τ th sample quantile = argminξ
where
X
X
|yi − ξ|;
ρτ (yi − ξ),
τ
τ−1
ρτ (u) = u(τ − 1(u<0) ).
0
TUTORIAL ON QUANTILE REGRESSION – p. 10
Linear Regression Quantile (Koenker and Bassett, 1978)
• A linear model for the τ th quantile
yi = x⊤
i β τ + ei ,
i = 1, . . . , n,
(1.1)
where the τ th quantile of ei is zero.
• Underlying assumption of model (1.1):
Qτ (Y |X) = X ⊤ β τ
• Estimator of β τ
β̂ τ = argminβ ∈Rp
n
X
i=1
ρτ (yi − x⊤i β).
• Regression Quantile: Q̂τ (Y |X) = X ⊤ β̂ τ .
TUTORIAL ON QUANTILE REGRESSION – p. 11
Basic properties of regression quantiles
• Scale equivariant: for any a > 0,
β̂ τ (ay, X) = aβ̂ τ (y, X),
β̂ τ (−ay, X) = aβ̂ 1−τ (y, X).
• Regression Shift: for any γ ∈ Rp ,
β̂ τ (y + Xγ, X) = β̂ τ (y, X) + γ.
• Reparameterization of design: for any |A| =
6 0,
β̂ τ (y, AX) = A−1 β̂ τ (y, X).
• Equivariant to monotone transformation,
Qτ (h(Y )|X) = h(Qτ (Y |X)).
TUTORIAL ON QUANTILE REGRESSION – p. 12
Asymptotic properties of regression quantiles
• Asymptotics for the univariate sample quantile.
Let {y1 , y2 , ..., yn } be an i.i.d. random sample with
distribution F , and
X
ρτ (yi − ξ),
ξˆτ = argminξ
i
then
√
τ (1 − τ )
ˆ
,
n(ξτ − ξτ ) ∼ N 0, 2 −1
f (F (τ ))
where f = F ′ .
TUTORIAL ON QUANTILE REGRESSION – p. 13
Asymptotic properties of regression quantiles
• Asymptotics for linear models.
yi = x⊤
i βτ + ei ,
where ei ∼ Fi and Fi−1 (τ ) = 0
then the join distribution
√
n Vn−1/2 (β̂τk − βτk )m
k=1 ∼ N (0, In ) ,
(1.2)
where
◦ Hn (τ ) = n−1
P
xi x⊤
i fi (0),
and Dn =
n−1
P
xi x⊤
i ;
◦ Vn = [(τk ∧ τl − τk τl )Hn (τk )−1 Dn Hn (τl )−1 ]m .
k,l=1
• Reference: Koenker and Bassett (1978), He and Shao
(1996).
TUTORIAL ON QUANTILE REGRESSION – p. 14
Computational aspects
• β̂ τ can be calculated by means of linear programming,
which is efficient in computation.
◦ Primal formulation as a Linear Program
2n
}.
min{τ 1⊤ u+(1−τ )1⊤ v|y = Xb+u−v, (b, u, v) ∈ Rp ×R+
◦ Dural formulation as a Linear Program
max{y ⊤ d|X ⊤ d = (1 − τ )X ⊤ 1, d ∈ [0, 1]n }.
◦ Solutions are characterized by an exact fit to p
observations.
TUTORIAL ON QUANTILE REGRESSION – p. 15
Existing algorithms and practical recommendations
• Simplex method (Koenker and D’Orey, 1987 and 1993)
◦ Start at a random vertex, and search around the edge of
a circumscribed polygon, until the optimum is found.
◦ For moderate data size.
• Interior Point method (Portnoy and Koenker, 1997)
◦ Start from the interior of the circumscribed polygon to
avoid crossing the boundary.
◦ Computationally efficient for large data size.
TUTORIAL ON QUANTILE REGRESSION – p. 16
Existing algorithms and practical recommendations
• Interior point method with preprocessing (Portnoy and
Koenker, 1997)
◦ For very large datasets, e.g., n > 105 .
• Smoothing method (Chen, 2004)
◦ Search the optimized solution by smoothing the
objective function ρτ (·).
TUTORIAL ON QUANTILE REGRESSION – p. 17
Estimation in SAS
• Package: SAS/STAT PROC QUANTREG
◦ Available as an experimental procedure for SAS 9.1
(Windows only);
◦ Downloadable from
http://www.sas.com/statistics
• Basic syntax
PROC QUANTREG DATA = sas-data-set <options>;
BY variables;
CLASS variables;
MODEL response = independents </ options>;
RUN;
TUTORIAL ON QUANTILE REGRESSION – p. 18
Estimation in SAS
• To specify the quantile level:
◦ Use the option QUANTILE in the MODEL statement
MODEL Y = X / QUANTILE = <number list | ALL>;
choice of quantile(s)
example
a single quantile
QUANTILE = 0.25
several quantiles
QUANTILE = 0.25 0.5 0.75
entire quantile process
QUANTILE = ALL
◦ Default value is 0.5, the median.
TUTORIAL ON QUANTILE REGRESSION – p. 19
Estimation in SAS
• To specify the algorithm:
◦ Use the option ALGORITHM in the PROC QUANTREG
statement
method
option
Simplex
ALGORITHM = SIMPLEX
Interior point
ALGORITHM = INTERIOR
Interior point with preprocessing
ALGORITHM = INTERIOR PP
Smoothing
ALGORITHM = SMOOTHING
◦ By default, the Simplex method is used.
TUTORIAL ON QUANTILE REGRESSION – p. 20
Estimation in SAS
• Default Output
— report the name of the data set and the
response variable, the number of covariates, the number
of observations, algorithm of optimization and the
method for confidence intervals.
Model information
— report the sample mean and standard
deviation, sample median, MAD and interquartile range
for each variable included in the MODEL statement.
Summary statistics
— report the quantile level to be
estimated and the optimized objective function.
Quantile objective function
— report the estimated coefficients and
their 95% confidence intervals.
Parameter Estimates
TUTORIAL ON QUANTILE REGRESSION – p. 21
Estimation in R
• R package quantreg ( contributed by Koenker).
◦ Official releases of R and the install package of quantreg
are available at
http://lib.stat.cmu.edu/R/CRAN/
• The syntax of rq().
library(quantreg)
rq(formula, tau=.5, data, weights,
na.action, method="br", ...)
TUTORIAL ON QUANTILE REGRESSION – p. 22
Estimation in R
• Arguments
formula model statement, e.g.
y ∼ x1 + x2
tau the quantile(s) to be estimated:
◦ If tau is a single number ∈ (0,1), return a single
quantile function.
◦ If tau is a vector ∈ (0,1), return several quantile
functions.
◦ If tau 6∈ (0,1), return the entire quantile process.
TUTORIAL ON QUANTILE REGRESSION – p. 23
Estimation in R
method
• method="br" Simplex method (default).
• method="fn" the Frisch−Newton interior point method.
• method="pfn" the Frisch−Newton approach with
preprocessing.
TUTORIAL ON QUANTILE REGRESSION – p. 24
Example 1: Exploring the risk factors of low birthweight
• Data (NATALITY1997) :
◦ National Center for Health Statistics (NCHS).
◦ Provide information on live births in the United States
during calender year 1997, including infant birthweight.
◦ Also provide
– socioeconomic and demographic characteristics of
mothers;
– Mother’s prenatal care and pregnancy history
TUTORIAL ON QUANTILE REGRESSION – p. 25
Example 1: Exploring the risk factors of low birthweight
• Data (NATALITY1997) used in our example:
◦ Subset: live singleton birthes; mothers between the ages
of 18 and 45, either black or white, residing in the U.S.
◦ Sample size: 198,377 observations.
◦ More detailed description of the data is available at
http://www.cdc.gov/nchs/births.htm
TUTORIAL ON QUANTILE REGRESSION – p. 26
A quantile regression model for birthweight
Model:
Qτ (Birthweight) =
Boy + Married + Black
+ High School + Some College + College
+ No Prenatal + Prenatal Second + Prenatal Third
+ Smoker + Ciga Per Day
+ Mother’s Age + Mother’s Age 2
+ Mother’s weight gain + Mother’s weight gain 2
Choice of quantile levels
τ = (0.05, 0.1, 0.15, 0.2, 0.25, ..., 0.8, 0.85, 0.9, 0.95)
TUTORIAL ON QUANTILE REGRESSION – p. 27
SAS codes for the birthweight model
ods graphics on;
PROC QUANTREG DATA=LIBRARY.natality;
MODEL weight = Boy Married Black ...
Mother_Wt_Gain2/QUANTILE = 0.05
0.1 ... 0.9 0.95 PLOT = QUANTPLOT;
RUN;
ods graphics off;
TUTORIAL ON QUANTILE REGRESSION – p. 28
Output from SAS / PROC QUANTREG
TUTORIAL ON QUANTILE REGRESSION – p. 29
Output from SAS / PROC QUANTREG
TUTORIAL ON QUANTILE REGRESSION – p. 30
Output from SAS / PROC QUANTREG
TUTORIAL ON QUANTILE REGRESSION – p. 31
Output from SAS / PROC QUANTREG
TUTORIAL ON QUANTILE REGRESSION – p. 32
Some conclusions for Example 1
• Heteroscedasticity exists in the Natality data.
• Mother’s race, education and prenatal care have much more
significant impact on lower quantiles of birthweight than the
upper quantiles.
• These effects will be underestimated by least squares
regression.
TUTORIAL ON QUANTILE REGRESSION – p. 33
Example 2: Growth charts
• Data: Berkeley data of Tuddenham and Snyder (1954)
• Number of subjects: 136
• Measurements: longitudinal measurements of height and
weight collected
◦ quarterly between ages 0 and 3;
◦ yearly between ages 2 and 8;
◦ semi-yearly between ages 8 and 21.
TUTORIAL ON QUANTILE REGRESSION – p. 34
Growth charts
Unconditional Reference Centiles −− Boy’s weight, 0−18 years
80
0.95
0.9
0.75
0.5
60
0.25
0.1
40
20
0
Weight (kg)
0.05
0
5
10
15
Age (years)
TUTORIAL ON QUANTILE REGRESSION – p. 35
Conditional growth chart
• Screening an individual subject given his/her prior path or
other information (Wei, Pere, Koenker and He, 2004).
• e.g., the rapid weight gain during infancy could lead to
obesity later in the childhood (Stettler et al., 2002).
• A simple AR(2) model
Qτ (Wt ) = βτ 0 + βτ 1 Wt−1 + βτ 2 Wt−2 + βτ 3 Ht ,
where
◦ Wt is the current weight at time t.
◦ Wt−1 and Wt−2 are two prior weights at time t − 1 and
t − 2, respectively.
◦ Ht is the current height at time t.
TUTORIAL ON QUANTILE REGRESSION – p. 36
10
8
Wt = 9.8kg
6
Wt−1 = 7.2kg
Wt−2 = 5.4kg
Ht = 70.6cm
4
Weight (Kg)
12
14
An illustrative example
0.0
0.2
0.4
0.6
0.8
1.0
Age (year)
TUTORIAL ON QUANTILE REGRESSION – p. 37
Example of conditional growth chart (cont.)
• Model:
Qτ (W0.75 ) = β0 + β1 W0.5 + β2 W0.25 + β3 H0.75
The resulting estimated conditional quantiles
τ
0.05
0.1
0.25
0.5
0.75
0.9
0.95
Q̂τ
8.04
8.18
8.38
8.74
9.05
9.36
9.55
• Conclusion:
His current weight = 9.8 > 9.55 = 0.95th conditional quantile.
=⇒
The subject gained an unusual amount of weight at age
0.75 given his prior weights and current height.
TUTORIAL ON QUANTILE REGRESSION – p. 38
Comparison with Least Squares Analysis
• Gaussian model (Cole, 1994)
Zt = β0 + β1 Zt−1 + Ut
where Ut ∼ N (0, σ 2 ).
where Zt is the Box-Cox transformed weight, i.e.,
( λ
Wt −1
λ 6= 0 ;
λ ,
Zt =
ln(Wt ), λ = 0.
• The conditional quantiles
Q̃τ (Zt ) = β̂0 + β̂1 Zt−1 + σΦ−1 (τ ).
• Marginal normality =⇒ Joint normality ?
TUTORIAL ON QUANTILE REGRESSION – p. 39
Normal Q−Q Plot
0
1
2
1
−1
−2
−2
−1
0
1
2
−2
−1
0
1
Theoretical Quantiles
Theoretical Quantiles
t = 9.5
t−1 = 9
t = 9.5 ; t−1 = 9
normality of Z(t)
normality of Z(t−1)
Normal Q−Q Plot
−1
0
1
2
−2
−1
0
1
2
0 e+00
2
−1 e−04
Sample Quantiles
1 e−04
Theoretical Quantiles
Sample Quantiles
−2
0
Sample Quantiles
230
Sample Quantiles
190
−1
0.9714 0.9718 0.9722 0.9726
Sample Quantiles
−2
0.9714 0.9718 0.9722 0.9726
210
230
210
190
Sample Quantiles
2
3
250
normality of Z(t−1)
250
normality of Z(t)
−2
−1
0
1
Theoretical Quantiles
Theoretical Quantiles
Theoretical Quantiles
t = 14
t−1 = 13.5
t = 14 ; t−1 = 13.5
2
TUTORIAL ON QUANTILE REGRESSION – p. 40
Inference in quantile regression
1. Direct estimation
• Based on the asymptotic normality of the estimators and
the direct estimation of their variance-covariance matrix.
• In an i.i.d. error model,
yi = x⊤
i β + ei ,
ei ∼ F.
◦ The variance covariance matrix of β̂τ is
(τ (1 − τ )/f (F −1 (τ ))2 )(X ⊤ X)−1 ,
where f (F −1 (τ )) is the common error density
evaluated at F −1 (τ ).
TUTORIAL ON QUANTILE REGRESSION – p. 41
Inference in quantile regression
1. Direct estimation
• In a non i.i.d. error model,
yi = x⊤
i β + ei
ei ∼ Fi .
• The variance covariance matrix of β̂τ is
(τ (1 − τ ))(X ⊤ F X)−1 (X ⊤ X)(X ⊤ F X)−1 ,
where
F = diag{f1 (F1−1 (τ )), f2 (F2−1 (τ )), ..., fn (Fn−1 (τ ))},
and fi is the density function of ei evaluated at its τ th
quantile Fi−1 (τ ).
TUTORIAL ON QUANTILE REGRESSION – p. 42
Inference in quantile regression
2. Rank score method
• Avoid direct estimation of the error densities.
• Introduced by Gutenbrunner, Jurěcková, Koenker, and
Portnoy (1993) for an i.i.d error model
yi = x⊤
i β + ei ,
ei i.i.d.
• Extended by Koenker and Machado (1999) to
location-scale regression models
⊤
yi = x⊤
i β + (xi γ)ei .
• Generalization of sign tests.
TUTORIAL ON QUANTILE REGRESSION – p. 43
Inference in quantile regression
3. Resampling method
• To avoid direct estimation of variance-covariance matrix.
• Resampling methods
◦ Pairwise bootstrap (Efron and Tibshirani, 1998).
◦ Bootstrapping the estimating equations (Parzen, Wei
and Ying, 1993).
◦ Markov chain marginal bootstrap (MCMB) (He and
Hu, 2002).
TUTORIAL ON QUANTILE REGRESSION – p. 44
Inference in quantile regression
3. Resampling method (cont.)
• Confidence interval based on bootstrap sample
{β̂τ∗1 , ..., β̂τ∗m }
◦ SD-based
[β̂τ ± zα/2 SD∗ (β̂τ∗ )].
◦ Percentile based
[2β̂τ − Q∗1−α/2 , 2β̂τ + Q∗α/2 ].
TUTORIAL ON QUANTILE REGRESSION – p. 45
Inference in quantile regression
4. Summary
• Direct estimation
◦ The asymptotic validity holds generally.
◦ The performance in finite samples is sensitive to the
choice of smoothing.
◦ Computationally efficient.
• Rank score method
◦ The asymptotic validity holds for certain models.
◦ The performance is stable, and robust against
common deviations from the model assumptions.
◦ Time consuming for large data sets.
TUTORIAL ON QUANTILE REGRESSION – p. 46
Inference in quantile regression
4. Summary
• Resampling
◦ The asymptotic validity holds generally.
◦ Could be time consuming for large data sets.
◦ MCMB considerably relieves the computation burden
by reducing a high-dimension problem to several
one-dimensional problems.
TUTORIAL ON QUANTILE REGRESSION – p. 47
Inference in quantile regression
5. Practical recommendations
• Use rank score method for relatively small problems,
e.g. n ≤ 1000 and p ≤ 10.
• Use MCMB for moderately large problem, e.g.
1 × 104 < np < 2 × 106 .
• Use direct estimation with non i.i.d. error assumption for
very large problems.
• Defaults in SAS PROC QUANTREG are similar.
• Reference: Kocherginsky, He and Mu (2005).
TUTORIAL ON QUANTILE REGRESSION – p. 48
Inference in quantile regression
6. Computation packages
• SAS — specify at PROC QUANTREG Statement
PROC QUANTREG CI= <NONE|RANK|...> ALPHA = value
◦ By default, the rank score method is used for small
data sets (n < 5000 and p < 20). Otherwise, the
resampling method is used.
• R
– use function summary.rq().
summary.rq(object, se = “nid”, ...)
◦ By default, the rank score method is used. If
rq(..., iid = F, ), then the direct method with non i.i.d.
error model is used.
TUTORIAL ON QUANTILE REGRESSION – p. 49
Inference in quantile regression
List of inference options in the computation packages
Options
Methods
subcategory
R
SAS
Direct
i.i.d. model
se = "iid"
CI = SPARCITY/IID
n.i.d. model
se = "nid"
CI = SPARCITY
se = "rank",
CI = RANK
se = "boot",
Not avaiable
Rank Score
Resampling
Pairwise
bsmethod ="xy"
Parzen, Wei and Ying
se = "boot",
Not available
bsmethod ="pxy"
MCMB
use R package rqmcmb2
CI = RESAMPLING
TUTORIAL ON QUANTILE REGRESSION – p. 50
Examples (cont.)
• Birthwegiht example:
◦ Using the MCMB method implemented in SAS,
construct confidence band of the estimated coefficients.
◦ Comparison of computing times with the other methods.
• Growth chart example:
◦ Using the Rank Score method implemented in R, to test
the significance of prior weights and current height.
◦ Comparison of confidence intervals using different
methods.
TUTORIAL ON QUANTILE REGRESSION – p. 51
Comparison of inference methods
• Comparison of computing time for the birthweight example.
List of total computing time for τ = 0.5.
CPU time (min.)
Methods
R
SAS
Direct with n.i.d. model
0.4
0.5
Rank Score
> 30
> 30
Resampling with MCMB 3.6
3.0
◦ The CPU time includes both estimation (using interior
point algorithm with preprocessing) and confidence
interval calculation.
◦ CPU 1.4 GHz and Memory 512 M.
TUTORIAL ON QUANTILE REGRESSION – p. 52
Growth chart example
Qτ (W0.75 ) = βτ 0 + βτ 1 W0.5 + βτ 2 W0.25 + βτ 3 H0.75 ,
W0.5
W0.25
H0.75
τ
βτ 1
90% CI
βτ 2
90% CI
βτ 3
90% CI
0.05
0.89
(0.42,1.41)
0.07
(-1.20,0.26)
0.07
(-0.06,0.10)
0.10
1.00
(0.68,1.34)
-0.05
(-0.48, 0.20)
0.05
(-0.01,0.10)
0.25
1.18
(0.73,1.32)
-0.27
(-0.43,0.05)
0.02
(0.00,0.11)
0.50
0.90
(0.77,1.11)
-0.05
(-0.45,0.05)
0.04
(0.03,0.06)
0.75
1.02
(0.94,1.13)
-0.21
(-0.46,0.10)
0.04
(-0.01,0.09)
0.90
1.12
(0.58,1.30)
-0.34
(-0.39,0.56)
0.01
(-0.05,0.17)
0.95
0.62
(0.42,1.56)
0.31
(-0.68,0.66)
0.08
(-0.10,0.19)
* (n = 54 boys.)
TUTORIAL ON QUANTILE REGRESSION – p. 53
Comparison of inference methods
• Using growth chart example
Rank method
Direct method, nid
Paired bootstrap
τ
β̂τ 1
90% CI
length
90% CI
length
90% CI
length
0.05
0.89
(0.42,1.41)
0.99
(0.66,1.13)
0.47
( 0.60,1.19)
0.59
0.10
1.00
(0.68,1.34)
0.66
(0.67,1.33)
0.66
( 0.73,1.27)
0.55
0.25
1.19
(0.73,1.32)
0.59
(0.95,1.43)
0.47
( 0.91,1.47)
0.56
0.50
0.90
(0.77,1.11)
0.34
(0.66,1.14)
0.48
( 0.63,1.18)
0.55
0.75
1.02
(0.94,1.13)
0.20
(0.70,1.34)
0.65
( 0.80,1.24)
0.44
0.90
1.12
(0.58,1.30)
0.72
(0.78,1.47)
0.69
( 0.67,1.57)
0.90
0.95
0.62
(0.42,1.56)
1.14
(-.18,1.42)
1.60
(-0.01,1.26)
1.27
TUTORIAL ON QUANTILE REGRESSION – p. 54
Additional topics
• Nonparametric quantile regression model
Qτ (Yt ) = gτ (t)
◦ A simple idea: approximate gτ (t) by a linear combination
of spline basis functions, and then use PROC
QUANTREG with the basis functions as ”covariates".
◦ Reference: He and Shi (1994) and Koenker, Ng and
Portnoy (1994).
TUTORIAL ON QUANTILE REGRESSION – p. 55
• Quantile regression with censored data.
The proportional hazard model h(t, x; β) = h0 (t)exp{x⊤ β}
uses one global coefficient for the effect of each covariate
on all quantile functions.
Available software on censored quantile regression:
(a) the censoring time known for all observations; or
(b) right censoring all above one quantile function; or
(c) censoring time independent of covariates.
Reference: Portnoy (2003).
TUTORIAL ON QUANTILE REGRESSION – p. 56
References
[1] Abrevaya, J. (2001), The effects of demographics and maternal behavior on the
disgtribution of birth outcomes. Empirical Economics, Vol. 26, 247 - 257.
[2] Chen, C. (2004), An adaptive algorithm for quantile regression. Thoery and
applications of recent robust methods. ed. M. Hubert, G. Pison, A. Struyf and S. Van
Aelst, Series: Statistics for Industry and Technology, Birkhauser, Basel, 39 - 48.
[3] Cole, T. J. (1994), Growth charts for both cross-sectional and longitudinal data.
Statitics in Medicine, Vol. 13, 2477 - 2492.
[4] Corman, H. and Chaikind, S. (1998), The effect of low birthweight on the school
performance and behavior of school-aged children. Economics of Education Review,
Vol. 17, 307 - 316.
[5] Currie, J. and Hyson, R. (1999), Is the impack of health shocks cushioned by
socioeconomic status? the case of low birthweight. American Economic Review, Vol.
89, 245 - 250.
[6] Efron, B. and Tibshirani, R. (1998), An introduction to the Bootstrap. CRC Press,
New York.
[7] Gutenbrunner, C., Jurěcková, J., Koenker, R. and Portnoy, S. (1993), Tests of
linear hypotheses based on regression rank scores. Journal of Nonparametric Statistics
Vol. 2, 307 - 333.
TUTORIAL ON QUANTILE REGRESSION – p. 57
References
[8] Hack, M., Klein, N. K., and Taylor, G. (1995), Long-term developmental outcomes
of low birth weight infants. The Future of Children, Vol. 5, 19 - 34.
[9] Hamill, P. V. V., Dridzd, T. A., Johnson, C. L., Reed, R. B., Roche, A. F. and Moore,
W. M. (1979), Physical grwoth: National Center for Health Statistics percentiles.
American Journal of Clinical Nutrition, Vol. 32, 607 - 629.
[10] He, X. and Hu, F. (2002), Markov Chain Marginal Bootstrap. Jounral of the American
Statistical Association, Vol. 97, 783 - 795.
[11] He, X. and Shao, Q. (1996), A general Bahadur representation of M -estimators
and its application to linear regression with nonstochastic designs. The Annals of
Statistics, Vol. 24, 2608 - 2630.
[12] He, X. and Shi, P. (1994), Convergence rate of B-spline estimators of
nonparametric conditional quantile functions. Journal of Nonparametric Statistics, Vol. 3,
299 - 308.
[13] Lewit, E. M., Baker, L. S., Corman, H. and Shiono, P. H. (1995), The direct costs
of low birth weight. The future of Children, Vol. 5, 35 - 51.
[14] Kocherginsky, M., He, X. and Mu, Y. (2005), Practical Confidence Intervals for
Regression Quantiles. Journal of Computational and Graphical Statistics, Vol. 14, 41 - 55.
TUTORIAL ON QUANTILE REGRESSION – p. 58
References
[15] Koenker, R. and Bassett, G. J. (1978), Regression quantiles. Econometrica, Vol. 46,
33-50.
[16] Koenker,R. and Hallock, K. (2001), Quantile Regression. Journal of Economic
Perspectives, Vol. 51, No. 4, 143-156.
[17] Koenker, R. and D’Orey, V. (1987), Computing regression quantiles. Applied
Statistics, Vol. 36, 383-393.
[18] Koenker, R. and D’Orey, V. (1993), A remark on computing regression quantiles.
Applied Statistics, Vol. 43, 410-414.
[19] Koenker, R., Ng, P. and Portnoy, S. (1994), Quantile smoothing splines. Biometrika,
Vol. 81, 673-680.
[20] Koenker, R and Machado, J. A. F. (1999), Goodness of fit and related inference
processes for quantile regression. Journal of the American Statistical Association, Vol. 94,
1296-1310.
[21] Koziol, J. A., Ho, N. J., Felitti, V. J. and Beutler, E. (2001), Reference centiles for
serum ferritin and percentage of transferrin saturation, with application to
mutations of the HFE gene. Clinical Chemistry, Vol. 47, No. 10, 1804-1810.
TUTORIAL ON QUANTILE REGRESSION – p. 59
References
[22] Stettler, N., Zemel, B. S., Kumanyika, S. and Stallings, V. (2002), Infant weight
gain and childhood overweight status in a multicenter cohort study. Pediatrics, Vol.
109, No. 2, 194-199.
[23] Parzen, M. I., Wei, L. J. and Ying, Z. (1994), A resampling method based on
pivotal estimating functions. Biometrika, Vol. 81, 341-350.
[24] Portnoy, S. (2003), Censored Regression Quantiles, Jounral of the American Statistical
Association, Vol. 98, 1001 - 1012;
[25] Portnoy, S. and Koenker, R. (1997), The Gaussian Hare and the Laplacian
Tortoise: Computability of squared-error versus absolute-error estimators.
Statistical Science, Vol. 12, 279-300.
[26] Tuddenham, R. D. and Snyder, M. M. (1954), Physical growth of California boys
and girls from birth to age 18. California Public Child Development, Vol. 1, 183-364.
[27] Wei, Y., Pere, A., Koenker, R., and He, X. (2004), Quantile regression methods for
reference growth charts. Preprint.
TUTORIAL ON QUANTILE REGRESSION – p. 60