% $Id: jss_np.Rnw,v 1.69 2008/07/22 19:01:59 jracine Exp jracine $
%\VignetteIndexEntry{The crs Package}
%\VignetteDepends{crs}
%\VignetteKeywords{nonparametric, spline, categorical}
%\VignettePackage{crs}

\documentclass[nojss]{jss}

\usepackage{amsmath,amsfonts,enumitem}
\usepackage[utf8]{inputenc}

\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\R}{\field{R}}

%% Change the default page sizes.

\setlength{\topmargin}{-0.25in}
\setlength{\textheight}{8.5in}
\setlength{\oddsidemargin}{.0in}
\setlength{\evensidemargin}{.0in}
\setlength{\textwidth}{6.5in}
\setlength{\footskip}{.5in}

%% need no \usepackage{Sweave.sty}

\author{Jeffrey S.~Racine\\McMaster University}

\title{The \pkg{crs} Package}

\Plainauthor{Jeffrey S.~Racine}

\Plaintitle{The crs Package}

\Abstract{
  This vignette outlines the implementation of the regression spline
  method contained in the \proglang{R} \pkg{crs} package, and also
  presents a few illustrative examples.
}

\Keywords{nonparametric, semiparametric, regression spline, categorical data}

\Plainkeywords{Nonparametric, spline, Econometrics, categorical}

\Address{Jeffrey S.~Racine\\
  Department of Economics\\
  McMaster University\\
  Hamilton, Ontario, Canada, L8S 4L8\\
  E-mail: \email{racinej@mcmaster.ca}\\
  URL: \url{http://www.mcmaster.ca/economics/racine/}\\
}

\begin{document}

%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.

%% Note - fragile using \label{} in \section{} - must be outside

%% For graphics

<<>>=
library(crs)
options(prompt = "R> ", crs.messages = FALSE, digits = 3)
@

\section*{Introduction}

The \pkg{crs} package implements a framework for nonparametric
regression splines that admits both continuous and categorical
predictors. The categorical predictors can be handled in two ways:
(i) using kernel weighting, where the kernel functions are tailored
to the discrete support of the categorical predictors
\citep{RACINE_LI:2004}, and (ii) using indicator basis functions. The
fundamental difference between these two approaches is that the use
of indicator basis functions consumes degrees of freedom via the
number of columns in the spline basis matrix, while kernel weighting
does not. This package implements the approaches described in
\citeauthor{MA_RACINE_YANG:2011} \citeyear{MA_RACINE_YANG:2011} and
\citeauthor{MA_RACINE:2013} \citeyear{MA_RACINE:2013} when the option
\code{kernel=TRUE} is selected, as described below. As well, this
package implements a range of related methods and has options that
(hopefully) make it appealing for applied projects, research, and
pedagogical purposes alike.

Data-driven methods can be used for selecting the spline degree,
number of segments/knots, and bandwidths: leave-one-out
cross-validation (\code{cv.func="cv.ls"}) (\citeauthor{STONE:1974}
\citeyear{STONE:1974}, \citeauthor{STONE:1977} \citeyear{STONE:1977}),
generalized cross-validation (\code{cv.func="cv.gcv"})
(\citeauthor{CRAVEN_WAHBA:1979} \citeyear{CRAVEN_WAHBA:1979}), and the
information-based criterion (\code{cv.func="cv.aic"}) proposed by
\citeauthor{HURVICH_SIMONOFF_TSAI:1998}
\citeyear{HURVICH_SIMONOFF_TSAI:1998}.
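To fix ideas, the following sketch (not executed here) shows how each
of these criteria would be requested in a call to \code{crs} on
simulated data; the argument names are those documented in
\code{?crs} and the data generating process is purely illustrative.

<<eval=FALSE>>=
## Illustrative sketch only (not run): selecting the spline regression's
## smoothing parameters via each of the supported cv.func= criteria.
set.seed(123)
n <- 250
x <- runif(n)
z <- rbinom(n,1,.5)
y <- sin(2*pi*x) + z + rnorm(n,sd=.25)
z <- factor(z)
model.ls  <- crs(y~x+z, cv.func="cv.ls")   ## leave-one-out cross-validation
model.gcv <- crs(y~x+z, cv.func="cv.gcv")  ## generalized cross-validation
model.aic <- crs(y~x+z, cv.func="cv.aic")  ## corrected AIC
@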
Details of the implementation are as follows:
\begin{enumerate}[label=(\roman{*}),ref=(\roman{*})]
\item the degree of the spline and the number of segments
  (i.e.~knots minus one) for each continuous predictor can be set
  manually, as can the bandwidths for each categorical predictor (if
  appropriate)
\item alternatively, any of the data-driven criteria
  (i.e.\ \code{cv.func=}) can be used to select (a) the degree of the
  spline (holding the number of segments fixed at any user-set value)
  and the bandwidths for the categorical predictors (if appropriate),
  (b) the number of segments (holding the degree of the spline fixed
  at any user-set value) and the bandwidths for the categorical
  predictors (if appropriate), or (c) both the degree of the spline
  and the number of segments for each continuous predictor along with
  the bandwidths for each categorical predictor (if appropriate)
\item when indicator basis functions are used instead of kernel
  smoothing, whether or not to include each categorical predictor can
  be specified manually or chosen via any \code{cv.func} method
\item we allow the degree of the spline for each continuous predictor
  to include zero, the inclusion indicator for each categorical
  predictor to equal zero, and the bandwidth for each categorical
  predictor to equal one; when the degree/inclusion indicator is zero
  or the bandwidth is one, the variable is removed from the
  regression. In this manner, irrelevant predictors can be removed
  automatically by any \code{cv.func} method, obviating the need for
  pre-testing (\citeauthor{HALL_RACINE_LI:2004}
  \citeyear{HALL_RACINE_LI:2004}, \citeauthor{HALL_LI_RACINE:2007}
  \citeyear{HALL_LI_RACINE:2007})
\end{enumerate}

The design philosophy of the \pkg{crs} package aims to closely mimic
the behavior of the \code{lm} function. Indeed, the implementation
relies on \code{lm} for computation of the spline coefficients,
obtaining fitted values, prediction, and the like. 95\% confidence
bounds for the fit and derivatives are constructed from asymptotic
formulae and automatically generated. Below we describe in more
detail the specifics of the implementation for the interested reader.

\section*{Implementation}

Kernel-based methods such as local constant (i.e.\ the
\citeauthor{NADARAYA:1965} \citeyear{NADARAYA:1965} and
\citeauthor{WATSON:1964} \citeyear{WATSON:1964} estimator) and local
polynomial regression \citep{FAN_GIJBELS:1996} are based on the
concept of `local' weighted averaging. Regression spline methods, on
the other hand, are `global' in nature since a single least-squares
procedure leads to the ultimate function estimate over the entire
data range \citep{STONE:1994}. This `global' nature implies that
constructing regression splines is less computationally burdensome
than constructing their kernel-based counterparts, which underlies
much of their practical appeal. However, while kernel-based
regression methods admit a rich array of predictor types, spline
regression methods can be limited in their potential applicability as
they are predicated on \emph{continuous} predictors only. In applied
settings we often encounter \emph{categorical} predictors such as
strength of preference (``strongly prefer'', ``weakly prefer'',
``indifferent'') and so forth.
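Anticipating the discussion that follows, such a predictor enters the
\code{crs} formula simply as a factor (or as an ordered factor when
its categories possess a natural ordering); the sketch below is not
executed, uses purely illustrative variable names, and presumes a
response \code{y}, a continuous predictor \code{x}, and a sample size
\code{n} have already been defined.

<<eval=FALSE>>=
## Illustrative sketch only (not run): an ordered strength-of-preference
## predictor entering the model alongside a continuous predictor x.
preference <- ordered(sample(c("indifferent","weakly prefer","strongly prefer"),
                             n, replace=TRUE),
                      levels=c("indifferent","weakly prefer","strongly prefer"))
model <- crs(y~x+preference)
@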
When confronted with categorical predictors, researchers typically
break their data into subsets governed by the values of the
categorical predictors (i.e.~they break their data into `cells') and
then conduct regression using only the response and continuous
predictors lying in each cell. Though consistent, this `frequency'
approach can be inefficient. Recent developments in the kernel
smoothing of categorical data \citep{LI_RACINE:2007} suggest more
efficient estimation approaches in such settings. The \pkg{crs}
package considers two complementary approaches that seamlessly handle
the mix of continuous and categorical predictors often encountered in
applied settings.

\subsection*{The Underlying Model}

We presume the reader wishes to model the unknown conditional mean in
the following location-scale model,
\begin{equation*}
  Y=g\left( \mathbf{X},\mathbf{Z}\right) +\sigma \left( \mathbf{X},
  \mathbf{Z}\right) \varepsilon,
\end{equation*}
where $g(\cdot )$ is an unknown function, $\mathbf{X}=\left(
X_{1},\ldots ,X_{q}\right) ^{\mathrm{T}}$ is a $q$-dimensional vector
of continuous predictors, and $\mathbf{Z}=\left( Z_{1},\ldots
,Z_{r}\right) ^{\mathrm{T}}$ is an $r$-dimensional vector of
categorical predictors. Letting $\mathbf{z} =\left( z_{s}\right)
_{s=1}^{r}$, we assume that $z_{s}$ takes $c_{s}$ different values in
$D_{s}\equiv \left\{ 0,1,\ldots ,c_{s}-1\right\} $, $s=1,\ldots ,r$,
and let $c_{s}$ be a finite positive constant.

For the continuous predictors the regression spline model employs the
B-spline routines in the GNU Scientific Library
(\url{http://www.gnu.org/software/gsl/}). The B-spline function is
the maximally differentiable interpolative basis function (B-spline
stands for `basis-spline'), and a B-spline with no internal knots is
a B\'ezier curve.

Heuristically, we conduct linear (in parameter) regression using the
\proglang{R} function \code{lm}. However, we replace the continuous
predictors with B-splines of potentially differing order for every
continuous predictor. For the tensor product bases we set
\code{intercept=TRUE} for each univariate spline basis, while for the
additive spline bases we adopt the \code{intercept=FALSE} variants
(the B-splines will therefore not sum to one, i.e., an order $m$
B-spline with one segment (two knots/breakpoints) has $m+1$ columns
and we drop the first as is often done, though see
\citet{ZHOU_WOLFE:2000} for an alternative approach based upon
shrinkage methods) and instead include an intercept in the model.
This allows multiple bases to coexist when there is more than one
continuous predictor without introducing singularities. The tensor
product basis is given by
\begin{equation*}
  B_1\otimes B_2\otimes \dots \otimes B_p,
\end{equation*}
where $\otimes$ is the Kronecker product for which the products
operate column-wise and $B_j$ is the basis matrix for predictor $j$
as outlined above. We also support a `generalized' polynomial
B-spline basis that consists of a varying-order polynomial with
appropriate interactions.

A general $p$th-order local polynomial estimator for the {\it
multivariate} predictor case is notationally more cumbersome (we let
$q$ denote the number of continuous predictors). The general
multivariate case is considered by \citeauthor{MASRY:1996a}
\citeyear{MASRY:1996a}, who develops some carefully considered
notation and establishes the uniform almost sure convergence rate and
the pointwise asymptotic normality result for the local polynomial
estimator of $g(x)$ and its derivatives up to order $p$.
Borrowing from \citeauthor{MASRY:1996a} \citeyear{MASRY:1996a}, we
introduce the following notation:
\begin{equation*}
  r = (r_1,\dots ,r_q) , \quad r! = r_1! \times \dots \times r_q! ,
  \quad \bar r = \sum_{j=1}^q r_j,
\end{equation*}
\begin{align}
  \label{regression:eq:Masry 2}
  x^r = x_1^{r_1}\times \dots \times x_{q}^{r_q}, \quad
  \sum_{ 0 \leq \bar r \leq p} &= \sum_{j=0 }^p \sum_{r_1=0}^j \dots
  \sum_{r_q=0}^j,\\
  &\quad\text{ (with $\bar r \equiv r_1+\dots +r_q = j$) }\nonumber
\end{align}
and
\begin{equation*}
  (D^r g)(x) = \frac{ \partial^r g(x) }{ \partial x_1^{r_1} \dots
    \partial x_q^{r_q} }.
\end{equation*}
Using this notation, and assuming that $g(x)$ has derivatives of
total order $p+1$ at a point $x$, we can approximate $g(z)$ locally
using a multivariate polynomial of total order $p$ given by
\begin{equation*}
  g(z) \stackrel{\sim}{= } \sum_{ 0 \leq \bar r \leq p }
  \frac{1}{r!} (D^r g)(v)\big|_{v=x}(z-x)^r.
\end{equation*}
The generalized (varying-order) polynomial involves using the
following expression in place of \eqref{regression:eq:Masry 2} above,
\begin{equation*}
  \sum_{ 0 \leq \bar r \leq {\max\{p_1,\dots,p_q\}}} =
  \sum_{j=0 }^{\max\{p_1,\dots,p_q\}} \sum_{r_1=0}^{j\le p_1} \dots
  \sum_{r_q=0}^{j\le p_q}.
\end{equation*}

When additive B-spline bases are employed we have a semiparametric
`additive' spline model (no interactions among variables), while when
the tensor product is employed we have a fully nonparametric model
(interactions among all variables). Whether to use the additive,
tensor product, or generalized polynomial basis can be determined
automatically via any \code{cv.func} method (see the options for
\code{basis=} in \code{?crs}).

We offer the option to use categorical kernel weighting
(\code{lm(\dots,weights=L)}) to handle the presence of categorical
predictors (see below for a description of \code{L}). We also offer
the option of using indicator basis functions for the categorical
predictors (again taking care to remove one column to avoid
singularity given the presence of the intercept term in the model).
These bases are then treated similarly to the bases $B_j$ for
continuous predictors described above.

\subsection*{Example: A B-spline and its First Derivative}

The figure below presents an example of a B-spline and its first
derivative (the spline derivatives are required in order to compute
derivatives from the spline regression model).

\setkeys{Gin}{width=0.45\textwidth}
\begin{center}
<<fig=TRUE>>=
degree <- 5
x <- seq(0,1,length=1000)
B <- gsl.bs(x,degree=degree,intercept=TRUE)
matplot(x,B,type="l")
@
<<fig=TRUE>>=
deriv <- 1
B <- gsl.bs(x,degree=degree,deriv=deriv,intercept=TRUE)
matplot(x,B,type="l")
@
\end{center}

Above we plot a degree-\Sexpr{degree} B-spline (left) with one
segment (two knots) and its \Sexpr{deriv}st-order derivative (right).

\subsection*{Least-Squares Estimation of the Underlying Model}

We estimate $\beta \left( \mathbf{z}\right) $ by minimizing the
following weighted least-squares criterion,
\begin{equation*}
  \widehat{\beta }\left( \mathbf{z}\right) =\arg \min_{\beta \left(
    \mathbf{z} \right) \in \mathbb{R}^{\mathbf{K}_{n}}}
  \sum\limits_{i=1}^{n}\left\{ Y_{i}- \mathcal{B}\left(
  \mathbf{X}_{i}\right) ^{\mathrm{T}}\beta \left( \mathbf{z}\right)
  \right\} ^{2}L\left( \mathbf{Z}_{i},\mathbf{z},\mathbf{\lambda }
  \right) .
\end{equation*}
where $\mathcal{B}\left( \mathbf{X}_{i}\right)$ denotes the vector of
(additive, tensor product, or generalized polynomial) spline basis
functions evaluated at $\mathbf{X}_{i}$, $\mathbf{K}_{n}$ denotes its
dimension, and $L\left( \mathbf{Z}_{i},\mathbf{z},\mathbf{\lambda
}\right)$ is the categorical kernel weight function described below.

\subsection*{Placement of Knots}

The user can determine where knots are to be placed using one of two
methods:
\begin{enumerate}[label=(\roman{*}),ref=(\roman{*})]
\item knots can be placed at equally spaced quantiles whereby an
  equal number of observations lie in each segment (`quantile knots')
\item knots can be placed at equally spaced intervals (`uniform
  knots')
\end{enumerate}

\subsection*{Kernel Weighting}

Let $Z_i$ be an $r$-dimensional vector of categorical/discrete
predictors. We use $z_s$ to denote the $s$-th component of $z$; we
assume that $z_s$ takes $c_s$ different values in $D_s
\stackrel{def}{=} \{0,1,\dots ,c_s-1\}$, $s=1,\dots,r$, and let
$c_s\geq 2$ be a finite positive constant. For expositional
simplicity we will consider the case in which the components of $z$
are unordered.

For an unordered categorical predictor, we use a variant of the
kernel function outlined in \citet{AITCHISON_AITKEN:1976} defined as
\begin{equation}
  \label{eq:barL(new)}
  l( Z_{is}, z_{s},\lambda_s) = \left\{
  \begin{array}{ll}
    1, & \text{ when $ Z_{is} = z_s$},\\
    \lambda_s, & \text{ otherwise}.
  \end{array}
  \right.
\end{equation}
Let ${\bf 1}(A)$ denote the usual indicator function, which assumes
the value one if $A$ holds true and zero otherwise. Using
\eqref{eq:barL(new)}, we can construct a product kernel function
given by
\begin{equation*}
  L(Z_i,z,\lambda) = \prod_{s=1}^r l( Z_{is},z_s,\lambda_s) =
  \prod_{s=1}^{r} \lambda_s^{ {\bf 1}( Z_{is} \neq z_s) },
\end{equation*}
while for an ordered categorical predictor we use the function
defined by
\begin{equation*}
  \tilde l(Z_{is},z_s,\lambda_s) = \lambda_s^{|Z_{is}-z_s| }
\end{equation*}
and modify the product kernel function appropriately. When $Z$
contains a mix of ordered and unordered variables we select the
appropriate kernel for each variable's type when constructing the
product kernel.

Note that when $\lambda_s=1$ all observations are `pooled', hence the
variable $z_s$ is removed from the resulting estimate, while when
$\lambda_s=0$ only observations lying in a given cell are used to
form the estimate.

\subsection*{Estimation Details}

Estimating the model requires construction of the spline bases and
their tensor product (if specified) along with the categorical kernel
weighting function. Then, for a given degree and number of segments
for each continuous predictor and bandwidth for each categorical
predictor (or indicator bases if \code{kernel=FALSE}), the model is
fit via least squares.

All smoothing parameters can be set manually by the user if so
desired, in which case the option \code{cv="none"} must be used;
otherwise, the manually specified values become the starting points
for the search when \code{cv="nomad"} (`nonsmooth mesh adaptive
direct search', see \citet{NOMAD}, \citet{LEDIGABEL:2011}, and the
references therein). The degree and bandwidth vector can be jointly
determined via any \code{cv.func} method by setting the option
\code{cv="nomad"} or \code{cv="exhaustive"} (exhaustive search).
Setting the option \code{cv="nomad"} computes NOMAD-based
cross-validation directed search, while setting
\code{cv="exhaustive"} computes exhaustive cross-validation directed
search for each unique combination of the degree and segment vector
for each continuous predictor from \code{degree=degree.min} through
\code{degree=degree.max} (default 0 and 10, respectively) and from
\code{segments=segments.min} through \code{segments=segments.max}
(default 1 and 10, respectively).
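To make these mechanics concrete, the following sketch (not executed)
mimics the kernel-weighted least-squares fit described above for one
continuous predictor \code{x} and one unordered categorical predictor
\code{z} (a factor), evaluated at a single point in the support of
\code{z}; it is a simplified illustration only and not the package's
internal implementation, which \code{crs} invokes automatically.

<<eval=FALSE>>=
## Illustrative sketch only (not run): categorical kernel-weighted
## least squares for one continuous and one categorical predictor.
B      <- gsl.bs(x,degree=3,intercept=TRUE)  ## B-spline basis for x
lambda <- 0.5                                ## bandwidth for z
z.eval <- levels(z)[1]                       ## evaluation point for z
L      <- ifelse(z==z.eval,1,lambda)         ## product kernel weights
fit    <- lm(y~B-1,weights=L)                ## weighted least squares
ghat   <- fitted(fit)                        ## estimate of g(x,z.eval)
@

Fixing the smoothing parameters in this manner corresponds, in
\code{crs} itself, to supplying \code{degree=}, \code{segments=}, and
\code{lambda=} together with \code{cv="none"}.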
When \code{kernel=TRUE}, setting the option \code{cv="exhaustive"}
computes bandwidths ($\in[0,1]$) via numerical minimization (see
\code{optim}) for each unique combination of the degree and segment
vectors (restarting can be conducted via \code{restarts=}). When
conducting \code{cv="nomad"} the number of multiple starts can be
controlled by \code{nmulti=}. The model possessing the lowest
criterion function value is then selected as the final model. Note
that \code{cv="exhaustive"} is often infeasible (the combinatoric
problem can quickly become too large to compute in finite time),
hence \code{cv="nomad"} is the default. However, with
\code{cv="nomad"} one should set \code{nmulti=} to some sensible
value greater than zero (say, 10 or larger) to help avoid becoming
trapped in local minima.

\subsection*{Data-Driven Smoothing Parameter Criteria}

We incorporate three popular approaches for setting the smoothing
parameters of the regression spline model, namely least-squares
cross-validation, generalized cross-validation, and an AIC method
corrected for use in nonparametric settings.

Let the fitted value from the spline regression model be denoted
$\hat Y_i=\mathcal{B}\left(\mathbf{X}_i\right)^{\mathrm{T}}\hat\beta
\left(\mathbf{Z}_i\right)$. Letting $\hat\varepsilon_i=Y_i-\hat Y_i$
denote the $i$th residual from the categorical regression spline
model, the least-squares cross-validation function is given by
\begin{equation*}
  CV=\frac{1}{n}\sum_{i=1}^n\frac{\hat\varepsilon_i^2}{(1-h_{ii})^2},
\end{equation*}
where $h_{ii}$ denotes the $i$th diagonal element of the spline basis
projection matrix (see below for details); this computation can be
done with effectively one pass through the data set. Since $h_{ii}$
is computed routinely for robust diagnostics by many statistics
programs, it can be computed along with (and hence as cheaply as) the
vector of spline coefficients themselves. Thus least-squares
cross-validation is computationally appealing, particularly for large
data sets.

Let $H$ denote the $n\times n$ weighting matrix such that $\hat
Y=HY$, with $i$th diagonal element $h_{ii}$ and trace
$\text{tr}(H)=\sum_{i=1}^n h_{ii}$. The matrix $H$ is often called
the `hat matrix' or `smoother matrix' and depends on $X$ but not on
$Y$. The `generalized' cross-validation function is obtained by
replacing $h_{ii}$ in the above formula with its average value,
$\text{tr}(H)/n$ \citep{CRAVEN_WAHBA:1979}.

The information-based criterion proposed by
\citet{HURVICH_SIMONOFF_TSAI:1998} is given by
\begin{equation*}
  \text{AIC}_c=\ln(\hat\sigma^2)+
  \frac{1+\text{tr}(H)/n}{1-\{\text{tr}(H)+2\}/n},
\end{equation*}
where
\begin{equation*}
  \hat\sigma^2=\frac{1}{n}\sum_{i=1}^n\hat\varepsilon_i^2
  =Y'(I-H)'(I-H)Y/n.
\end{equation*}

Each of these criterion functions can be minimized with respect to
the unknown smoothing parameters either by numerical optimization
procedures or by exhaustive search. Though each of the above criteria
is asymptotically equivalent in terms of the bandwidths delivered
($\text{tr}(H)/n\to 0$ as $n\to\infty$), they may differ in
finite-sample settings when the smoothing parameter is small (large
$\text{tr}(H)/n$), with the AIC$_c$ criterion penalizing
undersmoothing more heavily than either the least-squares
cross-validation or generalized cross-validation criteria (the
AIC$_c$ criterion effectively applies an infinite penalty for
$\text{tr}(H)/n\ge 1/2$).
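For concreteness, the sketch below (not executed) computes each of
the three criteria by hand for a single-predictor spline fit, using
\code{hatvalues} to obtain the diagonal elements of $H$; it is purely
illustrative and does not reproduce the package's internal
computations.

<<eval=FALSE>>=
## Illustrative sketch only (not run): computing CV, GCV and AIC_c by hand.
B    <- gsl.bs(x,degree=3,intercept=TRUE)
fit  <- lm(y~B-1)
h    <- hatvalues(fit)                   ## h_ii, diagonal of the hat matrix H
e    <- residuals(fit)
n    <- length(e)
trH  <- sum(h)
cv   <- mean((e/(1-h))^2)                ## least-squares cross-validation
gcv  <- mean((e/(1-trH/n))^2)            ## generalized cross-validation
aicc <- log(mean(e^2)) + (1+trH/n)/(1-(trH+2)/n)  ## corrected AIC
@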
\subsection*{Pruning}

Once a model has been selected via cross-validation
(i.e.~\code{degree}, \code{segments}, \code{include} or \code{lambda}
have been selected), there is the opportunity to (potentially)
further refine the model by adding the option \code{prune=TRUE} to
the \code{crs} function call. Pruning is accomplished by conducting
stepwise cross-validated variable selection using a modified version
of the \code{stepAIC} function in the \proglang{R} \pkg{MASS} package
in which the function \code{extractAIC} is replaced with the function
\code{extractCV}, with additional modifications where necessary.
Pruning of potentially superfluous bases is undertaken; however, the
pruned model (potentially containing a subset of the bases) is
returned \emph{only if its cross-validation score is lower than that
of the model being pruned}. When this is not the case a warning is
issued to this effect. A final pruning stage is commonplace in the
spline framework and may positively impact the finite-sample
efficiency of the resulting estimator depending on the rank of the
model being pruned. Note that this option can only be applied when
\code{kernel=FALSE}.

\section*{Illustrative Examples}

Next we provide a few illustrative examples that may be of interest
to the reader.

\subsection*{Example: One Categorical/One Continuous Predictor}

By way of illustration we consider a simple example involving one
continuous and one discrete predictor.

<<>>=
set.seed(42)
n <- 1000
x <- runif(n)
z <- rbinom(n,1,.5)
y <- cos(2*pi*x) + z + rnorm(n,sd=0.25)
z <- factor(z)
model <- crs(y~x+z)
summary(model)
@

The function \code{crs} called in this example returns a \code{crs}
object. The generic functions \code{fitted} and \code{residuals}
extract (or generate) estimated values and residuals. Furthermore,
the functions \code{summary}, \code{predict}, and \code{plot}
(options \code{mean=FALSE}, \code{deriv=FALSE}, \code{ci=FALSE},
\code{plot.behavior=c("plot","plot-data","data")}) support objects of
this type.

The figure below presents summary output in the form of partial
regression surfaces (predictors not appearing on the axes are held
constant at their medians/modes). Note that for this simple example
we used the call \code{plot(model,mean=TRUE)}.

\setkeys{Gin}{width=0.75\textwidth}
\begin{center}
<<fig=TRUE>>=
options(SweaveHooks = list(multifig = function() par(mfrow=c(2,2))))
plot(model,mean=TRUE)
@
\end{center}

\subsection*{Example: Regression Discontinuity Design}

By way of illustration we consider a simple example involving two
continuous predictors and one categorical predictor. In this example
there is a `discontinuity' in the regression surface potentially
demarcated by the discrete predictor.

<<>>=
set.seed(1234)
n <- 1000
x1 <- runif(n)
x2 <- runif(n)
z <- ifelse(x1>.5,1,0)
dgp <- cos(2*pi*x1)+sin(2*pi*x2)+2*z
z <- factor(z)
y <- dgp + rnorm(n,sd=1)
model <- crs(y~x1+x2+z)
summary(model)
@

The figure below plots the resulting estimate. The discontinuity
occurs when $x_1>0.5$, but the nature of the discontinuity is
unknown, as is the functional form on either side of the potential
discontinuity. The categorical regression spline is able to detect
this `break'. On a related note, testing for a significant break
could be accomplished with an (asymptotic) $F$-test (to do so we
must, however, set \code{kernel=FALSE}), as the following illustrates
(the argument \code{include=0} drops the single categorical
predictor; with multiple categorical predictors one would instead
supply a vector such as \code{include=c(1,1,\dots,0,1,\dots,1)}).
<<>>=
## When kernel=FALSE, we could use the anova() function
## and set model.return=TRUE.
## Unrestricted model:
model <- crs(y~x1+x2+z,cv="none",
             degree=model$degree,
             segments=model$segments,
             basis=model$basis,
             kernel=FALSE,
             include=1,
             model.return=TRUE)
## Restricted model:
model.res <- crs(y~x1+x2+z,cv="none",
                 degree=model$degree,
                 segments=model$segments,
                 basis=model$basis,
                 kernel=FALSE,
                 include=0,
                 model.return=TRUE)
anova(model.res$model.lm,model$model.lm)
@

\setkeys{Gin}{width=0.75\textwidth}
\begin{center}
<<fig=TRUE>>=
num.eval <- 50
x1.seq.0 <- seq(min(x1[z==0]),max(x1[z==0]),length=num.eval)
x2.seq.0 <- seq(min(x2[z==0]),max(x2[z==0]),length=num.eval)
x.grid <- expand.grid(x1.seq.0,x2.seq.0)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
                      z=factor(rep(0,num.eval**2),levels=c(0,1)))
z0 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
x1.seq.1 <- seq(min(x1[z==1]),max(x1[z==1]),length=num.eval)
x2.seq.1 <- seq(min(x2[z==1]),max(x2[z==1]),length=num.eval)
x.grid <- expand.grid(x1.seq.1,x2.seq.1)
newdata <- data.frame(x1=x.grid[,1],x2=x.grid[,2],
                      z=factor(rep(1,num.eval**2),levels=c(0,1)))
z1 <- matrix(predict(model,newdata=newdata),num.eval,num.eval)
xlim <- c(0,1)
zlim <- c(min(z0,z1),max(z0,z1))
theta <- 15
phi <- 10
persp(x=x1.seq.0,y=x2.seq.0,z=z0,
      xlab="x1",ylab="x2",zlab="y",
      xlim=xlim,
      ylim=xlim,
      zlim=zlim,
      ticktype="detailed",
      border="red",
      theta=theta,phi=phi)
par(new=TRUE)
persp(x=x1.seq.1,y=x2.seq.1,z=z1,
      xlab="x1",ylab="x2",zlab="y",
      xlim=xlim,
      ylim=xlim,
      zlim=zlim,
      theta=theta,phi=phi,
      ticktype="detailed",
      border="blue")
@
\end{center}

\section*{Acknowledgements}

I would like to gratefully acknowledge support from the Natural
Sciences and Engineering Research Council of Canada
(\url{http://www.nserc-crsng.gc.ca}), the Social Sciences and
Humanities Research Council of Canada
(\url{http://www.sshrc-crsh.gc.ca}), and the Shared Hierarchical
Academic Research Computing Network (\url{http://www.sharcnet.ca}).

\bibliography{crs}

\end{document}