%% $Id: crs_faq.tex,v 1.37 2017/04/29 16:29:38 jracine Exp jracine $
%\VignetteIndexEntry{Frequently Asked Questions (crs)}
%\VignetteDepends{crs}
%\VignetteKeywords{nonparametric, spline, categorical}
%\VignettePackage{crs}
\documentclass[12pt]{amsart}
\tolerance=5000
\usepackage{setspace,srcltx,hyperref}
\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\R}{\field{R}}
%% Change the default page sizes.
\setlength{\topmargin}{-0.25in}
\setlength{\textheight}{8.5in}
\setlength{\oddsidemargin}{.0in}
\setlength{\evensidemargin}{.0in}
\setlength{\textwidth}{6.5in}
\setlength{\footskip}{.5in}
\title{Package \texttt{crs} FAQ}
\date{\today}
\author{Jeffrey S.~Racine}
\begin{document}
\maketitle
\tableofcontents
\onehalfspacing

\section{Overview and Current Version}

This set of frequently asked questions is intended to help users who are encountering unexpected or undesired behavior when trying to use the \texttt{crs} package.

If you encounter any issues with the \texttt{crs} package, kindly first ensure that you have the most recent versions of \texttt{crs}, \texttt{R}, and \texttt{RStudio} (if appropriate) installed. Issues encountered with outdated versions of software have often been resolved in the current versions, so this is the first thing one ought to investigate when the unexpected occurs. Having ensured that the problem persists with the most recently available versions of all software involved, kindly report any issues you encounter to me, and please include your code, data, and the versions of the \texttt{crs} package and \texttt{R} used so that I can help track down any such issues (\href{mailto:racinej@mcmaster.ca}{racinej@mcmaster.ca}). And, of course, if you encounter an issue that you think might be of interest to others, kindly email me the relevant information and I will incorporate it into this FAQ.

This FAQ refers to the most recent version, which as of this writing is 0.15-38. Kindly update your version should you not be using the most current (from within R, \texttt{update.packages()} ought to do it, though also see \ref{update} below). See the appendix in this file for cumulative changes between this and previous versions of the \texttt{crs} package.

\section{Frequently Asked Questions}

\subsection{How do I cite the \texttt{crs} package?}

Once you have installed the \texttt{crs} package ({\tt install.packages("crs")}), if you load the \texttt{crs} package ({\tt library("crs")}) and type \texttt{citation("crs")} you will be presented with the following information.
\begin{singlespacing}
\begin{verbatim}
> citation("crs")

To cite package 'crs' in publications use:

  Jeffrey S. Racine and Zhenghua Nie (2018). crs:
  Categorical Regression Splines. R package version 0.15-31.
  https://github.com/JeffreyRacine/R-Package-crs/

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {crs: Categorical Regression Splines},
    author = {Jeffrey S. Racine and Zhenghua Nie},
    year = {2018},
    note = {R package version 0.15-31},
    url = {https://github.com/JeffreyRacine/R-Package-crs/},
  }
\end{verbatim}
\end{singlespacing}

\subsection{I have never used \texttt{R} before. Can you direct me to some introductory material that will guide me through the basics?}

There are many excellent introductions to the \texttt{R} environment, with more on the way.
First, I would recommend going directly to the \texttt{R} website (\url{http://www.r-project.org}) and looking under Documentation/Manuals (\url{http://cran.r-project.org/manuals.html}), where you will discover a wealth of documentation for \texttt{R} users of all levels. See also the \texttt{R} task views summary page (\url{http://cran.r-project.org/web/views}) for information grouped by field of interest. A few documents that I mention to my students, tailored to econometricians, include \url{http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf}, Cribari-Neto \& Zarkos (1999) \cite{CRIBARI_NETO_ZARKOS:1999}, Racine \& Hyndman (2002) \cite{RACINE_HYNDMAN:2002}, and Farnsworth (2006) \cite{FARNSWORTH:2006}, to name but a few.

Those looking for exemplar data sets outside of those contained in the \texttt{crs} package are directed to the \texttt{Ecdat} \cite{Ecdat} and \texttt{AER} \cite{AER} packages.

I maintain a `Gallery' to provide a forum for users to share code and discover examples and illustrations, which can be found at \url{http://socserv.mcmaster.ca/racinej/Gallery/Home.html}.

Often the best resource is right down the hall. Ask a colleague whether they use or know anyone who uses \texttt{R}, then offer to buy that person a coffee and along the way drop something like ``I keep hearing about the \texttt{R} project\dots I feel like such a Luddite\dots''

\subsection{\label{update}How do I keep all \texttt{R} packages on my system current?}

Run the command \texttt{update.packages(checkBuilt=TRUE,ask=FALSE)}, which will not only update all packages that are no longer current, but will also, where appropriate, update packages that were built under earlier installed versions of \texttt{R}.

\subsection{It seems that there are a lot of packages that must be installed in order to conduct econometric analysis (tseries, lmtest, np, etc.). Is there a way to avoid having to install each package individually?}

Certainly. The Comprehensive R Archive Network (CRAN) is a network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for \texttt{R}. The CRAN `task view' for computational econometrics might be of particular interest to econometricians: it provides an excellent summary of both parametric and nonparametric econometric packages that exist for the \texttt{R} environment, and provides one-stop installation for these packages. See \href{http://cran.r-project.org/web/views/Econometrics.html}{cran.r-project.org/web/views/Econometrics.html} for further information.

To automatically install a task view, the \texttt{ctv} package first needs to be installed and loaded, i.e.,
\begin{verbatim}
install.packages("ctv")
library("ctv")
\end{verbatim}
The econometric task view can then be installed via {\tt install.views()} and updated via \texttt{update.views()} (which first assesses which of the packages are already installed and up-to-date), i.e.,
\begin{verbatim}
install.views("Econometrics")
\end{verbatim}
or
\begin{verbatim}
update.views("Econometrics")
\end{verbatim}

\subsection{Is there a `gentle guide' to the \texttt{crs} package that contains some easy-to-follow examples?}

Perhaps the most gentle introduction is contained in the \texttt{crs} package itself in the form of a `vignette'. To view the vignette run \texttt{R}, install the \texttt{crs} package (\texttt{install.packages("crs")}), then type \texttt{vignette("crs",package="crs")} to view or print the vignette.
See also \texttt{vignette("spline\_primer",package="crs")} for a vignette that presents a 'gentle' introduction to regression splines. For a listing of all routines in the \texttt{crs} package type: {\tt library(help="crs")}. \subsection{I noticed you have placed a new version of the \texttt{crs} package on CRAN. How can I determine what has been changed, modified, fixed etc?} See the CHANGELOG on the CRAN site (\url{http://cran.r-project.org/web/packages/crs/ChangeLog}), or go to the end of this document where the CHANGELOG is provided for your convenience. \subsection{How can I read data stored in various formats such as Stata, SAS, Minitab, SPSS etc.~into the R program?} Install the foreign library via \texttt{install.packages("foreign")} then do something like \begin{singlespacing} \begin{verbatim} mydat <- read.dta("datafile.dta"), \end{verbatim} \end{singlespacing} where \texttt{datafile.dta} is the name of your Stata data file. Note that, as of version 0.8-34, the foreign package function {\tt read.dta} supports reading files directly over the Internet making for more portable code. For instance, one could do something like \begin{verbatim} mydat <- read.dta(file="http://www.principlesofeconometrics.com/stata/mroz.dta") \end{verbatim} as one could always do with, say, \texttt{read.table()}. \subsection{Where can I get some examples of R code for the crs package in addition to the examples in the help files?} Start \texttt{R} then type \verb+demo(package="crs")+ and you will be presented with a list of demos for constrained estimation, inference, and so forth. To run one of these demos type, for example, \verb+demo(radial_rgl)+ (note that you must first install the {\tt rgl} package to run this particular demo). To find the location of a demo type \verb+system.file("demo","radial_rgl.R",package="crs")+ for example, then you can take the source code for this demo and modify it for your particular application. \subsection{Can I use crs to perform linear regression?} Certainly! Simply use the options \texttt{cv="none"}, \texttt{degree=rep(1,q)} and \texttt{segments=rep(1,q)} where \texttt{q} is the number of continuous predictors and \texttt{kernel=FALSE} as per the following example: \begin{verbatim} n <- 1000 x1 <- runif(n) x2 <- runif(n) z1 <- rbinom(n,1,.1) z2 <- rbinom(n,1,.1) y <- x1+x2^2+z1+z2+rnorm(n) z1 <- factor(z1) z2 <- factor(z2) model.lm <- lm(y~x1+x2+z1+z2) summary(model.lm) model.crs <- crs(y~x1+x2+z1+z2,cv="none",kernel=FALSE,degree=rep(1,2),segments=rep(1,2)) summary(model.crs) \end{verbatim} You will note that the summary statistics are identical for each model. Also, when conducting search the initial values are set so that the first model estimated is linear (albeit with \texttt{kernel=TRUE} by default) so if the linear model is optimal it will be the one chosen by cross-validation in probability. \subsection{I would like more/less information displayed when conducting search using the NOMAD routines\dots} This is accomplished by feeding the argument {\tt opts=list("DISPLAY\_DEGREE"=x)} to \texttt{crs} where x is a non-negative integer. Setting x=0 produces no information whatsoever while integers $x\ge1$ provide successively more information. \subsection{\texttt{crs} consumes a lot of memory when conducting cross-validation via NOMAD. Can I control this?} By default, the minimum degrees of freedom possible is set to $n-k=1$ where $n$ is the sample size and $k$ is the trace of the basis matrix. 
This is done in order to allow for extremely complex relationships to be estimated. But in some situations one might not want to `go there', for two reasons, namely a) memory consumption and b) improbable models having nearly as many predictors as observations. During the search process, if you have a large number of predictors and a large number of observations, you will end up computing models having a minuscule number of degrees of freedom and, when this occurs, the basis matrix may well consume an inordinate amount of memory. Moreover, the final model determined by the search process may well end up being much more parsimonious than these intermediate candidates.

By way of illustration, if \texttt{n=100000}, \texttt{k=20}, \texttt{degree.max=10}, and \texttt{segments.max=10}, you will likely compute models with spline bases of dimension \texttt{100000x99999} when conducting cross-validated search (i.e.\ models that hit the minimum number of degrees of freedom). In such cases you might wish to restrict the minimum degrees of freedom by setting \texttt{cv.df.min=n-j}, where \texttt{n} is your sample size and \texttt{j} is, say, \texttt{1000}, so that the maximum dimension of the B-spline basis would be less than or equal to \texttt{j} (e.g.\ the maximum dimension of the spline basis would now be \texttt{100000x1000}). Of course, you ought to make sure that your final model does not hit this bound, as this would indicate that the restriction is binding on the search process, which is of course undesirable.

\subsection{When I have a large number of regressors/data the function {\tt crs} just `sits there' when conducting cross-validation via NOMAD\dots}

First, if you are concerned that the code is indeed just `sitting there', you can verify that search is progressing by changing the \verb+"DISPLAY_DEGREE"+ setting in the opts list along the lines of the following, which will print out in gory detail exactly what the search engine is doing:
\begin{verbatim}
opts <- list("MAX_BB_EVAL"=10000,
             "EPSILON"=.Machine$double.eps,
             "INITIAL_MESH_SIZE"="r1.0e-01",
             "MIN_MESH_SIZE"=paste("r",sqrt(.Machine$double.eps),sep=""),
             "MIN_POLL_SIZE"=paste("r",sqrt(.Machine$double.eps),sep=""),
             "DISPLAY_DEGREE"=3)
model <- crs(...,opts=opts)
\end{verbatim}
However, if this reveals that there is something odd going on (i.e.\ you are seeing a lot of \verb+inf+ function values being printed out), then you might wish to begin by restricting the dimension of the combinatoric search process. By default \texttt{degree.max=10} for each predictor and \texttt{segments.max=10} as well. This can lead to a basis with 21 columns for one predictor, and when using {\tt basis="tensor"} or \texttt{basis="auto"} (which computes both the tensor and additive bases) the dimension of the basis can swamp the number of observations in the sample (e.g.\ with 4 regressors we can have a tensor product multivariate basis that has up to \verb+21^4=194481+ columns when using the default \texttt{degree.max=10} for each predictor and \texttt{segments.max=10}). For this illustration, the cross-validation function can approach $\infty$ as the sample size approaches \verb+194481+ from above, and the search process will be searching for a non-$\infty$ value in order to proceed or terminate.

So, in such cases begin by restricting the dimension of the spline basis matrix by setting, for instance, the minimum degrees of freedom via \texttt{cv.df.min=n-j}, where \texttt{n} is your sample size and \texttt{j=500}, so that the maximum dimension of the B-spline basis would be less than or equal to \texttt{j}.
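By way of illustration, a minimal sketch that bounds the basis dimension in this fashion (the data frame \texttt{mydata} and the predictors \texttt{x1} and \texttt{x2} are hypothetical):
\begin{verbatim}
## Bound the basis dimension during cross-validation so that
## candidate bases have at most j columns
n <- nrow(mydata) # 'mydata' is a hypothetical data frame
j <- 500
model <- crs(y~x1+x2,data=mydata,cv.df.min=n-j)
\end{verbatim}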
Alternatively, you can restrict the dimension of the spline basis matrix by setting, for instance, \texttt{degree.max=2} and \texttt{segments.max=2}. Or begin by searching only over the spline degree by setting \texttt{complexity="degree"} (the default is {\tt complexity="degree-knots"}). The routine will throw a warning if you have a solution that hits the maximum value of \texttt{degree.max} or \texttt{segments.max}, and will offer some practical advice in these cases. Alternatively, restrict attention to additive (semiparametric) splines by setting \texttt{basis="additive"} (e.g.\ with 4 regressors the additive basis has at most \verb+21x4=84+ columns when using the default \texttt{degree.max=10} for each predictor and \texttt{segments.max=10}), at the cost of imposing additivity, which can be restrictive. Alternatively, consider kernel regression, which does not suffer from this computational limitation (see e.g.\ the \texttt{np} package).

\subsection{\label{nonsmooth}My estimated model is not `smooth' (e.g.\ cross-validation chooses, say, the spline degree=1 and number of segments=3). How can I modify this?}

Unlike, say, smoothing splines \cite{WAHBA:1990} that penalize `roughness' (the second derivative of the estimate) and fix the spline degree at, say, three, regression splines fitted by cross-validation can deliver a model that minimizes the objective function without regard to an ad hoc proxy for smoothness (a strength, not a weakness, in my opinion).
\begin{enumerate}
\item\label{all rel} \texttt{All Predictors Relevant}: If, however, you wish an estimate that is, say, twice continuously differentiable, simply set \texttt{degree.min=3} (one degree higher than the desired degree of smoothness) and then determine the appropriate model via cross-validation subject to this constraint (i.e.\ re-estimate your model with the option \texttt{degree.min=3}, for example). Otherwise, you can simply override what cross-validation delivered via \texttt{cv="none"} and \texttt{degree=c(3,3,...)} etc.\ (i.e.\ set the degree and segments in an ad hoc fashion, which I would not recommend). Of course, this will not be optimal according to the cross-validation criterion but will achieve the desired degree of smoothness.
\item \texttt{Some Predictors Irrelevant}: In some cases the optimal spline degree will be zero, the bandwidth one, or the inclusion indicator zero (the latter occurring when \texttt{kernel=FALSE} is specified). This means that cross-validation has automatically removed one or more variables. In this case, first remove these variables from the model to be estimated, then re-run your model on the subset of relevant variables only, following the advice in \eqref{all rel}, which will achieve the desired degree of smoothness subject to the irrelevant variables no longer being present in the final model.
\end{enumerate}

\subsection{The \texttt{crs} package implements `regression splines' and optimizes the spline degree and knot vector by default. But `smoothing splines' \cite{WAHBA:1990} typically set the degree to `cubic' and penalize roughness. Can we use \texttt{crs()} (i.e.\ regression splines) to mirror the cubic smoothing spline approach?}

Certainly.
Consider the following illustration:
\begin{verbatim}
library(crs)
data(cps71)
attach(cps71)
model <- crs(logwage~age)
summary(model)
\end{verbatim}
You can see that cross-validating both the spline degree and number of knots chooses a degree 2 spline for age and, as a consequence, \texttt{plot(model,deriv=1)} produces a derivative that is nonsmooth (i.e.\ piecewise linear). If one preferred to mirror the cubic spline approach (or, for that matter, any fixed-order spline), then the following will optimize the number of knots only, holding the degree constant at, say, 3 (and will, in general, be faster to solve since we optimize over fewer smoothing parameters).
\begin{verbatim}
model <- crs(logwage~age,complexity="knots",degree=3)
summary(model)
\end{verbatim}
This model uses a higher degree than the previous one that optimized both the degree and number of knots, but fewer knots. However, note that the cross-validation function is minimized (i.e.\ improved) for the model that optimizes both the degree and number of knots, hence that model is optimal according to the cross-validation criterion. See also \ref{nonsmooth} above for further caveats.

\subsection{I estimated a parametric model using the \texttt{lm()} function. How can I compare the cross-validation score from the \texttt{crs()} approach with that for the parametric model?}

This can be readily achieved for the parametric model as follows:
\begin{verbatim}
data(wage1)
model.lm <- lm(lwage ~ married + female + nonwhite + educ + exper + tenure,
               data = wage1)
cv.lm <- mean(residuals(model.lm)^2/(1-hatvalues(model.lm))^2)
cv.lm
\end{verbatim}
You can then compare this with the cross-validation score reported by \texttt{summary()} for the \verb+crs+ model. The model possessing the lower cross-validation score is to be preferred.

\subsection{Why do some runs result in a function value of 1.340781e+154 when conducting multistarting?}

As of version 0.15-1 we conduct extensive testing for ill-conditioned bases (univariate and multivariate) and adjust search limits accordingly. However, when a multivariate basis is ill-conditioned we apply a large penalty (\verb+sqrt(.Machine$double.xmax)+, which equals 1.340781e+154 on most processors). Though the search process will try to detect a minimum, it can fail here if the objective function is `flat' in a neighborhood of the initial values. When this occurs you can increase \verb+nmulti+ and/or decrease \verb+degree.max+ and restart the search. Alternatively, you might try \verb+singular.ok=TRUE+. Note also that as of version 0.15-1, the initial search values will be degree one and segment one (i.e.\ a linear model) unless you provide the vectors \verb+degree=c(...)+ and \verb+segments=c(...)+, which will then be used instead as the starting values for the first multistart.

\subsection{snomadr appears to be crashing}

If you receive the message
\begin{verbatim}
Calling NOMAD (Nonsmooth Optimization by Mesh Adaptive Direct Search)

 *** caught segfault ***
address 0x68, cause 'memory not mapped'

Traceback:
 1: .Call(smultinomadRSolve, ret)
\end{verbatim}
kindly first ensure that you have write privileges in your current directory (\texttt{snomadr} creates temporary files in the current working directory, and if this operation fails you may receive this error). Note that this should not occur when using version 0.15-13 or higher, since temporary files are no longer generated.
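For versions prior to 0.15-13, a minimal sketch (using only base \texttt{R} functions) for verifying that the current working directory is writable before invoking \texttt{snomadr} might look as follows:
\begin{verbatim}
## file.access() with mode=2 tests write permission
## (it returns 0 if writable, -1 otherwise)
if(file.access(getwd(),mode=2) != 0) {
  ## fall back to a writable temporary directory
  setwd(tempdir())
}
\end{verbatim}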
\subsection{How do I estimate the conditional quantile rather than the conditional mean?}

Simply specify the option \texttt{tau=0.5} to estimate the conditional median regression spline, or any other quantile $\tau\in(0,1)$ you choose to specify, as in
\begin{verbatim}
model <- crs(y~x1+x2,tau=0.5)
\end{verbatim}
See also \texttt{demo(cqrs)} for a demonstration.

\subsection{How can I save a PDF of a plot created with the option persp.rgl=TRUE?}

Version 0.15-1 added support for \verb+RGL+ via the \verb+rgl+ package, which is a 3D real-time rendering device driver system for R using OpenGL. These plots are dynamic, so you can spin them and resize them using your keypad/mouse. However, they are not standard graphics objects that can be saved using R commands such as \verb+pdf()+. But they can be saved as a PDF by first calling rgl and then issuing the command \verb+rgl.postscript("foo.pdf","pdf")+, where \verb+foo.pdf+ is the desired name of your PDF file, as the following illustrates:
\begin{verbatim}
n <- 1000
x1 <- sort(rnorm(n))
x2 <- rnorm(n)
y <- x1^3 + rnorm(n,sd=.1)
model <- crs(y~x1+x2)
plot(model,mean=T,persp.rgl=T)
rgl.postscript("foo.pdf","pdf")
\end{verbatim}
However, this pdf driver does not support some features such as transparency etc. A better alternative is to create a \verb+png+ file as follows:
\begin{verbatim}
n <- 1000
x1 <- sort(rnorm(n))
x2 <- rnorm(n)
y <- x1^3 + rnorm(n,sd=.1)
model <- crs(y~x1+x2)
plot(model,mean=T,persp.rgl=T)
rgl.snapshot("foo.png")
\end{verbatim}
and then include this in your \LaTeX\ document using \verb+\includegraphics[scale=.5]{foo.png}+.

\subsection{Is it possible to use B-splines instead of indicator bases or kernel weighting for discrete predictors?}

Provided the discrete predictors are numeric, certainly. Just make sure they are of type numeric and they will be treated the same as continuous predictors, as in the following example.
\begin{verbatim}
n <- 1000
x <- runif(n,-2,2)
z <- rbinom(n,1,.5)
y <- x^3 + z + rnorm(n,sd=.1)
model <- crs(y~x+z)
summary(model)
\end{verbatim}
Note in the summary that you are told `There are 2 continuous predictors' (i.e.\ numeric predictors). Note also that messages along the lines of `optimal degree equals search maximum (10): rerun with larger degree.max' and `optimal degree equals search maximum (1): rerun with larger degree.max' will appear simply because the default maximum degree is 10 but the program automatically checks for the maximum \textsl{well-conditioned} basis which, for the binary numeric predictor, is 1 (you can ignore these messages here).

\subsection{I have noticed that as more categorical predictors are added or as the number of outcomes for each categorical predictor increases, computation time increases when \texttt{kernel=TRUE}. Why is this so and can anything be done?}

When \texttt{kernel=TRUE} we need to compute a kernel-weighted regression for each unique combination of the categorical predictors. So if you have one binary predictor, estimation involves two calls to the weighted least squares solver. Now suppose that you have three categorical predictors, each having $c=10$ outcomes. Then estimation involves $10^3=1000$ calls to the weighted least squares solver. This affects all aspects of estimation (cross-validation etc.), which results in increased computation time.
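To get a feel for this growth, the following minimal sketch (with purely illustrative simulated factors) counts the unique combinations of three categorical predictors, which equals the number of weighted least squares calls required for a single function evaluation:
\begin{verbatim}
set.seed(42)
n <- 10000
z1 <- factor(sample(1:10,n,replace=TRUE))
z2 <- factor(sample(1:10,n,replace=TRUE))
z3 <- factor(sample(1:10,n,replace=TRUE))
## the number of unique cells approaches 10^3=1000 as n grows
nrow(unique(data.frame(z1,z2,z3)))
\end{verbatim}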
As of version 0.15-8, you can potentially reduce computation time {\em when you have categorical predictors} by setting \texttt{lambda.discrete=TRUE}, which discretizes the bandwidth \texttt{lambda}: we divide $\texttt{lambda}\in[0,1]$ into \texttt{lambda.discrete.num}$+1$ values (with the default \texttt{lambda.discrete.num=100} these are $0/100,1/100,\dots,100/100$), so that integer (discrete) search via NOMAD is undertaken rather than continuous search. This can potentially reduce run-time with little expected loss in accuracy for the cross-validated search procedure. However, when there are many categorical predictors each having $>2$ outcomes, search based on \texttt{lambda.discrete=TRUE} may be more likely to become ensnared in local minima than the default (i.e.\ properly treating the bandwidths for the categorical predictors as continuous).

To reduce computation time further, you can also decrease the number of times the search is restarted from different (random) starting points by setting \texttt{nmulti} to some $i<5$ (the default is 5), but I would caution against doing this except when conducting exploratory data analysis (the objective function is typically nonsmooth and may possess multiple local minima).

\subsection{The R function \texttt{lag()} does not work as I expect it to. How can I create the $l$th lag of a numeric variable in R to be fed to functions in the \texttt{crs} package?}

You can use the \texttt{embed} function to accomplish this task. Here is a simple function that might work more along the lines that you expect. By default we `pad' the vector with NAs, but you can switch this to \texttt{FALSE} if you prefer. When \texttt{pad.NA=TRUE} the function returns a vector of the same length as the original vector, with NAs padded for the missing values.
\begin{verbatim}
lag.numeric <- function(x,l=1,pad.NA=TRUE) {
  if(!is.numeric(x)) stop("x must be numeric")
  if(l < 1) stop("l (lag) must be a positive integer")
  if(pad.NA) x <- c(rep(NA,l),x)
  return(embed(x,l+1)[,l+1])
}
x <- 1:10
x.lag.1 <- lag.numeric(x,1)
\end{verbatim}

\subsection{How can I turn off all console I/O?}

To disable all console I/O, set \texttt{options(crs.messages=FALSE)} and wrap the function call in \texttt{suppressWarnings()} to disable any warnings printed to the console. For instance
\begin{verbatim}
library(crs)
options(crs.messages=FALSE)
set.seed(42)
n <- 100
x <- sort(rnorm(n))
z <- factor(rbinom(n,1,.5))
y <- x^3 + rnorm(n)
model.crs <- suppressWarnings(crs(y~x+z))
\end{verbatim}
ought to produce no console I/O whatsoever in the call to \texttt{crs}. Note that for direct calls to \texttt{snomadr} you instead need to set \texttt{opts=list("DISPLAY\_DEGREE"=0)} (see \texttt{?snomadr} for details).

\subsection{How can I retrieve the knots used for each predictor when knots are selected via cross-validation?}

It depends on whether you use uniform or quantile knots. If you have a model with two predictors and you are using uniform knots, for the first predictor (presumed to be named \texttt{x1}) the knots will be
\begin{verbatim}
seq(min(x1),max(x1),length=model$segments[1]+1)
\end{verbatim}
while if you are using quantile knots the knots will be
\begin{verbatim}
quantile(x1,probs=seq(0,1,length=model$segments[1]+1))
\end{verbatim}
For the second predictor use \texttt{x2} and \texttt{model\$segments[2]}, and so forth.
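By way of illustration, the following minimal sketch (with simulated data and hypothetical names) recovers both candidate knot sequences for the first predictor from a fitted model:
\begin{verbatim}
library(crs)
set.seed(42)
n <- 250
x1 <- runif(n)
x2 <- runif(n)
y <- cos(2*pi*x1) + x2 + rnorm(n,sd=.25)
model <- crs(y~x1+x2)
## uniform knots for x1
seq(min(x1),max(x1),length=model$segments[1]+1)
## quantile knots for x1
quantile(x1,probs=seq(0,1,length=model$segments[1]+1))
\end{verbatim}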
\bibliographystyle{plain}
\bibliography{crs_faq}
\clearpage
\appendix

\section*{Changes from Version 0.15-37 to 0.15-38 [4-Sep-2024]}
\begin{itemize}
\item Thanks to Zhenghua Nie for fixing the C/C++ compilation failure that occurs when R\_NO\_REMAP is defined.
\item Thanks to Duncan Murdoch for the heads up and PR (new function names from rgl package merged): ``An upcoming release of rgl will deprecate a large number of rgl.* functions in favour of the *3d versions. This PR makes the substitutions needed to avoid warnings when that happens. I think the displays should remain the same, but it is possible some defaults (e.g. background colour) are changed.''
\end{itemize}

\section*{Changes from Version 0.15-36 to 0.15-37 [31-Dec-2022]}
\begin{itemize}
\item Fixed issue in snomadr.cpp reported by Brian Ripley: \texttt{warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]} (thanks Zhenghua!)
\item Fixed reported issue \#9: compilation error on Alpine Linux
\end{itemize}

\section*{Changes from Version 0.15-35 to 0.15-36 [18-Oct-2022]}
\begin{itemize}
\item Fixed \texttt{warning: a function definition without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]} (thanks Zhenghua!)
\end{itemize}

\section*{Changes from Version 0.15-34 to 0.15-35 [09-Oct-2022]}
\begin{itemize}
\item Explicit acknowledgment in \texttt{DESCRIPTION} of code \texttt{gsl\_bspline.c/.h} adapted from the GNU GSL library, with further details appended to \texttt{gsl\_bspline.c/.h} (thanks to Professor Brian Ripley for noting this oversight).
\end{itemize}

\section*{Changes from Version 0.15-33 to 0.15-34 [30-Mar-2022]}
\begin{itemize}
\item Fixed warnings flagged on CRAN (thanks Zhenghua!)
\begin{itemize}
\item snomadr.cpp:374:60: warning: void operator delete(void*, std::size\_t) called on pointer returned from a mismatched allocation function [-Wmismatched-new-delete]
\item nomad\_src/Sgtelib\_Model\_Search.cpp:957:43: warning: array subscript -1 is outside array bounds of bool [2147483647] [-Warray-bounds]
\end{itemize}
\item Additional casting of \texttt{x} via \texttt{as.matrix()} in \texttt{is.fullrank()}
\end{itemize}

\section*{Changes from Version 0.15-32 to 0.15-33 [02-Feb-2021]}
\begin{itemize}
\item Updated NOMAD to version 3.9.1 (thanks Zhenghua!)
\item Removed dependency on C++11 \texttt{random\_shuffle} per recommendation from R development team (thanks Zhenghua!)
\item Followed R core team request to ``Please move \texttt{rgl} to Suggests and use conditionally (see 1.1.3.1 of Writing R Extensions) at the next package update''
\end{itemize}

\section*{Changes from Version 0.15-31 to 0.15-32 [19-Nov-2019]}
\begin{itemize}
\item Corrected return value for \texttt{mgcv\_tmm()} (was void, not valid address of R object)
\item Added \texttt{crs.messages=FALSE} capabilities to \texttt{clsd()}
\end{itemize}

\section*{Changes from Version 0.15-30 to 0.15-31 [01-May-2018]}
\begin{itemize}
\item \texttt{basis="auto"} was the default; doc/code discrepancy fixed
\item Hat tip to Thibault Vatter, who patched an issue with weights not being functional in crsiv.R
\item Fixed issue with compiler compliance to support ICC and retain backwards compatibility with MSVC C++
\end{itemize}

\section*{Changes from Version 0.15-29 to 0.15-30 [03-Dec-2017]}
\begin{itemize}
\item Fixed issue where, occasionally during cross-validation, execution of the function \texttt{npglpreg()} could halt with the error `missing value where TRUE/FALSE needed'
\end{itemize}

\section*{Changes from Version 0.15-28 to 0.15-29 [06-Oct-2017]}
\begin{itemize}
\item Fixed issue pointed out by Brian Ripley (thanks!) about \texttt{double ** \_X} in the nomad code
\item Updated licence to GPL (>= 3)
\end{itemize}

\section*{Changes from Version 0.15-27 to 0.15-28 [06-Oct-2017]}
\begin{itemize}
\item Zhenghua worked his magic to furnish a computationally efficient \texttt{glp.model.matrix} using a C backend similar to Simon Wood's C backend for his \texttt{tensor.product.model.matrix}
\item Fixed issue with integer predictors and NOMAD for \texttt{npglpreg()} arising from upgrade to latest version of NOMAD
\item Fixed issue with \texttt{MIN\_POLL\_SIZE} default arising from upgrade to latest version of NOMAD
\end{itemize}

\section*{Changes from Version 0.15-26 to 0.15-27 [06-May-2017]}
\begin{itemize}
\item Zhenghua fixed minor issue with \texttt{dim.bs()}
\item Incorporated Simon Wood's C-based tensor product code (was using his R-based code; the C code is substantially faster)
\item Naming of variables in NOMAD port aligned with Brian Ripley's comment (`Identifiers starting with an underscore followed by an upper-case letter or another underscore are reserved for system macros and should not be used in portable code (including not as guards in C/C++ headers).')
\end{itemize}

\section*{Changes from Version 0.15-25 to 0.15-26 [29-April-2017]}
\begin{itemize}
\item Zhenghua patched up issues arising with the Windows build
\item \texttt{crs\_init.c} attempts to deal with registration issue/note
\end{itemize}

\section*{Changes from Version 0.15-24 to 0.15-25 [29-Apr-2017]}
\begin{itemize}
\item Fixed glitch that could potentially arise when computing derivatives and multiple predictors are assigned degree 0 when using the glp basis
\item Updated NOMAD to latest version (3.8, released March 2017)
\end{itemize}

\section*{Changes from Version 0.15-23 to 0.15-24 [18-Dec-2014]}
\begin{itemize}
\item Moved option init from .onAttach to .onLoad in zzz.R (Bernd Bischl)
\item Reworded DESCRIPTION (request from Kurt Hornik)
\end{itemize}

\section*{Changes from Version 0.15-22 to 0.15-23 [11-Aug-2014]}
\begin{itemize}
\item Fixed bugs in setting the seed in \texttt{snomadr} so that successive calls with restarting result in the same solution (thanks to Arne Henningson for detecting and reporting this issue)
\item Addressed issue with \texttt{cv.aic} and \texttt{npglpreg} where a degenerate solution could occur when shrinking was invoked
\item Enhanced the robust measure of scale to include \texttt{mad} which is more robust than \texttt{IQR} \end{itemize} \section*{Changes from Version 0.15-21 to 0.15-22 [22-Jan-2014]} \begin{itemize} \item \texttt{crs.messages=FALSE} was being ignored in \texttt{frscv}, \texttt{krscv}, \texttt{frscvNOMAD}, and \texttt{krscvNOMAD}, as was the passing of additional arguments via \texttt{opts=list()} to \texttt{snomadr} from certain functions \item Default settings for tuning parameters for \texttt{snomadr} in \texttt{npglpreg} have been optimized based on extensive simulations for simulated and real datasets (we 'optimized the optimizer' so to speak) \item Fixed glitch in \texttt{npglpreg} where \texttt{cv.aic} was only working properly for the local constant estimator (i.e. \texttt{degree=0}) \item \texttt{bwscaling} is deprecated in the latest (development) version of np which necessitated changes to \texttt{npglpreg}. In the process the way scaling is performed ought to be more robust in general due to changes to default starting values for numeric predictors \end{itemize} \section*{Changes from Version 0.15-19 to 0.15-21 [08-Jan-2014]} \begin{itemize} \item Addressed issues with certain non-package files being included via use of \texttt{.Rbuildignore} \item NOMAD team corrected dereferencing null pointer in \texttt{Mads.cpp} that was uncovered through the use of UB sanitizer checks with clang 3.4 \end{itemize} \section*{Changes from Version 0.15-18 to 0.15-19 [30-Dec-2013]} \begin{itemize} \item Updated snomadr to NOMAD 3.6.2 (released December 2013) which resolves crash/incompatibility with the clang compiler on Mac OS X Mavericks 10.9.1 \item Added new function \texttt{clsd()} that does logspline regression jointly choosing the degree and number of segments (knots) - this is to be treated as beta but the univariate continuous only case currently implemented is capable of outperforming existing logspline methods that set the spline degree to an ad hoc value (3) \item \texttt{npglpreg()} now optimizes at the level of the bandwidth scaling factors (thanks to Sebastien Le Digabel for the suggestion) to bring parameters to a 'common scale' \item For cross-validation with \texttt{npglpreg()} we test for very large bandwidths for the continuous predictors ($>$ bandwidth.switch robust standard deviations) then switch to the global polynomial approximation for computation speed (produces identical results but uses global least squares rather than local least squares). This can result in substantial improvements when large bandwidths are appropriate. Combined with trees (in progress in np) this could lead to marked reductions in computation time with zero loss in accuracy \end{itemize} \section*{Changes from Version 0.15-17 to 0.15-18 [29-Dec-2012]} \begin{itemize} \item Using .onUnload rather than .Last.lib in zzz.R \item Stopping rules for crsiv and crsivderiv modified \item Added options to smooth residuals for stopping rule in crsiv (i.e. 
smooth y-phi(z)~w as opposed to separately smoothing y~w and phi(z)~w) \item Added options to input starting values for crsiv and crsivderiv \end{itemize} \section*{Changes from Version 0.15-16 to 0.15-17 [04-Jun-2012]} \begin{itemize} \item Updated NOMAD from 3.5.0 (released Jan 2011) to 3.5.1 (released Mar 2012) \end{itemize} \section*{Changes from Version 0.15-15 to 0.15-16 [30-Apr-2012]} \begin{itemize} \item Fixed bug when \texttt{deriv=} is used in \texttt{crs()} (derivatives for categorical variables were not correct in some cases - derivatives computed by plot and plot-data were correct however) \item Fixed issue when plot called with \texttt{"plot-data"} or \texttt{"data"} and \texttt{par(mfrow())} is not reset \item Startup message points to the faq, faq is now a vignette \item Glitch fixed in \texttt{npglpreg()} where kernel types were not being passed to the NOMAD solver... \item Added ckerorder option to \texttt{npglpreg()} \item More information output by summary for \texttt{npglpreg()} objects (kernel type, order etc.) \item tol for determining ill-conditioned bases now consistent with Octave/Matlab etc. and uses the tolerance \texttt{max(dim(x))*max(sqrt(abs(e)))*.Machine\$double.eps} \end{itemize} \section*{Changes from Version 0.15-14 to 0.15-15 [16-Apr-2012]} \begin{itemize} \item NOTE: ** major change in default setting ** Default basis set to \texttt{"auto"} from \texttt{"additive"} for more robust results when using default settings with more than one continuous predictor (at the cost of additional computation for all supported bases, more memory required). Warnings changed to reflect this, additional warnings provided, but mainly when \texttt{cv="none"} (i.e. user is doing things manually and failing to provide defaults) \item Added switch for discarding singular bases during cross-validation (\texttt{singular.ok=FALSE} is default, so default is to discard singular bases) \item More stringent checking for ill-conditioned bases during cross-validation - previous checks were insufficient allowing poorly conditioned bases to creep into the final model (this may result in increased runtime for large datasets and multivariate tensor models) \item Potential runtime improvement for large bases (test for ill-conditioned bases was relying partly on \texttt{rcond(t(B)\%*\%B)}, now using crossprod(B) which is more computationally efficient than \texttt{rcond(t(B)\%*\%B)} and has a substantially smaller memory footprint) \item No longer report basis type for one continuous predictor as all bases are identical in this case \end{itemize} \section*{Changes from Version 0.15-13 to 0.15-14 [22-Mar-2012]} \begin{itemize} \item Added support for weights to \texttt{crs()} \item Corrected issue with reporting of cv function value by \texttt{summary()} with quantile regression splines \item Test for coexistence of pruning and quantile regression splines and stop with a message that these cannot coexist \item Check for valid derivative integer \end{itemize} \section*{Changes from Version 0.15-12 to 0.15-13 [05-Mar-2012]} \begin{itemize} \item Added new function \texttt{crsivderiv()} \item More options added to \texttt{crsiv()} \item Modified vignette (\texttt{spline\_primer}) \item Corrected error in demo for IV regression when exhaustive search was selected (nmulti needed to be set though it is not used) \item Changes to code to improve compliance with R 'Writing portable packages' guidelines and correct partial argument matches \end{itemize} \section*{Changes from Version 0.15-11 to 
0.15-12 [7-Dec-2011]} \begin{itemize} \item Added option to limit the minimum degrees of freedom when conducting cross validation (see \texttt{cv.df.min} which defaults to 1) which can save memory and computation time for large sample sizes by avoiding computation of potentially very large dimensioned spline bases \end{itemize} \section*{Changes from Version 0.15-10 to 0.15-11 [3-Dec-2011]} \begin{itemize} \item Fixed glitch in \texttt{bwtype="auto"} when \texttt{adaptive\_nn} is selected \end{itemize} \section*{Changes from Version 0.15-9 to 0.15-10 [3-Dec-2011]} \begin{itemize} \item Fixed regression in code when using \texttt{cv.func=cv.aic} \item Added option \texttt{bwtype="auto"} for \texttt{npglpreg} (automatically determine the bandwidth type via cross-validation) \end{itemize} \section*{Changes from Version 0.15-8 to 0.15-9 [25-Nov-2011]} \begin{itemize} \item Fixed glitch when computing/plotting derivatives with \texttt{crs} and multiple predictors are of degree zero \item Added support for parallel \texttt{npglpreg} (calls \texttt{npRmpi} rather than \texttt{np} - see demo) \item Improved handling of complex bases via dim.bs (test for negative degrees of freedom prior to attempting to construct the basis function - dramatically reduces memory overhead and can cut down on unnecessary computation) \end{itemize} \section*{Changes from Version 0.15-7 to 0.15-8 [16-Nov-2011]} \begin{itemize} \item Added support for \texttt{is.fullrank} testing for generalized local polynomial kernel regression for cross-validation (default remains ridging a la Seifert \& Gasser), replaced \texttt{rcond(t(X)\%*\%X)} with \texttt{is.fullrank(X)} (smaller memory footprint for large datasets) \item Fixed glitch with derivative computation when one or more \texttt{degree} is 0 when \texttt{kernel=TRUE} and additionally when one or more factors is excluded when using \texttt{kernel=FALSE} \item Fixed issue with reported cross-validation score only corresponding to leave-one-out cross-validation by passing back cv function from solver rather than computing post estimation \item Added ability to estimate quantile regression splines \item More rigorous testing for rank deficient fit via \texttt{rcond()} in cv function \item Fixed issue where \texttt{degree.min} was set $> 1$ but initial degree was 1 (corresponding to the linear model which is the default for the initial degree otherwise) \item Added the option to treat the continuous bandwidths as discrete with \texttt{lambda.discrete.num+1} values in the range [0,1] which can be more computationally efficient when a 'quick and dirty' solution is sufficient rather than conducting mixed integer search treating the lambda as real-valued \item Corrected incorrect warning about using \texttt{basis="auto"} when there was only one continuous predictor \end{itemize} \section*{Changes from Version 0.15-6 to 0.15-7 [24-Oct-2011]} \begin{itemize} \item Added logical \verb+model.return+ to \verb+crs+ (default \verb+model.return=FALSE+) which previously returned a list of models corresponding to each unique combination of the categorical predictors when \verb+kernel=TRUE+ (the memory footprint could be potentially very large so this allows the user to generate this list if so needed) \end{itemize} \section*{Changes from Version 0.15-5 to 0.15-6 [17-Oct-2011]} \begin{itemize} \item Compiler error thrown on some systems due to changes in \verb+Eval_Point.hpp+ corrected \end{itemize} \section*{Changes from Version 0.15-4 to 0.15-5 [16-Oct-2011]} \begin{itemize} \item Thanks 
to Professor Brian Ripley, additional Solaris C/C++ compiler warnings/issues have been resolved
\item Some internal changes for soon-to-be-deprecated functionality (\verb+sd()+ expected to be deprecated shortly)
\end{itemize}

\section*{Changes from Version 0.15-3 to 0.15-4 [15-Oct-2011]}
\begin{itemize}
\item Extended the \verb+gsl.bs+ functionality to permit out-of-sample prediction of the spline basis and its derivatives
\item Added option \verb+knots="auto"+ to automatically determine via cross-validation whether to use quantile or uniform knots
\item Minor changes to help page examples and descriptions and to the crs vignette
\end{itemize}

\section*{Changes from Version 0.15-2 to 0.15-3 [05-Sept-2011]}
\begin{itemize}
\item Fixed glitch when all degrees are zero when computing the cross-validation function (also fixes glitch when all degrees are zero when plotting the partial surfaces)
\item Added new function \verb+crssigtest+ (to be considered in beta status until further notice)
\item Added F test for no effect (joint test of significance) in \verb+crs summary+
\item Both degree and segments now set to one for the first multistart in \verb+crs+ (previously only degree was, but the intent was always to begin from a linear model (with interactions where appropriate), so this glitch is corrected)
\item Test for pathological case in npglpreg when initializing bandwidths where \verb+IQR+ is zero but \verb+sd > 0+ (for setting robust sd), which occurs when there exist many repeated values for a continuous predictor
\item Added `typical usage' preformatted illustrations for docs
\end{itemize}

\section*{Changes from Version 0.15-1 to 0.15-2 [30-July-2011]}
\begin{itemize}
\item Renamed COPYING file to COPYRIGHTS
\end{itemize}

\section*{Changes from Version 0.15-0 to 0.15-1 [29-Jul-2011]}
\begin{itemize}
\item Automated detection of ordered/unordered factors implemented
\item Initial degree values set to 1 when conducting NOMAD search (only for the initial start; when \verb+nmulti+ $>1$, random valid values are generated)
\item Multiple tests for well-conditioned B-spline bases, dynamic modification of search boundaries when ill-conditioned bases are detected, and detection of non-positive degrees of freedom and full column rank of the spline basis (otherwise the penalty \verb+sqrt(.Machine$double.xmax)+ is returned during search) - this can lead to a significant reduction in the memory footprint
\item Added support for generalized B-spline kernel bases (varying order generalized polynomial)
\item Corrected issue with plot when variables were cast as \verb+factor+ in the model formula
\item Fixed glitch with return object and i/o when \verb+cv="bandwidth"+ and \verb+degree=c(0,0,...,0)+
\item Added tests for pathological cases (e.g.\ optimize degree and knots but set max degree to min degree or max segments to min, hence no search possible)
\item Added argument \verb+cv.threshold+ that uses exhaustive search for simple cases where the number of objective function evaluations is less than \verb+cv.threshold+ (currently set to 1000 but the user can set it). Naturally, exhaustive search is always preferred but often infeasible, so when it is feasible use it.
\item Added additional demos for constrained estimation (Du, Parmeter, and Racine (2011)), inference, and a sine-based function
\item Substantial reductions in run-time realized.
\begin{itemize}
\item Product kernel computation modified for improved run-time of kernel-based cross-validation and estimation.
\item Moved from \verb+lsfit+ to \verb+lm.fit+ and from \verb+lm+ to \verb+lm.wfit+/\verb+lm.fit+ in \verb+cv.kernel.spline+ and \verb+cv.factor.spline+ (compute objective function values). Two effects: R-devel indicates \verb+lm.fit+/\verb+lm.wfit+ are more robust (confirmed for a large number of predictors), and a much faster \verb+cv.kernel.spline+ function emerges (run-time cut 20-30\%).
\item The combined effects of these changes are noticeable. For instance, run-time for \verb+wage1+ with 7-predictor cross-validation goes from 510 seconds in 0.15-0 to 304 seconds due to the use of \verb+lm.fit+/\verb+lm.wfit+ described above, to 148 seconds due to the modified kernel function.
\end{itemize}
\end{itemize}

\section*{Changes from Version 0.14-9 to 0.15-0 [23-Jun-2011]}
\begin{itemize}
\item Thanks to Professor Brian Ripley, compilation issues on Solaris systems have been resolved, and checks/examples are reduced in run time to alleviate the excessive check times flagged by the R development team. Many thanks to them for their patience and guidance.
\item Minor changes to \verb+radial_rgl+ demo
\end{itemize}

\section*{Changes from Version 0.14-8 to 0.14-9 [20-Jun-2011]}
\begin{itemize}
\item Cleaned up issues for creating the binary for Windows
\item Setting seed in \verb+snomadr.cpp+ via \verb+snomadr.R+ for starting points when \verb+nmulti > 0+
\item Increased default \verb+MAX_BB_EVAL+ from 500 to 10000 (makes a difference for difficult problems) and modified default EPSILON in NOMAD along with other parameters (\verb+MIN_MESH_SIZE+, \verb+MIN_POLL_SIZE+) to reflect actual machine precision (using R's \verb+.Machine$double.eps+ where \verb+NOMAD+ fixed EPSILON at 1e-13)
\item Zhenghua added help functionality for retrieving help via snomadr
\item Now the default number of restarts in \verb+crs+ is 5 (zero is not reliable and I want sensible defaults in this package - higher is better but for many problems this ought to suffice)
\item Corrected glitches in interactive demos where options were not being passed; updated docs to reflect demos
\end{itemize}

\section*{Changes from Version 0.14-7 to 0.14-8 [10-Jun-2011]}
\begin{itemize}
\item \verb+crsiv+ now returns a crs model object that supports residuals, fitted, predict and other generic functions. Note that this approach is based on first computing the model via regularization and then feeding a transformed response to a {\tt crs} model object. You can test how close the two approaches are to one another by comparing \verb+model$phihat+ with \verb+fitted(model)+ via \verb+all.equal(as.numeric(fitted(model)),as.numeric(model$phihat))+
\end{itemize}

\section*{Version 0.14-7 [09-Jun-2011]}
\begin{itemize}
\item Initial release of the \verb+crs+ package on CRAN.
\end{itemize}
\end{document}