NEWS | R Documentation |
Fixed an error in sampler when setting test offsets when there was only a single test observation.
Issues a warning instead of failing when weights are present in training data but not present in test. Suggestion thanks to github user Pentaonia (Loubert).
No longer depends on gfortran.
Uses SIMD instructions on M1 Macs.
Added experimental callback functionality to rbart_vi.
Custom loss functions for xbart now require an additional weights argument.
Fixed a multithreading issue leading to inconsistent results with xbart.
rbart_vi should now correctly use default arguments.
rbart_vi now works with keepTrainingFits as FALSE.
Weighted binary responses sample latent variables from the correct distribution.
Extracting the values from the posterior predictive distribution for models with weights now incorporates them into the variance.
Weighted values are considered in loss functions for crossvalidation.
extract now accepts "trees" as a type, which allows for easier inspection of models fit with keepTrees as TRUE.
print generics now exist for bart and rbart fits; implementation thanks to Emil Hvitfeldt.
xbart now accepts a seed argument to enhance reproducibility.
bart/bart2 (and dbarts through its tree.prior argument) accept splitprobs/split.probs, which controls the prior probability that any variable is used when splitting observations.
fitted for rbart_vi models now uses a C++ implementation for the expected value that uses less memory and is faster.
xbart for binary outcomes with log loss no longer returns NaN when some subset of the response is perfectly predicted by the covariates. Bug report thanks to Marcela Veselkova.
dbarts now exposes access to the underlying proposal rules and their probabilities through its proposal.probs argument. bart2 responds to the same argument, while bart uses proposalprobs.
bart, bart2, and rbart_vi accept a seed argument that will yield reproducible results, even when running with multiple threads and multiple chains.
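As a minimal sketch of the seed argument, the snippet below fits the same small model twice with bart2 and compares the training fits. The simulated data, the helper fit_once, and the deliberately tiny n.trees, n.samples, and n.burn settings are illustrative assumptions chosen only to keep the run short.

```r
library(dbarts)

# Simulated data; names and sizes are illustrative only.
set.seed(2)
n <- 80
df <- data.frame(x = runif(n))
df$y <- 2 * df$x + rnorm(n, sd = 0.2)

# Hypothetical helper: fit a small model with a fixed seed,
# two chains, and two threads.
fit_once <- function() {
  bart2(y ~ x, data = df,
        n.trees = 10L, n.samples = 25L, n.burn = 25L,
        n.chains = 2L, n.threads = 2L,
        seed = 123L, verbose = FALSE)
}

fit1 <- fit_once()
fit2 <- fit_once()

# With the same seed, the posterior draws are expected to agree.
all.equal(fit1$yhat.train, fit2$yhat.train)
```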
The interface registered under R_RegisterCCallable has changed to reflect proper fixed hyperpriors for k.
Samples of the end-node sensitivity parameter, k, are returned by rbart_vi when it is modeled.
Burn-in samples of the end-node sensitivity parameter, k, are included in the results of bart, bart2, and rbart_vi.
rbart_vi will now look for group.by and group.by.test in the data and test arguments before looking in the formula or calling environments.
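The group.by lookup can be illustrated roughly as follows: the grouping factor g lives only in the data frame and is referred to by name. The simulated data, the column names, and the small sampler settings are assumptions made for the example.

```r
library(dbarts)

# Simulated grouped data; g exists only as a column of df.
set.seed(3)
n <- 120
df <- data.frame(x = runif(n),
                 g = factor(sample(letters[1:4], n, replace = TRUE)))
group_effects <- c(a = -1, b = 0, c = 0.5, d = 1)
df$y <- df$x + unname(group_effects[as.character(df$g)]) + rnorm(n, sd = 0.2)

# group.by = g is resolved inside the data argument.
fit <- rbart_vi(y ~ x, data = df, group.by = g,
                n.trees = 10L, n.samples = 20L, n.burn = 20L,
                n.chains = 1L, n.threads = 1L, verbose = FALSE)

# Posterior-mean fitted values for the training observations.
head(fitted(fit))
```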
Fix for k mixing across chains when running multithreaded and with k being modeled. Bug report thanks to Noah Greifer.
Fix for xbart with method = "k-fold" when the data are not evenly divided by the number of folds. Bug report thanks to Jesse (@ALEXLANGLANG on Github).
Sampler method getLatents and corresponding C function now add the user-supplied offset to the result.
Saved, flattened trees now correctly partition observations on left and right.
Samplers now have method sampleNodeParametersFromPrior. When used in conjunction with sampleTreesFromPrior, this allows the model to make predictions entirely from the prior distribution.
dbartsControl (and now bart/bart2 through ...) now accept an rngSeed argument. This can be used to generate reproducible results with multiple threads. It should only be used for testing, as the thread-specific pRNGs are seeded using sequential draws from a pRNG created with the user-supplied seed.
C interface supports dbarts_createStateExpression and dbarts_initializeState, which can be used to re-create samplers that were allocated using forked multithreading.
C interface also supports dbarts_predict, dbarts_setControl, and dbarts_printTrees.
Exports makeTestModelMatrix to allow package authors to create test data at a later point from training data.
varcount for bart fits now has dimnames set.
residuals generic added to bart and rbart_vi.
Parallelization for rbart now creates the correct number of chains.
Should now compile on non-x86 architectures. Report thanks to Lars Viklund.
Fixed hang when verbose = TRUE for multiple threads and multiple chains. Report thanks to Noah Greifer.
Fixed potential memory access errors when recreating sample from saved state.
Correctly de-serializes saved tree structure.
Sampler now explicitly supports setSigma for use in hierarchical models.
Sampler function setOffset has an additional argument of updateScale. When the response is continuous and updateScale is TRUE, the implicit scaling, affecting the node parameters' variance, is adjusted to match the range of the new data. This optionally reverts the change of version 0.9-13 with the intention of being used only during warmup when using an offset that is itself being sampled.
Removed extraneous print line left over from debugging 0.9-17.
Eliminated two race conditions from multithreaded crossvalidation. Report thanks to Ignacio Martinez.
Eliminated garbage read on construction of crossvalidation sampler, removing inconsistencies across multiple runs with the same starting seed.
makeModelMatrixFromDataFrame now converts character vectors to factors instead of dropping them. Report thanks to Colin Carlson.
Fixed memory leak for predict when keepTrees is FALSE.
Added extract and fitted generics for bart models. Both respect the "train" and "test" sets of observations and return one of: "ev" - samples from the posterior of the individual level expected value; "bart" - the sum of trees component, the same as "ev" for linear models but on the probit scale for binary ones; and "ppd" - samples from the posterior predictive distribution. For consistency with fitted.glm, "response" can be used as a synonym for "ev" and "link" can be used as a synonym for "bart".
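A short sketch of the extract and fitted generics on a bart fit follows. The simulated predictors and the small ntree, ndpost, and nskip values are assumptions used only to keep the example quick.

```r
library(dbarts)

# Simulated continuous-response data; names are illustrative.
set.seed(4)
n <- 60
x <- matrix(runif(2 * n), n, 2, dimnames = list(NULL, c("x1", "x2")))
y <- x[, 1] - x[, 2] + rnorm(n, sd = 0.1)

fit <- bart(x, y, ntree = 10L, ndpost = 30L, nskip = 30L, verbose = FALSE)

# Posterior draws of the expected value for the training observations.
ev_draws <- extract(fit, type = "ev", sample = "train")

# Draws from the posterior predictive distribution for the same observations.
ppd_draws <- extract(fit, type = "ppd", sample = "train")

# fitted averages over the posterior draws, giving one value per observation.
ev_mean <- fitted(fit, type = "ev", sample = "train")
```

For a continuous response, type = "bart" would be expected to match "ev", as the entry above notes; the two differ only for binary outcomes.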
predict for bart models with binary outcomes returns a result on the probability scale, not probit. The argument value is deprecated - use type instead.
predict further conforms to the same system of arguments as extract and fitted.
xbart with a k-hyperprior should no longer crash. Report thanks to Colin Carlson.
Fits from rbart_vi now work with generics fitted, extract, and predict. extract retrieves samples from the posterior distribution for the training and test samples, fitted averages across those samples, while predict can be used to obtain values for completely new observations.
predict for rbart_vi takes value "ev" instead of "post-mean" to clarify what is being returned, i.e. samples from the posterior distribution of the observation-level expected values.
save/load should work correctly. Report thanks to Jeremy Coyle.
predict now works when trees aren't saved, for use in testing Metropolis-Hastings proposals.
The offset slot no longer changes the relative scaling of the response. This stabilizes predictions across iterations. For a semantic where the scaling does change, use setResponse instead.
Varying intercepts model for probit regression.
A hyperprior for k has now been implemented. Passing k = chi(degreesOfFreedom, scale) now penalizes small values of k, encouraging more shrinkage.
Hyperprior of chi(1.25, Inf) is now default for bart2 with binary outcomes. The default accuracy should improve substantially.
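The hyperprior can be requested explicitly, roughly as sketched below for a continuous response. The call form k = chi(1.25, Inf) is taken from the entries above; the simulated data and the small sampler settings are assumptions for illustration.

```r
library(dbarts)

# Simulated data; names and sizes are illustrative only.
set.seed(5)
n <- 100
df <- data.frame(x = runif(n))
df$y <- sin(2 * pi * df$x) + rnorm(n, sd = 0.2)

# Place a chi hyperprior on k instead of fixing it at a constant.
fit <- bart2(y ~ x, data = df, k = chi(1.25, Inf),
             n.trees = 10L, n.samples = 20L, n.burn = 20L,
             n.chains = 1L, n.threads = 1L, verbose = FALSE)

# Posterior-mean fitted values under the modeled k.
head(fitted(fit))
```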
xbart divides data correctly with random subsampling.
More control over cut points has been added. It is now possible to specify the cut points for a variable once and subsequently change that predictor without also modifying the cuts using sampler$setCutPoints and sampler$setPredictor.
sampler$getTrees implemented to get a flattened, depth-first, down-left traversal of the trees.
For sampler$setPredictor, an argument specifies whether to roll back or force the change if the new data would result in a leaf having 0 observations.
pdbart and pd2bart now work with formula/data specifications, as well as taking models or samplers that have previously stored trees.
Stores x as an integer matrix of the maximum cut point that each observation is to the left of, by default using 16-bit integers. This limits the number of cut points to 65535, which can be increased with some special compilation instructions.
Uses CPU dispatch and SIMD instructions for some operations. This and the integer x make BART about 30% faster on datasets of around 10k observations.
Saved trees are stored using significantly less memory.
plot now works for fits from rbart_vi.
rbart_vi now reports varcount.
bart2 now defaults to not storing trees due to the memory cost.
bart2 now defaults to using quantile rules to decide splits.
predict for binary outcomes now correct.
Fix for verbose multithreading on Linux, reported by @ignacio82 on github.
General improvements to slice sampler in rbart_vi thanks to reports from Yutao Liu.
sampler$plotTree now handles multiple chains correctly.
Negative log loss for xbart with binary outcomes should now be computed correctly.
rbart_vi fits a simple varying intercept, random effects model.
Now natively supports multiple chains running in parallel.
Objects fit by bart can be used with the predict generic when instructed to save the trees.
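A minimal sketch of predicting for new observations from a bart fit that keeps its trees is shown below. The simulated data, the new-observation matrix x_new, and the small sampler settings are illustrative assumptions.

```r
library(dbarts)

# Simulated training data; names are illustrative.
set.seed(6)
n <- 60
x <- matrix(runif(2 * n), n, 2, dimnames = list(NULL, c("x1", "x2")))
y <- x[, 1] + x[, 2]^2 + rnorm(n, sd = 0.1)

# keeptrees = TRUE stores the trees so the fit can be reused later.
fit <- bart(x, y, ntree = 10L, ndpost = 30L, nskip = 30L,
            keeptrees = TRUE, verbose = FALSE)

# New observations with the same columns as the training predictors.
x_new <- matrix(runif(10), 5, 2, dimnames = list(NULL, c("x1", "x2")))

# Posterior draws for the new observations.
pred <- predict(fit, x_new)
dim(pred)
```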
New function bart2 introduced, similar to bart but with more efficient default parameters.
dbartsControl has had two parameters renamed: numSamples is now defaultNumSamples and numBurnIn is now defaultNumBurnIn.
dbartsControl supports parameters runMode, n.chains, rngKind and rngNormalKind.
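As a rough sketch of how a control object feeds a sampler, the snippet below builds a dbartsControl with two chains and runs the sampler directly. The simulated data and the specific settings are assumptions, and only arguments believed to be in the current dbartsControl signature are used.

```r
library(dbarts)

# Simulated data; names and sizes are illustrative only.
set.seed(7)
n <- 80
df <- data.frame(x = runif(n))
df$y <- 3 * df$x + rnorm(n, sd = 0.2)

# Two chains on a single thread, with the default RNG kinds.
control <- dbartsControl(n.chains = 2L, n.threads = 1L, n.trees = 10L,
                         rngKind = "default", rngNormalKind = "default")

sampler <- dbarts(y ~ x, data = df, control = control)

# Burn in for 20 iterations, then keep 20 samples.
samples <- sampler$run(numBurnIn = 20L, numSamples = 20L)
names(samples)
```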
In the C interface, a new function (setRNGState) has been added to specify the states of the random number generators, of which there is now one for every chain.
State objects saved by the handles no longer contain the total fits, since they can be rebuilt from the tree fits. States are also lists of objects now, with one corresponding to each chain. Tree fits and strings are matrices corresponding to the number of trees and saved samples.
Random subsampling crossvalidation (xbart) has been implemented in C++. Refits model using current set of trees for changes in hyperparameters n.trees, k, power, and base. Natively parallelized.
Rudimentary tree plotting added to sampler (sampler$plotTree).
Exported dbartsData as a way of constructing data objects and setting the data seen by the sampler all at once. Sampler now supports sampler$setData().
keepevery argument to bart matches BayesTree.
bart now has argument keepcall to suppress storing the call object.
bart now accepts a weights argument.
makeModelMatrixFromDataFrame now implemented in C, supports an argument for tracking/keeping dropped values from factors.
Usage of weights was causing incorrect updates to posterior for sigma^2.
Should now JIT byte compile correctly.
Cuts derived from quantiles should now be valid.
Uses a rejection sampler to simulate binary latent variables (CP Robert 2009, http://arxiv.org/pdf/0907.4010.pdf). Code thanks to Jared Murray.
Now encapsulates its own random number generator, so that the C++ objects can safely be used in parallel. Shouldn't affect pure-R users unless their RNG has non-exported state (i.e. Box-Muller normal kind).
Includes an offset.test vector that can be controlled independently of the offset vector, but in general inherits behavior from it. Set at creation with dbarts() or after with setTestOffset or setTestPredictorAndOffset.
By default, no longer attempts to obtain identical results as BayesTree. To recover this behavior, compile from source with configure.args = "--enable-match-bayes-tree".
Changing the entirety of the test matrix using setTestPredictor is no longer allowed. Use setTestPredictors instead.
Changing the predictor can now result in failure if the covariates would leave an end-node empty. setPredictor returns a logical as to success.
Saved dbarts objects may not be compatible and should be re-created to be sure of validity.
Now requires R versions >= 3.1.0.
Corrected binary latent variable sampler, which no longer adds the offset multiple times (reported by Jared Murray).
Relatively embarrassing bug related to loop-unrolling when n mod 5 != 0 fixed.
Correct aggregation of results for multithreaded variance calculations.
More equitably distributed tasks across multiple threads.
Makevars tweaked to allow compilation on Ubuntu.
Initial public release.