flexBART: A more flexible BART

Welcome to version 2.0 of the flexBART package! flexBART (>= 2.0.0) is a new implementation of BART that is designed to fit flexible varying coefficient models using ensembles of binary regression trees. In addition to the flexible priors for categorical decision rules introduced in earlier versions, this new version introduces a formula interface and implements a lot of data pre-processing that (hopefully) makes it easier than ever to fit BART models.

Installation & Basic Usage

We highly recommend installing R version 4.0.0 or later. Before installing flexBART, you should also set up an appropriate C++ toolchain for your system:

For macOS: we recommend using the macrtools package.
For Windows: we recommend using Rtools, which can be downloaded from CRAN. Please make sure you download the version of Rtools that corresponds to your R version (e.g., Rtools45 for R version 4.5.x).
For Linux: we recommend following the C++ toolchain setup instructions from the Stan development team.
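Before installing, you can check both prerequisites from within R. The sketch below is a convenience check, not part of flexBART: it assumes the CRAN package pkgbuild (which this README does not mention) is available and uses its has_build_tools() function to look for a working compiler toolchain.

```r
# Compare the running R version against the recommended minimum (4.0.0)
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) {
  warning("flexBART recommends R >= 4.0.0, but this session is running R ",
          as.character(getRversion()))
}

# Assumption: pkgbuild is used only as a helper to detect build tools.
# has_build_tools() returns TRUE/FALSE; debug = TRUE prints diagnostics.
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  tools_ok <- pkgbuild::has_build_tools(debug = TRUE)
  if (!tools_ok) {
    message("No working C++ toolchain found; see the platform notes above.")
  }
} else {
  message("Install the 'pkgbuild' package to check for a C++ toolchain.")
}
```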

Once your C++ toolchain is configured, you can install flexBART using devtools::install_github:

devtools::install_github(repo = "skdeshpande91/flexBART")

Basic Usage

Starting in version 2.0.0, flexBART features a formula interface and allows users to pass their data as data.frame or tibble objects. For example, given a data frame train_data with named columns for an outcome (e.g., Y) and the predictors, you can fit a simple BART model that predicts Y from all the predictors by running:

flexBART(formula = Y ~ bart(.), train_data = train_data)
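To make the call above concrete, here is a minimal, self-contained sketch that first simulates a hypothetical train_data data frame. The mean function, sample size, and column names X1 to X5 are illustrative assumptions, not part of flexBART; the fitting step uses only the formula and train_data arguments shown above and runs only if flexBART is installed. For the structure of the returned object, see the package articles.

```r
set.seed(101)
n <- 500

# Hypothetical training data: five continuous predictors, only three of
# which affect the outcome, plus standard Gaussian noise
train_data <- data.frame(matrix(runif(n * 5), nrow = n, ncol = 5))
names(train_data) <- paste0("X", 1:5)
train_data$Y <- with(train_data, 10 * sin(pi * X1 * X2) + 20 * (X3 - 0.5)^2) +
  rnorm(n)

# bart(.) lets the single ensemble split on every column other than Y
if (requireNamespace("flexBART", quietly = TRUE)) {
  library(flexBART)
  fit <- flexBART(formula = Y ~ bart(.), train_data = train_data)
}
```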

flexBART also supports fitting varying coefficient models of the form

$Y = \beta_{0}(X) + \beta_{1}(X)Z_{1} + \cdots + \beta_{R}(X)Z_{R} + \sigma\epsilon, \quad \epsilon \sim N(0,1),$

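where each coefficient function $\beta_{r}(X)$ is approximated with its own tree ensemble. To fit such a model in flexBART, include a separate bart() term for each coefficient function. For example, with $R = 2$, the formula Y ~ bart(.) + Z1 * bart(.) + Z2 * bart(.) uses the first bart(.) for the intercept function $\beta_{0}(X)$ and the next two for $\beta_{1}(X)$ and $\beta_{2}(X)$.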
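The sketch below simulates hypothetical data from this model with $R = 2$ and then fits it with the formula above. The coefficient functions, sample size, noise level, and column names are illustrative assumptions; the flexBART call itself follows the formula and train_data arguments described in this README and runs only when flexBART is installed.

```r
set.seed(102)
n <- 1000

# Hypothetical data: X1, X2 are effect modifiers; Z1, Z2 are covariates
# whose effects on Y vary with X1 and X2
train_data <- data.frame(X1 = runif(n), X2 = runif(n),
                         Z1 = rnorm(n), Z2 = rnorm(n))

beta0 <- with(train_data, 3 * X1)                    # beta_0(X)
beta1 <- with(train_data, ifelse(X2 > 0.5, 2, -1))   # beta_1(X)
beta2 <- with(train_data, sin(2 * pi * X1))          # beta_2(X)
sigma_true <- 0.5

train_data$Y <- beta0 + beta1 * train_data$Z1 + beta2 * train_data$Z2 +
  sigma_true * rnorm(n)

# One ensemble per coefficient function. With several ensembles in the
# formula, flexBART is described as leaving Z1 and Z2 out when it expands
# the dot, so each bart(.) here may split on X1 and X2 only.
if (requireNamespace("flexBART", quietly = TRUE)) {
  library(flexBART)
  fit <- flexBART(formula = Y ~ bart(.) + Z1 * bart(.) + Z2 * bart(.),
                  train_data = train_data)
}
```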

The formula interface also provides fine control over the predictor variables used in each ensemble. To allow an ensemble to split only on a few variables (e.g., X1, X2, and X3), specify bart(X1 + X2 + X3); to allow it to split on all variables except X1 and X2, specify bart(.-X1-X2). Note that when the formula contains multiple ensembles, flexBART does not include any of the $Z_{r}$'s as splitting variables when it expands the dot (.). So, to include a term like $X_{1}\beta_{1}(X_{1})$, which is piecewise linear in $X_{1}$ because a sum of regression trees is piecewise constant, you need to name X1 explicitly inside bart(), as in X1 * bart(X1).
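Because R does not evaluate a formula when it is created, the formulas described above can be built and inspected without flexBART installed. The sketch below writes them out and uses a small helper, bart_args(), to list the predictor specification inside each bart() term, giving one entry per ensemble. bart_args() and show_ensembles() are hypothetical illustrations, not flexBART functions, and they do not reproduce how flexBART itself parses or expands formulas.

```r
# Ensemble restricted to splitting on X1, X2, and X3
f_subset <- Y ~ bart(X1 + X2 + X3)
# Ensemble that may split on every predictor except X1 and X2
f_exclude <- Y ~ bart(. - X1 - X2)
# Intercept ensemble plus the term X1 * beta_1(X1); X1 is named inside
# bart() so that the second ensemble may split on it
f_pwlin <- Y ~ bart(.) + X1 * bart(X1)

# Hypothetical helper: recursively walk an expression and collect the
# argument of every bart(...) call it contains
bart_args <- function(expr) {
  if (!is.call(expr)) return(list())
  if (identical(expr[[1]], as.name("bart"))) return(list(expr[[2]]))
  Reduce(c, lapply(as.list(expr)[-1], bart_args), list())
}

# Deparse each ensemble's specification from the formula's right-hand side
show_ensembles <- function(f) vapply(bart_args(f[[3]]), deparse, character(1))

show_ensembles(f_subset)   # "X1 + X2 + X3"
show_ensembles(f_exclude)  # ". - X1 - X2"
show_ensembles(f_pwlin)    # "." "X1"
```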

See the package articles at the package website for more details.