bayescart package

Submodules

bayescart.bcart module

Bayesian CART (BCART) models.

This module implements the BCART class and several subclasses to perform Bayesian inference on classification and regression trees. Subclasses address performance improvements, it is recommended to use these.

class bayescart.bcart.BCART(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: object

Base class for Bayesian CART (Classification and Regression Trees).

This class provides methods for initializing the tree, running the Markov chain Monte Carlo (MCMC) sampling, computing likelihoods and priors, resampling parameters, and updating tree moves (grow, prune, change, swap).

Specific MCMC-methods are abstract and should be implemented in derived classes.

Parameters:
  • X (pd.DataFrame) – Feature data.

  • y (pd.Series) – Response data.

  • alpha (float, optional) – Hyperparameter for the tree prior (default 0.95).

  • beta (float, optional) – Hyperparameter controlling the split probability decay (default 0.5).

  • mu_0 (float or None, optional) – Initial value for the regression mean (default None, will be sampled).

  • sigma_0 (float or None, optional) – Initial value for the regression standard deviation (default None, will be sampled).

  • nu (float, optional) – Hyperparameter for the inverse gamma prior (default 3).

  • lambd (float, optional) – Scale hyperparameter for the inverse gamma prior (default 0.1).

  • mu_bar (float, optional) – Prior mean for regression (default 0).

  • a (float, optional) – Hyperparameter (default 0.3).

  • node_min_size (int, optional) – Minimum observations per node (default 5).

  • alpha_dirichlet (NDArrayLike, optional) – Parameter(s) for the Dirichlet prior in classification (default 1.0).

  • iters (int, optional) – Total number of MCMC iterations (default 1250).

  • burnin (int, optional) – Number of burn-in iterations (default 250).

  • thinning (int, optional) – Thinning factor (default 1).

  • store_tree_spacing (int, optional) – Spacing for storing tree copies (default 10).

  • max_stored_trees (int, optional) – Maximum number of stored trees (default 2000). Caps the total memory consumption.

  • move_prob (Sequence[float], optional) – Probabilities for the moves (grow, prune, change, swap) (default [0.25,0.25,0.25,0.25]).

  • verbose (str, optional) – Verbosity level.

  • seed (int or np.random.Generator, optional) – Random seed or generator (default 45).

  • debug (bool, optional) – If True, enables debugging assertions (default False).

  • light (bool, optional) – If True, uses lighter copies for speed (default True).

is_classification

True if running a classification task.

Type:

bool

K

Number of classes (if classification).

Type:

int

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Initialize the BCART object with model and MCMC parameters.

calc_change_swap_mh_data(transition_p)[source]

Compute and store move-specific information needed for the change/swap move acceptance ratio.

Parameters:

transition_p (float) – The transition probability ratio for the move.

calc_grow_prune_mh_data(new_tree, node_to_split, tot_parents_with_two_leaves, b)[source]

Compute and store move-specific information needed for the grow/prune move acceptance ratio.

Parameters:
  • new_tree (Tree) – The proposed new tree.

  • node_to_split (Node) – The node that was split (or pruned).

  • tot_parents_with_two_leaves (int) – Total number of internal nodes with two leaves.

  • b (int) – The number of leaves in the current tree.

calc_llik(tree)[source]

Compute the integrated log-likelihood P(Y|X,T) for the tree.

Parameters:

tree (Tree)

Returns:

The integrated log-likelihood.

Return type:

float

calc_log_tree_prob(tree)[source]

Compute the log prior probability of the tree structure P(T|X).

This is the prob of sampling a specific tree given a dataset x

Parameters:

tree (Tree)

Returns:

The log prior probability.

Return type:

float

change_copy(*args, light=False, **kwargs)[source]

Pick an internal node and change its splitting rule. Return a copied tree.

Returns:

The new tree after the change move.

Return type:

Tree

copy()[source]

Create a copy of the BCART object.

Returns:

A copy of the current BCART instance.

Return type:

Self

Raises:

NotImplementedError – To be implemented.

deep_copy()[source]

Create a deep copy of the BCART object.

Returns:

A deep copy.

Return type:

Self

Raises:

NotImplementedError – To be implemented.

get_acceptance_prob(new_tree, move)[source]

Compute the acceptance probability for a proposed tree move.

Parameters:
  • new_tree (Tree) – The proposed tree after a move.

  • move (str) – The move type.

Returns:

The acceptance probability.

Return type:

float

get_leaves_counts(tree)[source]

For classification, compute the counts of observations per class in each leaf.

Parameters:

tree (Tree)

Returns:

Array of shape (n_leaves, n_classes).

Return type:

NDArrayInt

get_leaves_means(tree)[source]

For regression, compute for each leaf the number of observations, the mean, and un-normalized variance.

Parameters:

tree (Tree)

Returns:

(n_obs per leaf, means per leaf, un-normalized variances per leaf)

Return type:

tuple

get_log_posterior_prob(tree)[source]

Compute the log posterior probability of a tree (up to a normalizing constant).

This includes the full conditionals for leaf parameters, the tree prior, and the integrated likelihood.

P( heta, T | X, Y) prop P( heta | T, X, Y) * P(T | X, Y) P( heta | T, X, Y) = P(mu | T, X, Y, sigma) * P(sigma | T, X, Y) P(T | X, Y) prop P(Y | T, X) * P(T | X)

Parameters:

tree (Tree)

Returns:

The log posterior probability.

Return type:

float

get_p_split(depth)[source]

Compute the probability of splitting a node at a given depth.

Parameters:

depth (int) – The node depth.

Returns:

The splitting probability.

Return type:

float

get_posterior_params_mu(leaf, new_sigma)[source]

Get the posterior parameters for mu for a leaf (regression).

Parameters:
  • leaf (Node) – The leaf node.

  • new_sigma (float) – The resampled sigma value.

Returns:

(m, std) for the normal full conditional.

Return type:

tuple

get_posterior_params_p(leaf)[source]

Get the posterior Dirichlet parameters for a leaf (classification).

Parameters:

leaf (Node)

Returns:

The updated Dirichlet parameters.

Return type:

NDArrayFloat

get_posterior_params_sigma(leaf)[source]

Get the posterior parameters for sigma for a leaf (regression).

Parameters:

leaf (Node)

Returns:

(a, scale) parameters for the inverse gamma full conditional.

Return type:

tuple

grow_copy(*args, light=False, **kwargs)[source]

Pick a leaf which has > node_min_size observations. Split it using the splitting rule. Return a copied tree.

Returns:

The new tree after the grow move.

Return type:

Tree

prune_copy(*args, light=False, **kwargs)[source]

Pick a node with two leaves as children and prune them. Return a copied tree.

Returns:

The new tree after the prune move.

Return type:

Tree

resample_leaf_params()[source]

Resample the leaf parameters from their full conditionals.

resample_mu_sigma()[source]

Resample mu and sigma for each leaf (regression).

resample_mu_sigma_old()[source]

(Deprecated) Resample mu and sigma using the old full conditional formulas.

Raises:

Exception – Always raises an exception indicating deprecation.

resample_p()[source]

Resample class probabilities for each leaf.

run()[source]

Run the MCMC algorithm for Bayesian CART.

Returns:

A dictionary containing stored trees, integrated likelihoods, terminal region counts, cache counters, setup parameters, and data information.

Return type:

dict

sample_leaf_params()[source]

Sample new parameters for the leaves from the prior full conditionals.

Returns:

(left_leaf_params, right_leaf_params)

Return type:

tuple

sample_prior_mu(sigma)[source]

Sample an initial mu value from the normal prior.

Parameters:

sigma (float) – The sigma value used in the variance of the normal prior.

Returns:

The sampled mu.

Return type:

float

sample_prior_p()[source]

Sample initial class probabilities from a Dirichlet prior.

Returns:

The sampled probability vector.

Return type:

NDArrayFloat

sample_prior_sigma()[source]

Sample an initial sigma value from the inverse gamma prior.

Returns:

The sampled sigma.

Return type:

float

swap_copy(*args, light=False, **kwargs)[source]

Pick a parent-child of internal nodes and swap their splitting rules. If the otehr child has the same rule, then swap parent with both children. Return a copied tree.

Returns:

The new tree after the swap move.

Return type:

Tree

trans_prob_log(move, new_tree, old_tree)[source]

(Abstract in this base class) Compute the log transition probability for a move.

Parameters:
  • move (str) – Move type.

  • new_tree (Tree) – Proposed tree.

  • old_tree (Tree) – Current tree.

Returns:

The log transition probability.

Return type:

float

Raises:

AbstractMethodError – Always raised; to be implemented in a subclass.

update_tree(move)[source]

Evolve the tree based on the specified move.

Parameters:

move (str) – The move type (‘grow’, ‘prune’, ‘change’, ‘swap’).

Returns:

The new tree after applying the move.

Return type:

Tree

class bayescart.bcart.BCARTClassic(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCARTFast, BCARTClassicSlow

BCART implementation using the classic CGM98 algorithm.

Initialize the BCART object with model and MCMC parameters.

class bayescart.bcart.BCARTClassicSlow(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCART

A slower implementation of BCART following the original CGM98 algorithm.

This can be made more efficient by improving underlying operations in the BCART class.

Initialize the BCART object with model and MCMC parameters.

calc_change_swap_mh_data(transition_p)[source]

Compute and store move-specific information needed for the change/swap move acceptance ratio.

Parameters:

transition_p (float) – The transition probability ratio for the move.

calc_grow_prune_mh_data(new_tree, node_to_split, tot_parents_with_two_leaves, b)[source]

Compute and store move-specific information needed for the grow/prune move acceptance ratio.

Parameters:
  • new_tree (Tree) – The proposed new tree.

  • node_to_split (Node) – The node that was split (or pruned).

  • tot_parents_with_two_leaves (int) – Total number of internal nodes with two leaves.

  • b (int) – The number of leaves in the current tree.

trans_prob_log(move, new_tree, old_tree)[source]

(Abstract in this base class) Compute the log transition probability for a move.

Parameters:
  • move (str) – Move type.

  • new_tree (Tree) – Proposed tree.

  • old_tree (Tree) – Current tree.

Returns:

The log transition probability.

Return type:

float

Raises:

AbstractMethodError – Always raised; to be implemented in a subclass.

class bayescart.bcart.BCARTFast(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCART

Optimized version of BCART that uses the “fast” classes.

Optimizations: - Improved memory usage on copy operations (“fast” classes) - Using custom sampling functions instead of scipy’s for speed Optimized for memory.

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Initialize the BCART object with model and MCMC parameters.

get_log_posterior_prob(tree)[source]

Compute posterior probability for a tree (up to norming constant) P( heta, T | X, Y) prop P( heta | T, X, Y) * P(T | X, Y) P( heta | T, X, Y) = P(mu | T, X, Y, sigma) * P(sigma | T, X, Y) P(T | X, Y) prop P(Y | T, X) * P(T | X)

Return type:

float

resample_mu_sigma()[source]

Resample mu and sigma for each leaf (regression).

sample_prior_sigma()[source]

Sample an initial sigma value from the inverse gamma prior.

Returns:

The sampled sigma.

Return type:

float

update_tree(move)[source]

Evolve the tree based on the specified move.

Parameters:

move (str) – The move type (‘grow’, ‘prune’, ‘change’, ‘swap’).

Returns:

The new tree after applying the move.

Return type:

Tree

bayescart.eval module

Utility functions for BCART experiments evaluation and analysis.

This module contains helper functions for computing tree probabilities, comparing trees from BCART runs and provide summary tables of estimated posterior probabilities for the most commonly visited trees. This can be used to gauge convergence.

bayescart.eval.calc_log_tree_prob(tree, run)[source]

Calculate the log prior probability of a tree given a run result.

Parameters:
  • tree (Tree) – The tree for which to compute the prior probability.

  • run (dict) – A run result dictionary containing configuration and data information.

Returns:

The log prior probability P(T|X) of the tree.

Return type:

float

bayescart.eval.calc_log_tree_prob_and_llik(tree, run)[source]

Calculate both the log prior probability and the integrated log-likelihood of a tree.

Parameters:
  • tree (Tree) – The tree for which to compute the probabilities.

  • run (dict) – A run result dictionary containing configuration and data information.

Returns:

A tuple (prior, llik) where ‘prior’ is the log prior probability and ‘llik’ is the integrated log-likelihood of the tree.

Return type:

tuple of float

bayescart.eval.calc_tree_llik(tree, run)[source]

Calculate the integrated log-likelihood of a tree given a run result.

Parameters:
  • tree (Tree) – The tree for which to compute the likelihood.

  • run (dict) – A run result dictionary containing configuration and data information.

Returns:

The integrated log-likelihood P(Y|X,T) of the tree.

Return type:

float

bayescart.eval.calc_tree_post_prob(tree, run)[source]

Calculate the (log) posterior probability of a tree given a run result.

Parameters:
  • tree (Tree) – The tree for which to compute the posterior probability.

  • run (dict) – A run result dictionary containing configuration and data information.

Returns:

The log posterior probability of the given tree.

Return type:

float

bayescart.eval.compare_trees(tree1, tree2, res_or_run, type='prob')[source]

Compare two trees to check if they are equivalent under a given mode.

There are different levels of comparison: - Basic, using buil-in tree comaprison (hard=0) - Probability, using data likelihood and tree prior (hard=1). Note that different likelihoods imply different partitions, with high probability. Instead, the prior should tell apart trees that have same partition but different structure so that the two trees are not probabilistically equivalent.

Parameters:
  • tree1 (Tree) – The first tree to compare.

  • tree2 (Tree) – The second tree to compare.

  • res_or_run (dict or list) – Either a single run result dictionary or a list of run results, used to extract necessary configuration and data.

  • type (str, optional) – Comparison type (‘basic’ or ‘prob’). Default is ‘prob’.

Returns:

True if the trees are considered equal under the chosen criteria, False otherwise.

Return type:

bool

bayescart.eval.produce_tree_table(res)[source]

Produce a summary table of tree posterior probability estimates across runs.

This function:
  1. Extracts the top 5 most frequent trees from each run.

  2. Pools all top trees and computes a unique set.

  3. Sorts the unique trees by their average empirical frequency (posterior probability) across runs.

  4. Constructs a table with each unique tree’s estimated frequency per run, mean, standard deviation, and number of terminal nodes.

Parameters:

res (list) – A list of run result dictionaries.

Returns:

A pandas DataFrame summarizing the frequency estimates for the most frequent trees.

Return type:

pd.DataFrame

bayescart.eval.summarize_trees(run, top_n=None, type='prob', plot=False)[source]

Summarize the most common trees from a single run.

The function extracts the stored ‘tree_store’ from the run result, summarizes the top trees using frequency of occurrence (by the chosen comparison type), and optionally plots each unique tree.

Parameters:
  • run (dict) – A run result dictionary containing a ‘tree_store’ key.

  • top_n (int or None, optional) – The number of top trees to extract. If None, all unique trees are returned.

  • type (str, optional) – The type of comparison to use (‘prob’ or ‘basic’). Default is ‘prob’.

  • plot (bool, optional) – If True, each unique tree is displayed using its show() method. Default is False.

Returns:

A tuple (freq_list, unique_trees) representing the frequencies and corresponding unique trees.

Return type:

tuple

bayescart.exceptions module

Custom exceptions for bayescart.

This module defines exceptions used by bayescart to signal errors in tree operations and abstract method implementations.

exception bayescart.exceptions.AbstractMethodError[source]

Bases: Exception

Exception raised when an abstract method has not been implemented.

exception bayescart.exceptions.InvalidTreeError[source]

Bases: Exception

Exception raised for errors in the tree structure or data.

exception bayescart.exceptions.NotImplementedError[source]

Bases: Exception

Exception raised when an abstract method has not been implemented.

bayescart.mytyping module

bayescart.node module

Node class for bayescart.

This module defines the Node class used to represent nodes in a tree. It extends the Node implementation from treelib.

class bayescart.node.Node(id, is_l, data, rng, debug)[source]

Bases: Node

Extended node class for Bayesian CART. Provides functionalities for adding data, splitting, and updating parameters.

This class is mostly a wrapper around the NodeData object, which handles the data and parameters associated with the node.

is_l

Flag indicating if this node is a left child.

Type:

bool

_data

The node data (parameters and associated data).

Type:

NodeData

_rng

The random number generator.

Type:

np.random.Generator

debug

If True, enable debug checks.

Type:

bool

_depth

The depth of the node.

Type:

int

Create a new Node object to be placed inside a Tree object

__init__(id, is_l, data, rng, debug)[source]

Create a new Node object to be placed inside a Tree object

calc_avail_split_and_vars()[source]

Compute the number of available variables for splitting, and the number of available splits for the current split variable.

Returns:

(number of available variables, number of available splits)

Return type:

tuple

copy(light=False, no_data=False, memo=None)[source]
Return type:

Node

count_values()[source]

Count the of observations per class.

Returns:

Array of counts per class.

Return type:

NDArrayInt

property depth
get_available_splits(*args, **kw)[source]

Get available splits from the underlying NodeData.

Returns:

(avail_vars, is_cat)

Return type:

tuple

get_data_averages()[source]

Compute data averages (number of observations, mean, un-normalized variance) for regression.

Returns:

(n, mean, un-normalized variance)

Return type:

tuple

get_data_split(split_var=None, split_val=None)[source]

Split the node’s data using the current (or provided) split rule. Returns the left and right data subsets.

Parameters:
  • split_var (str or None, optional) – The splitting feature. If None, uses the current split_var.

  • split_val (Sequence[T] or T or None, optional) – The splitting value(s). If None, uses the current split_set.

Returns:

(left_X, right_X, left_y, right_y)

Return type:

tuple

get_new_split()[source]

Sample a new split for the node.

Returns:

(split_var, split_val)

Return type:

tuple

get_nobs()[source]
Return type:

int

get_params(print=False)[source]
Return type:

Any

get_preds()[source]

Get the predictions from the node’s model.

Returns:

(indices, predictions)

Return type:

tuple

get_split_data(split_var, split_val, left_params, right_params)[source]

Split the data at this node into two parts. Returns the children.

Parameters:
  • split_var (str) – The feature to split on.

  • split_val (Sequence[T] or T) – The split rule.

  • left_params (Any) – Parameters for the left child.

  • right_params (Any) – Parameters for the right child.

Returns:

(left_node_data, right_node_data)

Return type:

tuple

get_split_info()[source]

Retrieve the current split rule: split variable (str), split value (array if cat, float else), whether the split is categorical.

Returns:

(split_var, split_val, is_cat)

Return type:

tuple

get_true_preds()[source]

Return the true predictions (i.e. the actual response values).

Returns:

(indices, true response values)

Return type:

tuple

has_data()[source]
Return type:

bool

property id
is_split_rule_empty()[source]
Return type:

bool

reset_split_info()[source]
update_node_params(params)[source]
update_split_data(X, y)[source]
update_split_info(split_var, split_val)[source]
class bayescart.node.NodeFast(id, is_l, data, rng, debug)[source]

Bases: Node

Fast implementation of Node with optimized copy operations.

Create a new Node object to be placed inside a Tree object

copy(light=False, no_data=False, memo=None)[source]

Copy the node, with an option for a light (optimized) copy.

Change: Instead of thoroughly copying the object, just copy what we know we need.

Return type:

Node

bayescart.node_data module

Node data classes for bayescart.

This module defines classes that encapsulate the data associated with a tree node. It provides a base class NodeData along with subclasses for regression (NodeDataRegression) and classification (NodeDataClassification).

class bayescart.node_data.NodeData(X, y, rng, debug, node_min_size, split_var=None, split_set=None, is_cat_split=None)[source]

Bases: object

Abstract base class for data associated with a node in the CART tree.

X

The data features at this node.

Type:

pd.DataFrame

y

The response variable at this node.

Type:

pd.Series

rng

Random number generator for sampling.

Type:

np.random.Generator

debug

If True, enables additional debugging checks.

Type:

bool

node_min_size

Minimum number of observations required in the node.

Type:

int

split_var

The variable used for splitting (if any).

Type:

str

split_set

The value(s) used in the splitting rule.

Type:

Sequence[T] or T

is_cat_split

Indicator whether the split is categorical.

Type:

bool

avail_splits

Cached available splits (initially None).

Type:

Any

property X: DataFrame
__init__(X, y, rng, debug, node_min_size, split_var=None, split_set=None, is_cat_split=None)[source]
calc_avail_split_and_vars()[source]

Compute the number of available variables and the number of available splits for the current split variable.

Returns:

A tuple (n_avail_vars, n_splits).

Return type:

tuple

copy(light=False, no_data=False, memo=None)[source]
Return type:

NodeData

count_values()[source]

Compute the number of observations per class (classification only).

Returns:

Array of counts for each class.

Return type:

NDArrayInt

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_available_splits(force_eval=False, assert_cached=False)[source]

For each feature in X, return the available unique splits.

Parameters:
  • force_eval (bool, optional) – If True, force re-evaluation of available splits (default is False).

  • assert_cached (bool, optional) – If True, raise an error if a cached value is not available (default is False).

Returns:

A tuple containing:
  • A dictionary mapping feature names to sorted unique available split values.

  • A dictionary mapping feature names to booleans indicating if the feature is categorical.

Return type:

tuple

get_data_averages()[source]

Compute the number of observations, the mean, and un-normalized variance (regression only).

Returns:

(n, mean, un-normalized variance)

Return type:

tuple

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_data_split(split_var, split_val)[source]

Split the data into left and right subsets based on the split rule. Returns the left and right data subsets.

Parameters:
  • split_var (str) – The feature on which to split.

  • split_val (Sequence[T] or T) – The split value(s).

Returns:

A tuple (left_X, right_X, left_y, right_y).

Return type:

tuple

get_nobs()[source]
Return type:

int

static get_nobs_from_arg(x)[source]

Get the number of observations from a DataFrame or Series.

Parameters:

x (pd.DataFrame or pd.Series)

Returns:

The number of rows (observations).

Return type:

int

get_params(print=False)[source]

Retrieve the node-specific parameters (e.g., mu and sigma for regression, or p for classification).

Parameters:

print (bool, optional) – If True, return a formatted string (default is False).

Returns:

The node parameters.

Return type:

Any

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_preds()[source]

Compute the model predictions (e.g., posterior mean).

Returns:

A tuple (indices, predictions).

Return type:

tuple

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_split_data(split_var, split_val, left_params, right_params)[source]

Split the current node’s data into two subsets (left and right) based on a given split rule and generate new NodeData objects for the children.

Parameters:
  • split_var (str) – The feature on which to split.

  • split_val (Sequence[T] or T) – The split value(s).

  • left_params (Any) – Parameters for the left child.

  • right_params (Any) – Parameters for the right child.

Returns:

A tuple (l_node_data, r_node_data) corresponding to the left and right NodeData objects.

Return type:

tuple

Raises:

AbstractMethodError – Always raised; this method should be implemented in a subclass.

get_split_set(print=False)[source]

Get the splitting value(s) for this node.

Parameters:

print (bool, optional) – If True, return a formatted string description (default is False).

Returns:

The splitting value(s) or a description thereof.

Return type:

Sequence[T] or T

get_split_var(print=False)[source]
Return type:

str

has_data()[source]
Return type:

bool

is_split_rule_emtpy()[source]
Return type:

bool

reset_avail_splits()[source]

Reset the cached available splits. Necessary whenever the underlying dataset changes

reset_split_info()[source]
sample_split(split_var, avail_vals)[source]
Return type:

tuple[bool, Union[Sequence[TypeVar(T, str, int, float)], TypeVar(T, str, int, float)]]

update_node_params(params)[source]

Update the node parameters with new values.

Parameters:

params (tuple[float, float] or NDArrayFloat) – The new parameter values.

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

update_split_data(X, y)[source]
update_split_info(split_var, split_val)[source]
property y: Series
class bayescart.node_data.NodeDataClassification(p, *args, **kwargs)[source]

Bases: NodeData

Node data class for classification trees.

p

The probability vector for the classes.

Type:

NDArrayFloat

__init__(p, *args, **kwargs)[source]
count_values()[source]

Compute the number of observations per class (classification only).

Returns:

Array of counts for each class.

Return type:

NDArrayInt

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_params(print=False)[source]

Retrieve the node-specific parameters (e.g., mu and sigma for regression, or p for classification).

Parameters:

print (bool, optional) – If True, return a formatted string (default is False).

Returns:

The node parameters.

Return type:

Any

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_split_data(split_var, split_val, left_params, right_params)[source]

Split the data for classification and generate new NodeDataClassification objects.

Parameters:
  • split_var (str) – The feature to split on.

  • split_val (Sequence[T] or T) – The splitting rule.

  • left_params (NDArrayFloat) – New class probability vector for the left child.

  • right_params (NDArrayFloat) – New class probability vector for the right child.

Returns:

(l_node_data, r_node_data)

Return type:

tuple

update_node_params(params)[source]

Update the node parameters with new values.

Parameters:

params (tuple[float, float] or NDArrayFloat) – The new parameter values.

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

class bayescart.node_data.NodeDataClassificationFast(p, *args, **kwargs)[source]

Bases: NodeDataFast, NodeDataClassification

copy(*args, **kwargs)[source]

Instead of thoroughly copying the object, just copy what we know we need.

class bayescart.node_data.NodeDataFast(X, y, rng, debug, node_min_size, split_var=None, split_set=None, is_cat_split=None)[source]

Bases: NodeData

Efficient implementation of NodeData, reduces the cost of copy operations.

copy(light=False, no_data=False, memo=None)[source]

Instead of thoroughly copying the object, just copy what we know we need.

Return type:

NodeData

class bayescart.node_data.NodeDataRegression(mu, sigma, *args, **kwargs)[source]

Bases: NodeData

Node data class for regression trees.

mu

The mean parameter.

Type:

float

sigma

The standard deviation parameter.

Type:

float

__init__(mu, sigma, *args, **kwargs)[source]
get_data_averages()[source]

Compute the number of observations, mean, and un-normalized variance for regression.

Returns:

(n, mean, uvar)

Return type:

tuple

get_mu(print=False)[source]
Return type:

float

get_params(print=False)[source]

Retrieve the node-specific parameters (e.g., mu and sigma for regression, or p for classification).

Parameters:

print (bool, optional) – If True, return a formatted string (default is False).

Returns:

The node parameters.

Return type:

Any

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

get_preds()[source]

Get predictions for regression.

Returns:

(indices, predicted constant value equal to mu)

Return type:

tuple

get_sigma(print=False)[source]
Return type:

float

get_split_data(split_var, split_val, left_params, right_params)[source]

Split the data for regression and generate new NodeDataRegression objects for the left and right children.

Parameters:
  • split_var (str) – The feature to split on.

  • split_val (Sequence[T] or T) – The splitting threshold (or set).

  • left_params (tuple[float, float]) – Parameters (mu, sigma) for the left child.

  • right_params (tuple[float, float]) – Parameters (mu, sigma) for the right child.

Returns:

(l_node_data, r_node_data)

Return type:

tuple

update_node_params(params)[source]

Update the node parameters with new values.

Parameters:

params (tuple[float, float] or NDArrayFloat) – The new parameter values.

Raises:

AbstractMethodError – Always raised; should be implemented in a subclass.

class bayescart.node_data.NodeDataRegressionFast(mu, sigma, *args, **kwargs)[source]

Bases: NodeDataFast, NodeDataRegression

copy(*args, **kwargs)[source]

Instead of thoroughly copying the object, just copy what we know we need.

get_data_averages()[source]

Compute the number of observations, mean, and un-normalized variance using fast routines.

Returns:

(n, mean, uvar)

Return type:

tuple

bayescart.parallel_tempering_base module

Bayesian CART (BCART) models solved using parallel tempering.

This module implements the parallel tempering algorithm for addressing the multimodality of BCART posterior space. This is an abstract base class. Subclasses will implement the specific parallel tempering logic, such as how to apply flattening of the posterior distribution.

class bayescart.parallel_tempering_base.BCARTPT(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCARTClassic

Base class for Parallel Tempering variants of BCART.

Supports staggered activation of chains through steps_per_chain. This can help warmer chains to converge before adding more chains closer to the cold one (target chain).

Parameters:
  • temps (Sequence[float] or NDArrayFloat) – List of temperatures for each chain. One of the ends must be 1.

  • steps_per_chain (Sequence[int] or NDArray[int] or None, optional) – Number of iterations to run before unlocking additional chains.

  • swap_type (str, optional) – Swap type: ‘stochastic’ or deterministic (default ‘stochastic’).

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)

Initialize the BCART object with model and MCMC parameters.

get_acceptance_prob(new_tree, move)[source]

Compute the acceptance probability for a move in the parallel tempering framework.

Parameters:
  • new_tree (Tree) – The proposed tree.

  • move (str) – The move type.

Returns:

The acceptance probability.

Return type:

float

get_swap_acceptance_prob(tree_from, tree_to)[source]

Compute the swap acceptance probability between two chains.

Parameters:
  • tree_from (Tree) – The tree from one chain.

  • tree_to (Tree) – The tree from the adjacent chain.

Returns:

The swap acceptance probability.

Return type:

float

Raises:

NotImplementedError – This method must be implemented in subclasses.

run()[source]

Run the Parallel Tempering MCMC sampler.

Returns:

A dictionary containing MCMC results and tempering statistics.

Return type:

dict

tempering = ''

bayescart.tempering_geom_lik module

Bayesian CART (BCART) models solved using geometric-likelihood parallel tempering. Only the likelihood is flattened, thereby preserving prior information on the importnance of different tree sizes. The prior is used instrumentally to guide the search over small trees.

Disentangling the likelihood from the prior leads to sensible analytical simplifications and performacne gains (also in terms of convergence). However, this approach loses the ability to leverage prior inforamtion, as it is now used to constrain smaller trees in heated chains.

This module implements the geometric-likelihood parallel tempering algorithm for addressing the multimodality of BCART posterior space.

class bayescart.tempering_geom_lik.BCARTGeomLik(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCARTPT

Geometric-Likelihood parallel tempering implementation.

In this variant, only the likelihood is flattened, while the prior remains unchanged.

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)

Initialize the BCART object with model and MCMC parameters.

get_swap_acceptance_prob(tree_from, tree_to)[source]

Compute the swap acceptance probability for the geometric-likelihood variant.

Parameters:
Returns:

The swap acceptance probability.

Return type:

float

bayescart.tempering_geometric module

Bayesian CART (BCART) models solved using geometric parallel tempering, the whole posterior distribution is flattened uniformly.

This module implements the geometric parallel tempering algorithm for addressing the multimodality of BCART posterior space.

class bayescart.tempering_geometric.BCARTGeom(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCARTPT

Geometric parallel tempering implementation of Bayesian CART.

In this variant, the posterior of each chain is raised to a power controlled by the temperature. This yields a flattening effect. Chains furthest from the original one (t=1) are designed to explore the space fully. However, this leads to the search over very big trees, which are computationally expensive. It also slows down the convergence of the original chain (beyond runtime, because it is difficult to undo big trees).

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)

Initialize the BCART object with model and MCMC parameters.

calc_change_swap_mh_data(transition_p)[source]

Compute and store move-specific information needed for the change/swap move acceptance ratio.

Parameters:

transition_p (float) – The transition probability ratio for the move.

calc_grow_prune_mh_data(new_tree, node_to_split, tot_parents_with_two_leaves, b)[source]

Compute and store move-specific information needed for the grow/prune move acceptance ratio.

Parameters:
  • new_tree (Tree) – The proposed new tree.

  • node_to_split (Node) – The node that was split (or pruned).

  • tot_parents_with_two_leaves (int) – Total number of internal nodes with two leaves.

  • b (int) – The number of leaves in the current tree.

get_swap_acceptance_prob(tree_from, tree_to)[source]

Compute the acceptance probability for a swap between chains in the geometric tempering variant.

Parameters:
Returns:

The swap acceptance probability.

Return type:

float

trans_prob_log(move, new_tree, old_tree)[source]

(Abstract in this base class) Compute the log transition probability for a move.

Parameters:
  • move (str) – Move type.

  • new_tree (Tree) – Proposed tree.

  • old_tree (Tree) – Current tree.

Returns:

The log transition probability.

Return type:

float

Raises:

AbstractMethodError – Always raised; to be implemented in a subclass.

bayescart.tempering_pseudo_prior module

Bayesian CART (BCART) models solved using tempering with a pseudo-prior. Only the likelihood is flattened, thereby preserving prior information on the importnance of different tree sizes. The prior is left unchanged as it should be used to inform the model with previous information. Instead, a pseudo-prior is used instrumentally to guide the search over small trees.

We take as pseudo prior the same CGM98 prior to benefit from substantial cancellations in computing acceptance rates.

This module implements the tempering with a pseudo-prior algorithm for addressing the multimodality of BCART posterior space.

class bayescart.tempering_pseudo_prior.BCARTPseudoPrior(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)[source]

Bases: BCARTPT

Bayesian CART with a pseudo-prior used to bias the search towards smaller trees.

The pseudo-prior is used instrumentally in computing the acceptance probabilities.

Parameters:
  • pprior_alpha (float, optional) – Pseudo-prior alpha parameter (default 0.95).

  • pprior_beta (float, optional) – Pseudo-prior beta parameter (default 1).

Initialize the BCART object with model and MCMC parameters.

__init__(X, y, alpha=0.95, beta=0.5, mu_0=None, sigma_0=None, nu=3, lambd=0.1, mu_bar=0, a=0.3, node_min_size=5, alpha_dirichlet=1.0, iters=1250, burnin=250, thinning=1, store_tree_spacing=10, max_stored_trees=2000, move_prob=[0.25, 0.25, 0.25, 0.25], verbose='', seed=45, debug=False, light=True)

Initialize the BCART object with model and MCMC parameters.

calc_log_p_split(tree, alpha, beta)[source]

Compute the log prior probability of the tree structure (ignoring the specific split values).

Parameters:
  • tree (Tree)

  • alpha (float)

  • beta (float)

Returns:

The computed log prior probability of the tree structure.

Return type:

float

get_acceptance_prob(new_tree, move)[source]

Compute the acceptance probability for a move using the pseudo-prior adjustment.

Parameters:
  • new_tree (Tree)

  • move (str)

Returns:

The acceptance probability.

Return type:

float

get_swap_acceptance_prob(tree_from, tree_to)[source]

Compute the swap acceptance probability for the pseudo-prior variant.

Parameters:
Returns:

The swap acceptance probability.

Return type:

float

run()[source]

Run the Parallel Tempering MCMC sampler.

Returns:

A dictionary containing MCMC results and tempering statistics.

Return type:

dict

trans_prob_log_pseudoprior(move, new_tree, old_tree)[source]

Compute the log transition probability for a move using the pseudo-prior.

Parameters:
  • move (str)

  • new_tree (Tree)

  • old_tree (Tree)

Returns:

The log transition probability.

Return type:

float

bayescart.tree module

Tree class for bayescart.

This module defines the Tree class that extends treelib’s Tree to support operations needed for Bayesian CART such as applying splits, copying subtrees, and computing prior probabilities.

class bayescart.tree.Tree(root_node_data, rng, node_min_size, debug)[source]

Bases: Tree

Extended tree class for Bayesian CART that builds on the treelib Tree.

This class maintains a counter for node IDs and provides additional methods for sampling leaves, copying trees, and applying splits.

id_counter

Counter for unique node identifiers.

Type:

int

rng

Random generator used for sampling.

Type:

np.random.Generator

node_min_size

Minimum number of observations per node.

Type:

int

debug

If True, enables additional assertions.

Type:

bool

llik

Cached integrated log-likelihood of the tree.

Type:

float or None

log_tree_prior_prob

Cached log prior probability of the tree.

Type:

float or None

temperature

The current temperature for tempering.

Type:

float

Initiate a new tree or copy another tree with a shallow or deep copy.

__init__(root_node_data, rng, node_min_size, debug)[source]

Initiate a new tree or copy another tree with a shallow or deep copy.

add_node(data, is_l, parent=None)[source]

Add a new node object to the tree and make the parent as the root by default.

The ‘node’ parameter refers to an instance of Class::Node.

Return type:

Node

apply_split(node, split_var, split_val, l_leaf_params, r_leaf_params)[source]

Apply a split to a leaf node by generating two children nodes.

Parameters:
  • node (Node) – The leaf node to be split.

  • split_var (str) – The feature to split on.

  • split_val (Sequence[T] or T) – The split value(s).

  • l_leaf_params (Any) – Parameters for the left child.

  • r_leaf_params (Any) – Parameters for the right child.

Raises:

InvalidTreeError – If any of the resulting children have fewer observations than the minimum.

check_split_struct(node)[source]

Check that the splitting rules along the subtree starting at a given node do not produce empty splits.

While this function does not guarantee that the split is valid, its fast to execute and can filter some obvious incompatibilities.

Parameters:

node (Node) – The root of the subtree to check.

Raises:

InvalidTreeError – If an empty split is encountered.

copy(light=False, no_data=False, memo=None)[source]

Deep copy the tree with all node info. If light, don’t deep-copy (X,y) at each node.

Return type:

Tree

get_children(node)[source]

Get the children of the specified node.

Parameters:

node (Node or int) – The node or its identifier.

Returns:

A list of child nodes (ordered as left then right).

Return type:

list

Raises:

ValueError – If the node does not have exactly two children (in non-leaf cases).

get_leaves()[source]

Return all the leaf nodes of the tree.

Returns:

A list of leaf nodes.

Return type:

list

get_n_leaves()[source]

Get the number of leaf nodes.

Returns:

The count of leaves.

Return type:

int

get_node(node_id)[source]

Get the object of the node with ID of nid.

An alternative way is using ‘[]’ operation on the tree. But small difference exists between them: get_node() will return None if nid is absent, whereas ‘[]’ will raise KeyError.

Return type:

Node

get_nonleaf_nodes(filter_root=False)[source]
Return type:

list[Node]

get_parent(node)[source]
Return type:

Node

get_parents_with_two_leaves()[source]

Get all internal nodes that have exactly two leaves as children.

Returns:

List of nodes satisfying the condition.

Return type:

list

get_root()[source]

Return the root node of the tree.

Returns:

The root node.

Return type:

Node

get_sibling(node)[source]
Return type:

Node

is_equal(other, hard=0, categories=None)[source]

Check if two trees are equal in structure and (optionally) in parameters.

The hard parameter controls how strict the testing should be. Hard = 0 checks they have the same structure and same splitting variables. Hard = 1 additionally checks the splitting values. Hard = 2 expects same structure, same splits, same data in nodes. Hard = 3 expects equality also for leaf parameters.

Parameters:
  • other (Tree) – The tree to compare with.

  • hard (int, optional) – Level of strictness (0: structure and split variable; 1: includes split values; 2: includes data; 3: also leaf parameters) (default 0).

  • categories (dict[str, set] or None, optional) – Mapping of categorical variable values, if needed.

Returns:

True if the trees are equal under the chosen criteria.

Return type:

bool

is_stump()[source]
Return type:

bool

is_valid()[source]

Check whether the tree is valid.

The function recursively checks each node for logical consistency of splits, data assignments, and node properties.

Returns:

True if the tree passes all validity checks.

Return type:

bool

node_class

alias of Node

remove_node(node)[source]

Remove a node indicated by ‘identifier’ with all its successors. Return the number of removed nodes.

Return type:

int

sample_leaf(node_min_size)[source]

Sample a leaf node at random from the tree that has at least node_min_size observations.

Parameters:

node_min_size (int) – Minimum required observations.

Returns:

A randomly chosen leaf node.

Return type:

Node

Raises:

InvalidTreeError – If no leaf with sufficient observations is found.

show()[source]

Print the tree. Update all the tags first. Tags are not automatically updated for performance speed.

Returns:

The string representation of the tree.

Return type:

str

update_split(node, split_var, split_val)[source]

Change the splitting rule for the current node. Update the data subset of the children, and of all the descendants recursively.

Parameters:
  • node (Node) – The node to update.

  • split_var (str) – The new splitting variable.

  • split_val (Sequence[T] or T) – The new splitting value(s).

update_subtree_data(node)[source]

Update the data splits for all descendants of a given node.

Parameters:

node (Node) – The node whose subtree data will be updated.

class bayescart.tree.TreeFast(root_node_data, rng, node_min_size, debug)[source]

Bases: Tree

Fast implementation of Tree using NodeFast as the underlying node class. Tree copy is also improved.

Initiate a new tree or copy another tree with a shallow or deep copy.

copy(light=False, no_data=False, memo=None)[source]

Optimized copy of the tree.

Return type:

Tree

node_class

alias of NodeFast

update_subtree_data(node)[source]

Update the subtree data with a fast preliminary check before performing the full update.

The preliminary check does not count how many observations fall in each node; just whether the splits make logical sense. This is fast. If so, proceed with the proper, expensive check.

bayescart.utils module

Utility functions for bayescart.

This module provides helper functions for sampling from distributions, computing log-probability densities, and a generic choice function.

bayescart.utils.dirichlet_logpdf(x, alpha)[source]

Compute the log probability density function of a Dirichlet distribution.

Parameters:
  • x (array-like) – The point (vector) at which to evaluate the log-pdf.

  • alpha (array-like) – The concentration parameters of the Dirichlet distribution.

Returns:

The log probability density.

Return type:

float

bayescart.utils.invgamma_logpdf(x, a, scale)[source]

Compute the log probability density function of an inverse gamma distribution.

Parameters:
  • x (float) – The point at which to evaluate the log-pdf.

  • a (float) – The shape parameter.

  • scale (float) – The scale parameter.

Returns:

The log probability density.

Return type:

float

bayescart.utils.invgamma_rvs(a, scale, rng)[source]

Sample a random variate from an inverse gamma distribution.

Parameters:
  • a (float) – The shape parameter (adjusted by 1/2 in this context).

  • scale (float) – The scale parameter.

  • rng (np.random.Generator) – The random number generator.

Returns:

A random variate from the inverse gamma distribution, scaled accordingly.

Return type:

float

bayescart.utils.my_choice(rng, a, replace=False, p=None)[source]

Sample a random element from a generic sequence using the provided random generator.

Parameters:
  • rng (np.random.Generator) – The random number generator.

  • a (Sequence[elem]) – The sequence to sample from.

  • replace (bool, optional) – Whether the sampling is done with replacement (default is False).

  • p (Sequence[float] or None, optional) – The probability weights associated with each element (default is None).

Returns:

A randomly selected element from the sequence.

Return type:

elem

bayescart.utils.norm_logpdf(x, a, scale)[source]

Compute the log probability density function of a normal distribution.

Parameters:
  • x (float) – The point at which to evaluate the log-pdf.

  • a (float) – The mean of the distribution.

  • scale (float) – The standard deviation of the distribution.

Returns:

The log probability density.

Return type:

float

bayescart.utils.plot_chain_comm(res, title='')[source]

Plot the distribution of terminal regions across the chains from a PT run.

This function uses the stored parallel tempering (PT) statistics (the number of terminal nodes per chain) from the run results and plots them as histograms. It also prints the final swap acceptance probabilities.

Parameters:
  • res (dict) – A run result dictionary containing ‘PT_swap_stats’ with key ‘PT_term_reg’ and ‘PT_swap_final_prob’.

  • title (str, optional) – The title of the plot (default is ‘’).

Return type:

None

bayescart.utils.plot_hists(res, idx_range=(0, 1), title='')[source]

Plot the distribution of terminal nodes for the cold chain across runs.

The function extracts the stored terminal node counts from each run in ‘res’ and plots histograms for a selected fraction of the post-burnin iterations, where idx_range is given as (min_percentage, max_percentage).

Parameters:
  • res (list or sequence of dict) – A list of run result dictionaries, each containing a ‘setup’ key and a ‘tree_term_reg’ key.

  • idx_range (tuple, optional) – A tuple (min_percentage, max_percentage) specifying the range (as fractions) of the available post-burnin observations to use for plotting. Default is (0,1).

  • title (str, optional) – The title of the plot (default is ‘’).

Return type:

None

bayescart.utils.plot_swap_prob(res, title='')[source]

Plot the evolution of swap acceptance probabilities over time for PT chains.

Parameters:
  • res (dict) – A run result dictionary containing ‘PT_swap_stats’ with key ‘PT_swap_prob_over_time’, which is a mapping from iteration numbers to swap acceptance probabilities.

  • title (str, optional) – The title of the plot (default is ‘’).

Return type:

None

bayescart.utils.sim_cgm98(n, rng)[source]

Simulate data following the CGM98 example.

Parameters:
  • n (int) – Number of observations.

  • rng (np.random.Generator) – Random generator.

Returns:

(X, y) where X is a DataFrame with features ‘v1’ and ‘v2’, and y is a Series.

Return type:

tuple

Module contents

bayescart: A Python package for Bayesian CART with tempering.

This package provides classes and functions to build, sample, and evaluate Bayesian CART models with advanced tempering strategies. The package is composed of modules for node data handling, tree structures, Bayesian CART sampling, and utility functions.

Available objects:
  • BCARTClassic, BCARTPT, BCARTGeom, BCARTGeomLik, BCARTPseudoPrior

  • Tree

  • Node

  • NodeData, NodeDataRegression, NodeDataClassification

  • Utility functions and plotting functions

  • Evaluation and functions for analysis

  • Exceptions: InvalidTreeError, AbstractMethodError