oblique random forests for classification and regression #1116

Closed
bcjaeger opened this issue May 3, 2024 · 3 comments
@bcjaeger
Contributor

bcjaeger commented May 3, 2024

Hello!

aorsf has recently been updated to allow for oblique classification and regression forests. May I submit a PR that would add a classification and regression mode for the aorsf engine?

There are a few datasets where the oblique random forest is really helpful (e.g., modeldata::meats):

suppressPackageStartupMessages({
  library(modeldata)
  library(rsample)
  library(recipes)
  library(workflows)
  library(workflowsets)
  library(yardstick)
})
#> Warning: package 'modeldata' was built under R version 4.3.3
#> Warning: package 'yardstick' was built under R version 4.3.3

# load my branch
devtools::load_all(path = "D:/parsnip/")
#> ℹ Loading parsnip

meat_rec <- 
  recipe(protein ~ ., data = meats) %>%
  # drop the other two outcome measurements so only the spectra remain as predictors
  step_rm(water, fat)

meat_folds <- vfold_cv(meats)

meat_models <- list(oblique = rand_forest(mode = 'regression', 
                                          engine = 'aorsf'),
                    axis = rand_forest(mode = 'regression',
                                       engine = 'ranger'),
                    xgb = boost_tree(mode = 'regression', 
                                     engine = 'xgboost',
                                     trees = 500))


workflows <- workflow_set(list(meat_rec), meat_models, cross = TRUE)

res <- workflows %>% 
  workflow_map("fit_resamples", 
               verbose = TRUE,
               resamples = meat_folds,
               metrics = metric_set(rsq))
#> i 1 of 3 resampling: recipe_oblique
#> ✔ 1 of 3 resampling: recipe_oblique (3.8s)
#> i 2 of 3 resampling: recipe_axis
#> ✔ 2 of 3 resampling: recipe_axis (2.1s)
#> i 3 of 3 resampling: recipe_xgb
#> ✔ 3 of 3 resampling: recipe_xgb (6.2s)

collect_metrics(res)
#> # A tibble: 3 × 9
#>   wflow_id       .config    preproc model .metric .estimator  mean     n std_err
#>   <chr>          <chr>      <chr>   <chr> <chr>   <chr>      <dbl> <int>   <dbl>
#> 1 recipe_oblique Preproces… recipe  rand… rsq     standard   0.944    10 0.00858
#> 2 recipe_axis    Preproces… recipe  rand… rsq     standard   0.529    10 0.0582 
#> 3 recipe_xgb     Preproces… recipe  boos… rsq     standard   0.524    10 0.0574

Created on 2024-05-03 with reprex v2.1.0
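
For readers unfamiliar with oblique splits, here is a toy sketch of why they can help when the signal lives in a combination of predictors. It uses simulated data and base R only, and every name in it (`split_r2`, `best_split_r2`, the simulated `x1`, `x2`, `y`) is made up for illustration. It is not how aorsf chooses its splits: the oblique direction `x1 - x2` is handed to the code here, whereas an oblique forest has to estimate a direction from the data.

```r
# Toy illustration on simulated data; not aorsf's splitting algorithm.
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)

# The outcome depends on a linear combination of the two predictors.
y <- ifelse(x1 - x2 > 0, 1, -1) + rnorm(n, sd = 0.1)

# Proportion of variance in y explained by the single split z <= threshold.
split_r2 <- function(z, threshold, y) {
  left <- z <= threshold
  if (all(left) || !any(left)) return(0)
  sse <- sum((y[left] - mean(y[left]))^2) +
    sum((y[!left] - mean(y[!left]))^2)
  1 - sse / sum((y - mean(y))^2)
}

# Best single split on z, searching a grid of empirical quantiles.
best_split_r2 <- function(z, y) {
  thresholds <- quantile(z, probs = seq(0.05, 0.95, by = 0.05))
  max(vapply(thresholds, function(th) split_r2(z, th, y), numeric(1)))
}

# Axis-based: the split may use x1 or x2, but not both.
axis_r2 <- max(best_split_r2(x1, y), best_split_r2(x2, y))

# Oblique: the split uses a linear combination of x1 and x2.
oblique_r2 <- best_split_r2(x1 - x2, y)

c(axis = axis_r2, oblique = oblique_r2)
```

In this setup a single oblique split recovers most of the structure, while an axis-based tree would need many splits to approximate the same diagonal boundary.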

@simonpcouch
Contributor

Duplicate of tidymodels/bonsai#73. Closing so as not to track a duplicate issue, but we're certainly interested in making this happen!

Looks like you've got an implementation put together locally? I'd be more than happy to work with you to get this merged into bonsai if you're game to start a PR over there. :)

@bcjaeger
Contributor Author

bcjaeger commented May 3, 2024

Thank you! I didn't know about bonsai, but it looks awesome. =] I will open a PR there soon.

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 18, 2024