Document Type
Dissertation
Degree
Doctor of Philosophy
Major
Mathematics
Date of Defense
6-26-2025
Graduate Advisor
Dr. Haiyan Cai
Committee
Dr. Haiyan Cai
Dr. David Covert
Dr. Adrian Clingher
Dr. Wenjie He
Abstract
Tree ensemble methods such as Random Forests and Boosted Trees provide a range of variable importance statistics, offering powerful tools for feature selection. The advent of knockoff filters marked a significant advance by combining these variable importance statistics with the ability to control the False Discovery Rate (FDR). However, achieving a low FDR frequently comes at the cost of a high False Negative Rate (FNR), limiting the power of such approaches. In this work, we propose a novel method for leveraging knockoff variables to keep both the FDR and the FNR low. Although the method has no explicit mechanism for controlling the FDR, on many data sets it produces both a lower FDR and a lower FNR than existing approaches. Our approach builds on established techniques for knockoff variable construction and incorporates a comparative analysis of variable importance measures derived from tree ensemble models. We introduce a new variable selection strategy, which we call the Positive Difference Algorithm, and demonstrate its performance relative to existing methods.
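To make the abstract's comparison concrete, the sketch below illustrates the general knockoff-comparison idea it describes: fit a tree ensemble on the original features together with knockoff-style copies, take the difference in importance between each feature and its copy, and keep features whose difference is positive, then report empirical FDR and FNR on synthetic data with known signals. The permuted-column copies, the synthetic data, and the simple selection rule are illustrative assumptions only; this is a minimal sketch of the generic idea, not the dissertation's Positive Difference Algorithm or its knockoff construction.

# Illustrative sketch (assumptions noted above; not the dissertation's method):
# compare tree-ensemble importances of original features against knockoff-style
# copies and keep features whose importance exceeds that of their copy.
# Permuted columns stand in here for a proper knockoff construction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p, n_signal = 500, 40, 8

# Synthetic data: only the first n_signal features carry signal.
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_signal] = 2.0
y = X @ beta + rng.standard_normal(n)

# Knockoff-style copies: independently permute each column (a crude stand-in).
X_ko = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])

# Fit a tree ensemble on [X, X_ko] and compare importances feature by feature.
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(
    np.hstack([X, X_ko]), y
)
imp = model.feature_importances_
diff = imp[:p] - imp[p:]          # importance of a feature minus its copy's
selected = np.flatnonzero(diff > 0)

# Empirical false discovery and false negative proportions for this selection.
true_signal = np.arange(n_signal)
false_pos = np.setdiff1d(selected, true_signal).size
false_neg = np.setdiff1d(true_signal, selected).size
fdr = false_pos / max(selected.size, 1)
fnr = false_neg / n_signal
print(f"selected={selected}, FDR={fdr:.2f}, FNR={fnr:.2f}")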
Recommended Citation
Ehlman, Nicholas, "Variable Importance, Knockoff Filters, and Improving False Discovery and False Negative Rates" (2025). Dissertations. 1520.
https://irl.umsl.edu/dissertation/1520