Document Type

Dissertation

Degree

Doctor of Philosophy

Major

Mathematics

Date of Defense

6-26-2025

Graduate Advisor

Dr. Haiyan Cai

Committee

Dr. Haiyan Cai

Dr. David Covert

Dr. Adrian Clingher

Dr. Wenjie He

Abstract

Tree ensemble methods such as Random Forests and Boosted Trees provide a range of variable importance statistics, offering powerful tools for feature selection. The advent of knockoff filters marked a significant advance by combining these variable importance statistics with control of the False Discovery Rate (FDR). However, achieving a low FDR frequently comes at the cost of a high False Negative Rate (FNR), limiting the power of such approaches. In this work, we propose a novel method for leveraging knockoff variables that keeps both FDR and FNR low. Although the method has no explicit mechanism for controlling the FDR, on many data sets it achieves both lower FDR and lower FNR than existing knockoff-based approaches. Our approach builds upon established techniques for knockoff variable construction and incorporates a comparative analysis of variable importance measures derived from tree ensemble models. We introduce a new variable selection strategy, which we call the Positive Difference Algorithm, and demonstrate its performance relative to existing methods.
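The general idea described in the abstract can be illustrated with a short sketch: build knockoff copies of the features, fit a tree ensemble on the augmented design, and keep each feature whose importance exceeds that of its knockoff. This is only a hypothetical illustration of the selection idea, not the dissertation's actual algorithm; in particular, the column-permutation "knockoffs" below are a crude stand-in (real knockoff constructions preserve the joint correlation structure of the features), and all names and parameter choices here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: only the first 3 of 10 features influence the response.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
y = 3.0 * (X[:, 0] + X[:, 1] + X[:, 2]) + 0.5 * rng.standard_normal(n)

# Crude knockoffs: independently permute each column, breaking its
# association with y. (Proper knockoff constructions are more careful.)
X_knock = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])

# Fit a tree ensemble on the augmented design [X, X_knock] and read off
# the variable importance statistics.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(np.hstack([X, X_knock]), y)
imp = rf.feature_importances_

# Positive-difference selection: keep feature j when its importance
# exceeds that of its knockoff counterpart.
selected = [j for j in range(p) if imp[j] - imp[j + p] > 0]
print(selected)
```

With strong signal as above, the truly relevant features (indices 0, 1, 2) show importances far above their knockoffs, while null features hover near their knockoffs' level; the selection rule itself carries no formal FDR guarantee, which mirrors the trade-off the abstract discusses.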
