Content
In collider data analyses, fake or non-prompt backgrounds arise from events that do not genuinely meet the selection requirements of a signal region but still pass them due to particle misidentification. Such backgrounds occur when leptons originate from secondary decays rather than from the primary interaction point (non-prompt leptons), or when other reconstructed objects, such as hadronic jets, are misidentified as leptons (fake leptons). To correct for these effects, one usually determines a scale factor, the so-called fake factor, and applies it as an event weight in a data-driven approach. Conventionally, this is done by binning the data in a few relevant observables (e.g. pT, eta, MET) and extracting the fake factor as the ratio of two distributions.
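For concreteness, the sketch below illustrates the conventional binned procedure with NumPy: the fake factor is taken as the per-bin ratio of two histograms and then looked up as an event weight. The choice of pT as the binning variable, the bin edges, and all function names are illustrative assumptions, not details taken from this abstract.

```python
import numpy as np

# Conventional binned fake-factor estimate: the per-bin ratio of two
# histograms (e.g. leptons passing a tight selection over leptons passing
# a loose-but-not-tight one). Binning and names are hypothetical.
pt_bins = np.array([20.0, 30.0, 45.0, 70.0, 120.0])  # GeV, assumed binning

def binned_fake_factor(pt_tight, pt_loose, bins):
    """Fake factor per bin: N_tight / N_loose, with empty bins set to 0."""
    n_tight, _ = np.histogram(pt_tight, bins=bins)
    n_loose, _ = np.histogram(pt_loose, bins=bins)
    return np.divide(n_tight, n_loose,
                     out=np.zeros(len(bins) - 1),
                     where=n_loose > 0)

def apply_fake_factor(pt_events, ff, bins):
    """Look up the per-event weight from the binned fake factor."""
    idx = np.clip(np.digitize(pt_events, bins) - 1, 0, len(ff) - 1)
    return ff[idx]
```

Binning in only a handful of observables is exactly what limits this approach: the fake factor is constant within each bin and cannot exploit correlations beyond the chosen variables.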
In this contribution, we introduce a new method that employs density ratio estimation with a transformer-based neural network trained directly on event-level features in a high-dimensional space. This framework provides an event-by-event, continuous, and unbinned estimate of the fake factor, yielding a more versatile and precise approach than traditional binned techniques.
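A minimal sketch of how such a per-event estimate can be obtained is shown below, using the standard classifier-based likelihood-ratio trick: a network trained with binary cross-entropy to distinguish two samples recovers their density ratio, which for balanced samples equals the exponential of the classifier logit. The PyTorch transformer architecture, feature shapes, and training details here are illustrative assumptions; the abstract does not spell out the exact implementation.

```python
import torch
import torch.nn as nn

# Density ratio estimation with a transformer encoder via the
# classifier-based likelihood-ratio trick. For balanced samples the
# optimal classifier satisfies logit(x) = log(p_num(x) / p_den(x)),
# so exp(logit) is a per-event, unbinned fake-factor estimate.
class RatioTransformer(nn.Module):
    def __init__(self, n_features=8, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):  # x: (batch, n_objects, n_features)
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1)).squeeze(-1)  # classifier logit

model = RatioTransformer()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-ins for the two event samples (e.g. tight vs. loose leptons).
x_num = torch.randn(256, 5, 8) + 0.5  # "numerator" sample
x_den = torch.randn(256, 5, 8)        # "denominator" sample

for _ in range(50):  # short illustrative training loop
    x = torch.cat([x_num, x_den])
    y = torch.cat([torch.ones(len(x_num)), torch.zeros(len(x_den))])
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Event-by-event, continuous, unbinned fake-factor estimate.
with torch.no_grad():
    fake_factor = torch.exp(model(x_den))  # one weight per event
```

Because the ratio is evaluated directly on each event's full feature vector, no binning choice enters at any point, which is what makes the estimate continuous and unbinned.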
Significance
The proposed approach demonstrates how machine learning can be used to evaluate general scale factors, not only fake factors. The outcome is a neural-network function that takes event-level information as input and predicts the corresponding scale factor in a complex, high-dimensional feature space. The method is not experiment-specific and can be adopted by any HEP analysis.
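To illustrate that plug-and-play character, a hypothetical helper like the one below is all an analysis needs once the network is trained: the model is simply a callable mapping event features to a per-event weight. The name scale_factor_fn and the surrounding code are assumptions for this sketch, not part of the abstract.

```python
import numpy as np

# Hypothetical usage: fold a trained scale-factor network into ordinary
# histogramming. `scale_factor_fn` is an assumed callable wrapping the
# trained model (event features -> per-event weight).
def fill_weighted_hist(features, observable, scale_factor_fn, bins):
    """Histogram `observable`, weighting each event by the learned
    scale factor evaluated on its full feature vector."""
    weights = scale_factor_fn(features)  # per-event, continuous weights
    return np.histogram(observable, bins=bins, weights=weights)
```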
References
Systematic evaluation of generative machine learning capability to simulate distributions of observables at the Large Hadron Collider: https://link.springer.com/article/10.1140/epjc/s10052-024-13284-6
Normalizing Flows for Physics Data Analyses, presented at the Conference on Computing in High Energy and Nuclear Physics 2024: https://indico.cern.ch/event/1338689/contributions/6016108/