Burn-in and the Extinction Model
Overview
While the analysis (online or offline) is filtering data, each svd bin collects snr-$\chi^2$ statistics of the noise triggers found by templates in that svd bin.
A noise trigger is defined as a single-detector trigger (or single trigger) during a time when more than one detector was running (or coinc time).
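The definition above can be written as a small predicate. This is only an illustrative sketch: the `Trigger` class, its fields, and the `detectors_on` argument are hypothetical names chosen for this example and are not part of the analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    # Hypothetical container, used only for this illustration.
    time: float            # trigger time in seconds
    detectors: frozenset   # detectors that participated in the trigger


def is_noise_trigger(trigger, detectors_on):
    """Return True for a single-detector trigger found during coinc time.

    detectors_on is the set of detectors that were running at the
    trigger time (an assumed input for this sketch).
    """
    is_single = len(trigger.detectors) == 1
    is_coinc_time = len(detectors_on) > 1
    return is_single and is_coinc_time
```

A single-detector trigger found while only one detector was running does not count, because a real signal could not have been seen in coincidence then.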
An example plot of the snr-$\chi^2$ of an svd bin looks like this:
Simultaneously, each svd bin is also collecting foreground (zerolag) triggers.
The zerolag triggers are assigned Likelihood Ratios (LRs) using the snr-$\chi^2$ background for that bin (this is just one term among many to calculate the LR).
Now, we want to convert these LRs to False Alarm Rates (FARs).
This is done by creating fake noise triggers for every svd bin: the background for that bin is randomly sampled, the samples are assigned random times, phases, etc., and LRs are calculated for them.
The procedure is done by the gstlal_inspiral_calc_rank_pdfs job in the offline case, and the gstlal_inspiral_marginalize_likelihoods_online job in the online case.
Then, by comparing the background and foreground LR distributions and taking the livetime of the analysis into account, we can calculate the FAR for the foreground triggers. The inverse of the FAR represents the amount of time noise alone would need to produce a trigger with an LR at least as high.
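As a rough illustration of how an LR could be turned into a FAR, the sketch below builds an empirical background CCDF and scales it by the number of zerolag triggers and the livetime. It is a simplified picture under stated assumptions, not the calculation performed by the jobs named above. The function names and the assumption that the expected noise count is `n_zerolag * CCDF(LR)` are made up for this example.

```python
import bisect


def background_ccdf(background_lrs):
    """Return a function giving the fraction of background LRs >= lr."""
    ordered = sorted(background_lrs)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one background LR sample")

    def ccdf(lr):
        # bisect_left counts the samples strictly below lr.
        return (n - bisect.bisect_left(ordered, lr)) / n

    return ccdf


def false_alarm_rate(lr, ccdf, n_zerolag, livetime_s):
    """Assumed model: expected noise count at or above lr, per second."""
    return n_zerolag * ccdf(lr) / livetime_s


ccdf = background_ccdf([1.0, 2.0, 3.0, 4.0])
far = false_alarm_rate(3.0, ccdf, n_zerolag=100, livetime_s=1000.0)
```

In this toy example half of the background samples have an LR of at least 3.0, so the 100 zerolag triggers over 1000 s correspond to a rate of 0.05 per second at that threshold.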
The Extinction Model
There are two problems with this procedure. First, there is no reason to believe that the randomly sampled background follows the same LR distribution as the foreground. Second, the foreground goes through two broad rounds of clustering, which the background does not.
The extinction model fixes these problems. It takes the background and zerolag LRs of every bin and fits the zerolag CCDF (complementary cumulative distribution function) of that svd bin to $A\left(1 - e^{-c N(L)}\right)$, where $A$ is a normalization constant, $c$ represents the rate of clustering for that svd bin, and $N(L)$ is the background CCDF for that svd bin. This function simulates the effect of the first round of clustering on the background CCDF, and the normalization ensures that the backgrounds from each svd bin are added together in the same proportions as the zerolags from those bins.
To simulate the effect of the second round of clustering, the procedure is then repeated using the clustered zerolag and the summed background, both taken across all svd bins.
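The first-round fit can be sketched in a few lines of plain Python. This is a minimal least-squares illustration of fitting $A\left(1 - e^{-c N(L)}\right)$ to a zerolag CCDF, not the fitting routine used by the analysis. The function names and the grid scan over $c$ are assumptions made for this example. For a fixed $c$ the model is linear in $A$, so the best $A$ has a closed form.

```python
import math


def extinction_model(n_bg, amplitude, c):
    """A * (1 - exp(-c * N)) evaluated at one background CCDF value."""
    return amplitude * (1.0 - math.exp(-c * n_bg))


def fit_extinction(bg_ccdf, zl_ccdf, c_grid):
    """Scan candidate c values; solve for A in closed form at each one.

    bg_ccdf and zl_ccdf are background and zerolag CCDF values sampled
    at the same LR thresholds. Returns (A, c) with the smallest squared
    residual.
    """
    best = None
    for c in c_grid:
        shape = [1.0 - math.exp(-c * n) for n in bg_ccdf]
        denom = sum(s * s for s in shape)
        if denom == 0.0:
            continue  # model is identically zero for this c
        amplitude = sum(z * s for z, s in zip(zl_ccdf, shape)) / denom
        residual = sum((z - amplitude * s) ** 2
                       for z, s in zip(zl_ccdf, shape))
        if best is None or residual < best[0]:
            best = (residual, amplitude, c)
    if best is None:
        raise ValueError("no usable c value in c_grid")
    return best[1], best[2]


# Synthetic check: data generated with A = 2.0 and c = 3.0.
bg = [0.05 * k for k in range(1, 21)]
zl = [extinction_model(n, 2.0, 3.0) for n in bg]
fit_a, fit_c = fit_extinction(bg, zl, [0.5 * k for k in range(1, 21)])
```

A grid scan keeps the example dependency-free. A general-purpose nonlinear least-squares routine would serve the same purpose.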
In offline mode, this procedure happens at the end of the filtering and LR calculation, when the LRs from all bins are available. Since an online analysis has no such end point, the procedure is a bit more complicated in the online case.
Burn-in of the Online Analysis
The first step, before any of the extinction model calculations can happen, is to collect enough snr-$\chi^2$ background statistics for each svd bin, so that each bin can reliably assign LRs to the triggers from that bin.
This is called the LR burn-in.
After that happens, the process of creating fake noise LRs can start.
The gstlal_inspiral_marginalize_likelihoods_online job cycles through all svd bins, requests the RankingStat (also called DIST_STATS) from the corresponding inspiral job, and calculates the fake noise LRs for that bin.
If this bin has been processed before, the job also loads the previously calculated noise LRs from disk (they are stored in the RankingStatPDF file, also called DIST_STAT_PDF) and adds the two together.
It also requests the zerolag for that bin from the inspiral job and performs extinction for that bin.
If it was able to perform extinction, the extincted noise LRs get added to the marginalized PDF.
If more than 99% of the bins could be extincted (or 67% if the config.rank.fast-burnin option was provided), the marginalized PDF is saved to disk and is used by the inspiral jobs to calculate FARs.
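The threshold rule above amounts to a one-line comparison. The sketch below is illustrative only: the function name and arguments are hypothetical, and the 99% and 67% values are the ones quoted in the text.

```python
def enough_bins_extincted(n_extincted, n_bins, fast_burnin=False):
    """Return True if the extincted fraction exceeds the threshold.

    The thresholds are the ones quoted in the text: 99% by default,
    67% when the fast burn-in option is set.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    threshold = 0.67 if fast_burnin else 0.99
    return n_extincted / n_bins > threshold
```

For example, with 1000 svd bins the default rule needs at least 991 extincted bins, while the fast burn-in rule is met from 671 bins onward.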
How to check for burn-in
You can check whether the LRs are burned in for any bin by looking at the LR history for that bin in the dashboard. Alternatively, you can use the script that checks burn-in on the RankingStats (or do it yourself in Python).
To check for FAR burn-in, you’ll first need to check how many bins were able to be extincted.
This can be checked in the log files of the gstlal_inspiral_marginalize_likelihoods_online job.
If a bin could not be extincted, the job prints a line saying "Skipping first-round extinction for dist_stat_pdfs/H1L1-0000_GSTLAL_DIST_STAT_PDFS-0-0.xml.gz, using an empty PDF instead".
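Scanning the log by hand gets tedious with many bins. A small helper along the following lines could collect the skipped files. It is a sketch that assumes the log message has exactly the form quoted above. The function name is hypothetical, and how the log lines are obtained is left to the reader.

```python
import re

# Matches the message quoted above and captures the file name.
SKIP_PATTERN = re.compile(r"Skipping first-round extinction for (\S+?),")


def skipped_pdf_files(log_lines):
    """Return the sorted, de-duplicated file names reported as skipped."""
    skipped = set()
    for line in log_lines:
        match = SKIP_PATTERN.search(line)
        if match:
            skipped.add(match.group(1))
    return sorted(skipped)


example_log = [
    "Skipping first-round extinction for "
    "dist_stat_pdfs/H1L1-0000_GSTLAL_DIST_STAT_PDFS-0-0.xml.gz, "
    "using an empty PDF instead",
    "some unrelated log line",
]
skipped = skipped_pdf_files(example_log)
```

The job cycles through the bins repeatedly, so a bin skipped early on may succeed later. Passing only the lines from the most recent cycle gives the current picture.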
If the required number of bins could be extincted, you’ll see the marginalized RankingStatPDF file in the dist_stat_pdfs dir.
Finally, you can run the script that checks burn-in on that file. If the file is burned in, the analysis as a whole is burned in and you are ready to turn on uploads, though it’s a good idea to check that the FARs are stable before doing so.