Compton Online Analysis

From PREX Wiki

This page is written for people who have little background in how the Compton online analysis scripts work but want to know more. The analyzer itself is maintained by Juan Carlos Cornejo, while the online scripts are maintained by AJ Zec. This page focuses on the latter.

The Directories

There are three main directories pertaining to the online analysis, which can be found on the compton machine in /home/compton/online/:

  1. CompMon/: This directory contains the CompMon analyzer code, as well as the scripts that produce the runwise online plots. Also contains code to display a panguin-style window of those plots.
  2. aggregator/: Contains the code used to make a series of plots for each snail.
  3. grand/: Contains the code for making runwise and snailwise plots as well as the compton grand rootfile.

The Main Script: online.sh

The online script is run whenever a new production run is completed. It is invoked as ./online.sh -r <run number>. The run number is the only mandatory argument.

Arguments

online.sh has a number of optional arguments that can be used.

  1. --panguin: Including this argument brings up a panguin-style window of the compton run plots.
  2. --rootfile: During the online plotting process a rootfile containing run plots gets generated and then deleted. Invoking this flag means the file does not get deleted after execution.
  3. --nowebupload: By default the run plots get turned into PDFs and moved into the compton web directory /group/prex/analysis/www/<prex2/crex>/compton/. Invoking this flag skips that process.
  4. --replay: By default the online script skips re-running this analyzer if a rootfile has already been generated for the specified run. Invoking this flag forces the analyzer to be re-run.
  5. --nogrand: By default the online script generates a rootfile which is later used in the generation of the grand rootfile for either PREX-II or CREX. Invoking this flag skips that process.

How does it work?

This is a list of the scripts that online.sh calls, in order, with a short summary of what each one does. Assume that online.sh is called with a run number argument referred to as runNum:

  1. Check if a directory for our web uploaded plots exists. If not, create one
    • The online plots are stored in the directory /group/prex/analysis/www/<prex2/crex>/compton/ depending on which experiment they are for. This directory is stored in the environment variable $COMPMON_WEB which can be changed from prex to crex mode. In each of those directories is a folder called runs/ which contains a folder for each individual run in the experiment which itself contains the plots for that run.
    • The script looks for a folder named like $COMPMON_WEB/runs/Run<runNum>/ and if it can't find it, it creates it.
  2. ./compmon.sh -r runNum: If no analyzed rootfile exists for the run OR the --replay flag is invoked, then the script will call the compmon analyzer to replay the run.
  3. root -l -b -q $COMPMON_LASERCYCLES/laserCycles.C(runNum): This is the laser cycle identification script found in CompMon/laserCycles/. This script parses the run rootfile sequentially, identifies laser periods (defined as any continuous group of MPSs sharing the same laser state) and groups them into laser cycles (defined as any three consecutive laser periods following a laser off-laser on-laser off pattern.)
  4. root -q -b -l ./dataQualityCheck.C(runNum): This script makes the online plots for the run and stores them in a rootfile located in CompMon/output/. The rootfile is named compton_online_run_<runNum>.root. It is normally deleted at the end of execution.
  5. python $COMPMON_PANGUIN/macros/writeCFG.py runNum: This is only called if the --panguin flag is invoked. This script writes a config file with the correct run information that panguin uses. AUTHOR'S NOTE: I tried doing it the regular way but I never got it to work, and frankly it's not worth the time investing into more because it's an aesthetic issue that almost never comes up.
  6. $COMPMON_PANGUIN/build/panguin -f $COMPMON_PANGUIN/macros/prex_auto_runs.cfg -r $runNum: This is only called if the --panguin flag is invoked. This is what actually displays the panguin window using the auto-generated config file above.
  7. root -q -b -l ./writeToPDF.C(runNum): This script is a wrapper for the code in utils.h which takes the plots in the rootfile generated by dataQualityCheck.C and turns them into a set of multi-page PDFs, and then places them in the correct $COMPMON_WEB directory.
  8. root -l -b -q ./plotAllCycles.C(runNum): This script generates eight plots for every laser cycle in the run and groups them all into a single page PDF and adds it to the correct web directory.
  9. python ./write_html.py $DATE $TIME index.html: This script creates an HTML file in the $COMPMON_WEB directory that can be accessed from a browser.
    • Steps 7-9 are skipped if the --nowebupload flag is invoked
  10. root -l -q $COMPMON_GRAND/buildRunRootfile.C(runNum): Creates a rootfile in the grand/ directory which will later be used when building the grand rootfile.
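The sequencing above can be sketched compactly. This is a toy Python rendering of online.sh's control flow (the real script is bash, and the command strings are abbreviated); the online() helper and its dry-run command list are purely illustrative, recording commands instead of executing them:

```python
# Toy sketch of online.sh's sequencing; commands are recorded, not run.
def online(run_num, replay=False, panguin=False, webupload=True, grand=True,
           rootfile_exists=False):
    cmds = []
    # step 2: skip the analyzer replay if a rootfile already exists
    if not rootfile_exists or replay:
        cmds.append(f"./compmon.sh -r {run_num}")
    # steps 3-4: laser cycle identification, then run plots
    cmds.append(f"root -l -b -q laserCycles.C({run_num})")
    cmds.append(f"root -q -b -l dataQualityCheck.C({run_num})")
    # steps 5-6: only with --panguin
    if panguin:
        cmds.append(f"python writeCFG.py {run_num}")
        cmds.append(f"panguin -f prex_auto_runs.cfg -r {run_num}")
    # steps 7-9: skipped by --nowebupload
    if webupload:
        cmds.append(f"root -q -b -l writeToPDF.C({run_num})")
        cmds.append(f"root -l -b -q plotAllCycles.C({run_num})")
        cmds.append("python write_html.py")
    # step 10: skipped by --nogrand
    if grand:
        cmds.append(f"root -l -q buildRunRootfile.C({run_num})")
    return cmds
```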

Laser Cycle Identification: laserCycles.C

The first non-analyzer script that gets called, this identifies the laser cycles in the run and stores their MPS number limits in a csv file. The script heavily uses the header laserUtils.h to define easy data structures in which to store laser periods and laser cycles, as well as evaluate them.

The Algorithm Described Verbally

  1. Determine the laserState of the first pattern in the quartetwise tree.
  2. Begin iterating through the tree until you find an entry with a different laserState than all the previous entries.
  3. Once you find such an entry, store the entry number at which the change happened and start a new laser period in the new laserState. Also count the number of beam-on entries and save it for later.
  4. Continue until you have iterated through the entire run. Store the list of laser period limits in a vector.
  5. Loop through the laser period list. Throw out any periods which:
    • Are less than 3 seconds long
    • Have laserState==4
  6. Also throw out the first laser period only if it is not a laser-off period.
  7. Loop through the trimmed laser period list. If two subsequent laser periods now have the same laserState, combine them by taking the start MPS of the first period and the end MPS of the last period and putting them together into the same period.
  8. Create a new vector of the trimmed and combined laser periods.
  9. Loop through the shortened laser period list. Look for any three consecutive laser periods in an OFF-ON-OFF pattern. Define it as the "candidate" cycle
  10. Ignore the candidate cycle if:
    • The laser pattern is wrong or misidentified
    • There are 3 seconds or less of beam on laser on data
    • There are 3 seconds or less of beam off laser on data in either of the laser off periods.
    • If the separation between the laser on period limits and either laser off period's limits is greater than 10 seconds
    • The event range of the cycle is wholly or partially in a region that has been cut by mapfiles
  11. If the cycle passes all cuts, add it to the laser cycle list. Continue until all laser periods have been examined.
  12. Write the cycle MPS limits in a csv file
    • File format is one line for each laser cycle where each line contains data: first off period start MPS, first off period end MPS, on period start MPS, on period end MPS, last off period start MPS, last off period end MPS
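The grouping steps above can be sketched in a few functions. This is a Python rendering, not the real implementation (which lives in laserCycles.C and laserUtils.h): the Period class, function names, and the minimum period length expressed in entries are all illustrative, and the beam-on, gap, and mapfile cuts on candidate cycles are omitted for brevity.

```python
# Illustrative sketch of the laser cycle grouping in laserCycles.C.
from dataclasses import dataclass

@dataclass
class Period:
    state: int   # laser state (assumed: 0 = off, 1 = on, 4 = unknown)
    start: int   # first entry number in the period
    end: int     # last entry number in the period

def find_periods(states):
    """Group a sequence of per-entry laser states into contiguous periods."""
    periods = []
    for i, s in enumerate(states):
        if periods and periods[-1].state == s:
            periods[-1].end = i
        else:
            periods.append(Period(s, i, i))
    return periods

def trim_and_merge(periods, min_len):
    """Drop short or laserState==4 periods, then merge same-state neighbors."""
    kept = [p for p in periods
            if (p.end - p.start + 1) >= min_len and p.state != 4]
    # the first period is thrown out only if it is not a laser-off period
    if kept and kept[0].state != 0:
        kept = kept[1:]
    merged = []
    for p in kept:
        if merged and merged[-1].state == p.state:
            merged[-1].end = p.end
        else:
            merged.append(Period(p.state, p.start, p.end))
    return merged

def find_cycles(periods):
    """Return every OFF-ON-OFF triple of periods as a candidate cycle."""
    return [tuple(periods[i:i + 3])
            for i in range(len(periods) - 2)
            if [p.state for p in periods[i:i + 3]] == [0, 1, 0]]
```

Note that consecutive cycles share their middle laser-off periods, which is why cycles are allowed to overlap.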

Making Run Plots: dataQualityCheck.C

This macro makes plots for the run number specified and stores them in a rootfile in the CompMon directory. The macro does this sequentially, accessing one variable at a time, looping sequentially over the tree that has this variable in it. The order of plots is:

  1. EPICS variables (Date, time, IHWP state, table position, etc.)
  2. Runwise tree plots (FADC pedestal, laser & BCM cutoffs, etc.)
  3. Snapshot data
  4. Triggered sum spectra
  5. 1-D histos of mpswise variables, namely Acc0
  6. 2-D histos of mpswise variables (Acc0, BCM, CavPower, BPMs)
  7. 1-D histos of quartetwise variables (helicity-correlated Acc0, differences, sums, background det asyms, etc.)
  8. 2-D histos of quartetwise variables (same as above)

Note: all italicized steps are plotted broken down by laser state and beam state in every combination thereof.

Once made, the plots are saved in the rootfile CompMon/output/compton_online_run_<runNum>.root. NOTE: these are not the same plots that appear on the webpage. They merely act as a base. They are formatted properly in utils.h.

Snapshots and Pulse Shape Discrimination

The primary snapshot plot made is the sum vs peak height plot. Neither of these values is calculated in CompMon so the calculation is done in dataQualityCheck.C. The process is the following:

  1. Loop through the first forty samples of the snapshot
    1. Store the pedestal sum as pre_ped, the minimum pedestal sample as min_ped, and the maximum pedestal sample as max_ped
  2. Loop through the last forty samples of the snapshot
    1. Store the second pedestal sum as post_ped. If any samples exceed the limits previously set in min_ped and max_ped then overwrite them with the new limits.
  3. Calculate pedestal as pedestal = (pre_ped + post_ped)/(2*numberSamples)
  4. Loop once more through the middle 220 samples of the snapshot
    1. Sum the samples using sum += pedestal - snapshot[i]
    2. Store the minimum sample value as minY, store the number of the minimum sample value as minX, store the maximum sample value as maxY

Once the key metrics of the snapshot are obtained, the script uses them to apply cuts that select "healthy" snapshots: those with a good pedestal and no pileup. Those cuts are:

  1. Cut if maxY - pedestal >= pedestal - minY
    • Likely means the snapshot is just noise that got through
  2. Cut if minY == 0
    • Pulse is saturating the FADC, no good
  3. Cut if Abs(pre_ped - post_ped)/pedestal >= 0.03 OR pedestal >= 3900 OR min_ped <= minY
    • Checks whether the pre and post pedestal windows agree with each other, whether the electronics pedestal is a believable value, and whether the biggest peak of the snapshot window lies in the pedestal regions. Failing any of these usually means there is a pileup pulse in the pedestal window and the pedestal is incorrect.
  4. Cut if Abs(max_ped - min_ped) >= 50
    • Another check on the average pedestal width. Normally this figure is less than 10. If there's pileup in the pedestal window, this will fail.
  5. Cut if snapClock <= 10e3 OR snapClock >= 803e3
    • If the snapshot happens very early or late in the mps we don't generally trust the FADC's response, so we cut it.

The peak height vs sum plot is made from snapshots that survive these cuts.
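The pedestal calculation and health cuts above can be put in one place. This is a Python rendering of logic that lives in dataQualityCheck.C; variable names follow the text and the 40/220-sample windows come from the steps above, but details such as whether cut 3 compares summed or averaged pedestal windows are taken literally from the text and may differ in the real macro.

```python
# Sketch of the snapshot pedestal computation and "healthy snapshot" cuts.
def analyze_snapshot(snapshot, snap_clock):
    n_ped = 40
    pre, post = snapshot[:n_ped], snapshot[-n_ped:]
    pre_ped, post_ped = sum(pre), sum(post)
    min_ped = min(min(pre), min(post))
    max_ped = max(max(pre), max(post))
    pedestal = (pre_ped + post_ped) / (2 * n_ped)

    middle = snapshot[n_ped:-n_ped]              # the middle 220 samples
    total = sum(pedestal - s for s in middle)    # pedestal-subtracted sum
    min_y, max_y = min(middle), max(middle)      # pulse peak (negative-going)

    healthy = not (
        max_y - pedestal >= pedestal - min_y           # 1: just noise
        or min_y == 0                                  # 2: FADC saturation
        or abs(pre_ped - post_ped) / pedestal >= 0.03  # 3: windows disagree
        or pedestal >= 3900                            # 3: unbelievable pedestal
        or min_ped <= min_y                            # 3: peak in pedestal region
        or abs(max_ped - min_ped) >= 50                # 4: pedestal too wide
        or snap_clock <= 10e3 or snap_clock >= 803e3   # 5: early/late in the MPS
    )
    return pedestal, total, min_y, max_y, healthy
```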

The First Asymmetry Calculation

In the function quartet_graphs there is a section of code where the background-subtracted asymmetries are first calculated. Though these asymmetries are not used later, they make sure our asyms vs time graphs agree with our actual analysis. I will say more when I discuss the calc_polarization function; suffice it to say it follows the same calculation and pattern as that function. The basic asymmetry calculation is done for each point, for both Acc0 and Acc4, as: ((Acc0/NAcc0)_LasOn_+ - (Acc0/NAcc0)_LasOn_-) / ((Acc0/NAcc0)_LasOn_+ + (Acc0/NAcc0)_LasOn_- - <(Acc0/NAcc0)_LasOff_+ + (Acc0/NAcc0)_LasOff_->).
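Numerically, the formula amounts to the following (a hypothetical helper, not from the real code; off_mean_sum stands for the laser-off mean <(Acc0/NAcc0)_+ + (Acc0/NAcc0)_->):

```python
# Illustrative rendering of the background-subtracted asymmetry formula.
def bkg_subtracted_asym(on_plus, on_minus, off_mean_sum):
    # (on+ - on-) / (on+ + on- - <off+ + off->)
    return (on_plus - on_minus) / (on_plus + on_minus - off_mean_sum)
```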

The Panguin Window: writeCFG.py and plotPanguin.C

On the rare occasion that we want to use the panguin window (AJZ: I usually use it for quick debugging when I don't want to generate 20 different files) this script will be run. Panguin requires a config file to run. This config file invokes a macro which generates the plot based on the selected window. AJZ NOTE: I really tried to make it so panguin could be invoked with a static macro, but I found that I couldn't invoke the run number correctly and after the umpteenth segfault, decided to use a script to generate a macro for panguin to run.

The writeCFG.py script generates a config file named CompMon/panguin/macros/prex_auto_runs.cfg, which invokes the script CompMon/panguin/macros/plotPanguin.C, which in turn acts as a wrapper for CompMon/utils.h. That file is discussed in the next section, so I won't cover it here; its function for generating PDFs of plots is nearly identical.

Writing PDFs of Plots: writeToPDF.C and utils.h

The point of this section of code is to take the plots generated by dataQualityCheck.C and stored in CompMon/output/compton_online_run_<runNum>.root, and aesthetically reformat them into the plots that go on the webpage in the $COMPMON_WEB folder.

I won't dwell too much on the functionality of writeToPDF.C because it is a wrapper for utils.h with very little original programming. The only thing the macro does on its own is invoke the Linux convert command to turn the PDFs into PNGs and then back into a multi-page PDF again. AJZ NOTE: This may seem circuitous but it saves more than 90% of the disk space.

Dissecting utils.h

This is where the magic formatting and arranging plots actually happens.

Q: Why is it a header file and not its own macro?

AJZ Answer: I made this design choice because originally I had two plotting macros with almost identical functionality (plotPanguin.C and writeToPDF.C). Rather than having to constantly edit the two of them sequentially for every minor change, I extracted the code base they shared and put it in a header both these macros include.

utils.h is a collection of functions that take a TPad as an argument and fill it with plots from CompMon/output/compton_online_run_<runNum>.root, carefully colored, formatted, and arranged to be visually attractive as well as useful. The methods are:

  • essential_stats_pad and essential_stats_2_pad: Creates the PDF that goes under the "Essential Stats" link on the webpage.
    • NOTE: Every pad gets its own function. Because the essential stats PDF has two pages with non-identical information I have two functions, one for each page. This is a recurring pattern you'll see throughout this macro.
  • snapshots_pad and snapshots_pad_2: Creates the PDF that goes under the "Snapshot Plots" link on the webpage.
  • breakdown_pad: This function creates a pad with four plots of one single variable. Any variable that is broken down by both beam and laser state can be plotted by this function. The four plots it makes are 1D histograms of the invoked variable for
    1. No cuts
    2. Beam OFF
    3. Beam ON Laser ON
    4. Beam ON Laser OFF
    • Worth noting this breakdown method is used for both triggered sums and Acc0.
    • This function makes the PDF in the "Triggered Sums" link as well as the first page of the PDF in the "Acc0/NAcc0" link.
  • acc0_time_pad: This function takes the TGraphs of the Acc0 variables and creates a pad of them. The cut format follows the same format as breakdown_pad. It produces the second page of the PDF under the "Acc0/NAcc0" link.
  • quartet_pad: This pad is for plotting an arbitrary variable from the quartetwise tree, that is not background-subtracted. The result is four 1D histos that follow the breakdown_pad cut pattern. This produces the series of plots found under the "Multiplet Variables" link.
  • asym_pad: This function calculates background-subtracted asymmetry variables and the polarization for the run. This one is specifically made to handle multiple quartetwise variables and divides explicitly based on laser ON and laser OFF states. This produces the 1D histograms under the "Multiplet Asymmetries" link
    • NOTE: while you CAN plot asymmetries with quartet_pad the asymmetries will not be background-subtracted and will be pretty useless.
    • The calc_polarization function is arguably the one at the core of the compton online analysis and will get its own section later.
  • asym_graph_pad: This function plots the second page of "Multiplet Asymmetries." It's an asymmetry graph taking advantage of the calculation done in dataQualityCheck::quartet_graphs. Plots in the same pattern as asym_pad except for the last plot which contains BCM info.
  • detector_asyms and detector_rates: Plots the raw beam on asymmetry seen by background detectors, as well as their rates vs time. Plots displayed under the "Background Detectors" link.

The calc_polarization Function

This is where the polarization gets calculated. This function is not the only place where polarization gets calculated; it also gets calculated in the grand rootfile macros, but those are for a different page. This one calculates the polarization measured over the entire run, instead of on a cycle-to-cycle basis. Up to this point the analysis is still blind to laser cycles.

How It Works

  1. From the plot rootfile, access the plots containing info on helicity correlated differences and sums for both laser ON and OFF states.
  2. Use the plots to get mean and mean error for variables: DIFF_ON, SUM_ON, DIFF_OFF, SUM_OFF.
  3. The mean asymmetry is defined by ASYM_ON = <DIFF_ON/(SUM_ON - <SUM_OFF>)>.
    • There is no background subtraction in this method because a run period is generally too long to get a realistic idea of background asymmetry (if it's nonzero.) In the case of a nonzero background asymmetry, the polarization reported here will be wrong, and you should consult the grand rootfile.
  4. The polarization is POL = ASYM_ON/getAnalyzingPower().
    • The getAnalyzingPower will likely be updated with more specific numbers in the near future.
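The steps above reduce to a few lines. In this sketch the per-quartet laser-ON differences and sums are passed in directly, and the analyzing power is a plain argument standing in for getAnalyzingPower(); all names are illustrative rather than taken from the real macro.

```python
# Sketch of the run-level polarization calculation described above.
def calc_polarization(diff_on, sum_on, mean_sum_off, analyzing_power):
    # ASYM_ON = <DIFF_ON / (SUM_ON - <SUM_OFF>)>, averaged over quartets
    ratios = [d / (s - mean_sum_off) for d, s in zip(diff_on, sum_on)]
    asym_on = sum(ratios) / len(ratios)
    # POL = ASYM_ON / analyzing power
    return asym_on / analyzing_power
```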

Introducing Cycles: plotAllCycles.C

The last link on the analysis webpage is titled "Laser Cycles" and leads to a PDF that has 2 pages of plots per laser cycle identified in the run. Up until now the online analysis has not used the laser cycles generated by laserCycles.C but now it's time.

A Laser Cycle Structure

The structure is described in more depth in the section on laser cycle generation, but to review: a laser cycle is three laser periods. A laser cycle:

  • is one laser OFF period followed by one laser ON period, followed by one laser OFF period
  • is defined by six MPS numbers: the MPS at the start and end of each period
  • has at least 3 seconds of beam on data in all periods
  • has less than 10 seconds separating subsequent periods

A laser cycle may, but need not:

  • Overlap laser OFF periods with previous or subsequent cycles
  • Have a beam trip in them
  • Contain laser UNK data within the laser periods

Plots Made

The script plotAllCycles.C makes eight plots per each laser cycle. They are:

  1. A 1D histogram of multiplet differences for each laser period in the cycle
  2. A 1D histogram of multiplet sums for each laser period in the cycle
  3. A 1D histogram of multiplet asymmetries for each period in the cycle
    • In this plot the laser ON and OFF periods are treated slightly differently. The asymmetry for the laser ON period is calculated as (Acc0_ON+ - Acc0_ON-)/(Acc0_ON+ + Acc0_ON- - <Acc0_OFF+ + Acc0_OFF->)
    • The asymmetry for the laser OFF periods is calculated as (Acc0_OFF+ - Acc0_OFF-)/(<Acc0_ON+ + Acc0_ON-> - <Acc0_OFF+ + Acc0_OFF->)
    • The denominators for each asymmetry are different, although approximately equal, to keep the laser OFF asymmetry from being unphysically large.
  4. A TGraph of the Acc0 values for the cycle
  5. A TGraph of laserState for the cycle
  6. A TGraph of beamState values for the cycle
  7. A TGraph of laser power for the cycle
  8. A TGraph of BCM readback value for the cycle

These plots are all combined into one PDF under the "Laser Cycles" link.
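The two asymmetry definitions above can be written side by side. These helper names are illustrative rather than from plotAllCycles.C, and the <...> means are passed in as precomputed arguments:

```python
# Per-cycle asymmetry definitions; note the different denominators.
def asym_laser_on(acc0_p, acc0_m, mean_off_sum):
    # (Acc0_ON+ - Acc0_ON-) / (Acc0_ON+ + Acc0_ON- - <Acc0_OFF+ + Acc0_OFF->)
    return (acc0_p - acc0_m) / (acc0_p + acc0_m - mean_off_sum)

def asym_laser_off(acc0_p, acc0_m, mean_on_sum, mean_off_sum):
    # (Acc0_OFF+ - Acc0_OFF-) / (<Acc0_ON+ + Acc0_ON-> - <Acc0_OFF+ + Acc0_OFF->)
    return (acc0_p - acc0_m) / (mean_on_sum - mean_off_sum)
```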

Making a Webpage: write_html.py

In order to auto-update the online analysis webpage, this script generates an HTML file with links to the production runs. NOTE: This file puts up links ONLY to runs that have been grouped into snails. Any recent or ungrouped runs need to be accessed manually through the "All Runs" link at the top.

The script works by looking for any file matching the pattern aggregator/snails/snail<snailNum>.list and reading out all the runs in all those snails. The script then assembles a string of readably formatted HTML blocks so that the links to the correct analysis files are preserved.
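A minimal sketch of that snail-list scan, assuming one run number per line in each snail<snailNum>.list file (the real write_html.py may parse its lists differently):

```python
# Illustrative scan of aggregator-style snail list files.
import glob
import os
import re

def runs_by_snail(snail_dir):
    """Map snail number -> list of run numbers read from snail<N>.list files."""
    snails = {}
    for path in sorted(glob.glob(os.path.join(snail_dir, "snail*.list"))):
        match = re.search(r"snail(\d+)\.list$", path)
        if not match:
            continue
        with open(path) as f:
            runs = [int(line) for line in f if line.strip().isdigit()]
        snails[int(match.group(1))] = runs
    return snails
```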

This HTML page is the one read at http://prex.jlab.org/analysis/<prex2/crex>/compton/.

Building the Grand Rootfile: buildRunRootfile.C

This file builds a rootfile for the run in the grand/runPlots/ folder which stores info that will go into the grand rootfile later.

This macro will be discussed in more detail on the compton grand analysis page.