WAC Notes May 4 2021

From PREX Wiki
Revision as of 16:15, 3 May 2021 by Cameronc (May 4th 2021)




Date and time: May 4, 2021, 11am
Bluejeans ID: 564945377
Meeting link: https://bluejeans.com/564945377
Runlist spreadsheet

May 4th 2021

  • Respin2 is done; changes from respin1 have been checked, and any mistakes were corrected in the runlist spreadsheet.
  • To do:
    • Update the RCDB information with the new flags (and re-run the aggregator so it picks up the new flags)
    • Re-create the grand run-list (all non-Junk runs and all "Good Production" runs separately), as well as the accompanying slug-lists (for use in grand plots, etc.)
    • Use this updated "Good Production" list to do offline analyses:
      • Comparison of data (BCM, regressed main detector, etc.) between prompt, respin1, and respin2 (prompt and respin1 files exist already on volatile disk)
      • Post-trip data isolation (using the same outputs from above, and checking for any additional shift in Aq between respin1 and 2)
      • Hadd together beam modulation slopes into one place
      • Hadd together aggregator files into one place
      • Perform eigenvector analysis (minirunwise and then average over 3 parts of CREX)
      • Beam modulation, lagrange, and regression (overload and standard 5bpm) analyses using the 3-part averaged eigenvector basis
      • Slope comparisons between analyses
      • Beam corrections estimations from these analyses
      • Special aggregator check of the Null asym (multiply each multiplet by its helicity sign) and of sub-block analyses
      • BPMs vs. BPMs regression (checking for helicity pickup or bad resolution problems)
      • BCM double difference combinations checking
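The Null-asym check in the list above can be sketched as follows: multiply each multiplet asymmetry by its helicity sign and average. A result consistent with zero (within its error) indicates no helicity-correlated pickup. The input lists here are hypothetical stand-ins for the multiplet asymmetries and helicity signs read from the aggregator output.

```python
import math
import statistics

def null_asymmetry(asym, helicity_sign):
    # Multiply each multiplet asymmetry by its helicity sign; the mean of
    # this "null" quantity should be consistent with zero if there is no
    # helicity-correlated sign pickup in the asymmetry stream.
    null = [a * s for a, s in zip(asym, helicity_sign)]
    mean = statistics.fmean(null)
    err = statistics.stdev(null) / math.sqrt(len(null))
    return mean, err

# Toy input where the asymmetry flips perfectly with helicity: the null
# mean is then maximally nonzero, i.e. a clear pickup signature.
mean, err = null_asymmetry([1.0, -1.0, 1.0, -1.0], [1, -1, 1, -1])
```

In real use the same statistic would be formed minirun-by-minirun from the respin2 aggregator files, with errors propagated accordingly.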

April 27th 2021

  • Respin 2 is done:
    • Run 8321 still tries to read the second split CODA data file
    • Not convinced every single minirun went through the aggregator for all 3 sets of cuts (ErrorFlag==0, include BMOD, BMOD only).
    • Everything else looks good, though a few runs may have executed some of the plot-making steps twice simultaneously (possibly due to the ifarm pause and restart over the weekend)
  • Data quality checks:
    • Check that each run with significant cut changes or fixed problems works as expected
    • Check slug plots by eye at least; spot-check run plots from different parts of the experiment
    • Did every burst/minirun get treated by the aggregator?
    • Redo the BMOD data quality check and compare respin2 bmod slopes with respin1 outputs (especially since the main detector pedestals shifted by ~1%, expect the mapfile slopes to be off by a similar amount)
    • Beam trip analysis for respin2, compare with prompt and respin1 (should be almost perfectly identical to respin1, other than the new runs included or excluded due to RCDB changes)
    • Mul-plots (one big histogram)
    • Another pass of the BPMs vs. BPMs regression with respin2 dataset (and also careful look at BCM DDs)
    • Eigenvector regression again
    • Implement lagrange regression after updating BMOD data cuts and segments from respin2 run list
    • Overloaded regression really struggles for 186-187
    • JAPAN Regression breaks for 183-185 because of a misplaced "0" in prex_maindet_pedestal.7414-7499.map
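The BPMs-vs-BPMs regression item above can be sketched as a simple least-squares fit of one BPM difference signal against another; a residual RMS that is unexpectedly large, or that correlates with helicity, would flag pickup or resolution problems. The data shapes here are hypothetical, pure-Python stand-ins for the respin2 BPM streams.

```python
import math

def regress_bpm(x, y):
    # Ordinary least-squares line y ~ slope*x + intercept between two BPM
    # difference signals; the residual RMS serves as a proxy for the
    # combined resolution of the pair.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, rms
```

A full check would repeat this for every BPM pair (or regress each BPM against all the others) and compare the respin2 residuals with respin1.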

Strange slurm errors

ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
Binary file ifarmlog_8223.err matches
summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
summary_ifarmlog_8102.err:slurmstepd: error: *** JOB 31175282 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
ifarmlog_8266.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory
ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
ifarmlog_8246.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8264.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8298.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8243.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8215.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8244.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8265.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
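The log lines above fall into three patterns (signal-9 kills, an explicit oom-kill, and operator cancellations; the signal-9 deaths may themselves be memory kills, as for run 8321). A small classifier like the following, a sketch rather than part of the actual respin tooling, makes it easy to tally which runs need re-queueing:

```python
import re

# Patterns taken from the slurmstepd error lines quoted above.
PATTERNS = {
    "killed_sig9": re.compile(r"died by signal: 9"),
    "oom_kill": re.compile(r"oom-kill event"),
    "cancelled": re.compile(r"CANCELLED AT"),
}

def classify(line):
    # Return the first matching error category, or "other" for lines
    # that fit none of the known patterns.
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"
```

Run over the `ifarmlog_*.err` files, this would group the failed runs by cause before resubmission.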

April 20th 2021

Respin 2 is underway

  • Looks good so far
  • Fixed an issue with bmod slopes (the correction had the wrong sign)

March 30th 2021

  • BCM and BPM pedestals got a once-over: http://ace.phys.virginia.edu/HAPPEX/4507
  • Ifarm prompt-install is set up for respin2 (almost finished - still need to pull crex-respin2-alpha and recompile JAPAN).
  • Run lists updated - still need to cross-check with the Python SQL-database scripts
  • Other items from before are all addressed (?)

New RCDB/runlist updates

  • Cameron has added the label "Good" to a bunch of worthwhile calibration and pedestal runs that need to be kept around for the future (listed at the bottom)
  • Cameron has changed some runs to improve our coverage of nearly-production conditions and remove "Test" from our set of respin2 runs
    • Change 5907 from Test NeedCut to Production Suspicious (it is a slow target burn-in run on Ca48 at low current).
    • Change 6115 from Test Suspicious to Production Suspicious
    • Change 6525 from Test Good to Production Bad (as there is some fatal blinder/target move issue and the run is very short and has messed up FFB/coil7 response)
    • Change 8367, 8388, and 8389 from Test Good to Junk Suspicious (Carbon beam off-ramp ups)
    • Change 6038-6043 from Test Good to Junk Good (DAQ testing runs)
  • Cameron and Weibin triple-checked 2 C++ scripts and a Python script, after cleaning up the RCDB a bit more, and utilized the following cut:
run_flag = NeedCut,Suspicious,Good, run_type = Calibration,Pedestal,Parityscan,Production and not cut on target
    • This yields 1583 total runs, including some that are "Home" target (pedestal scans, etc.)
    • 1404 are CREX Production (so 179 are Calibration, Pedestal and Parityscan) with 56 runs being AT only
    • 1392 are CREX Production Good
    • Python version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/WAC/auto_run_list/crex-respin2/CREX-All-run-list.list
    • C++ version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/prex-runlist/crex-respin2/all_nonJunk.list
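The cut quoted above can be sketched as a simple filter over run records. The field names (`run_flag`, `run_type`) are illustrative stand-ins, not the actual RCDB schema or the C++/Python scripts mentioned.

```python
# Flag and type values taken from the cut quoted above:
# run_flag = NeedCut,Suspicious,Good and
# run_type = Calibration,Pedestal,Parityscan,Production, with no cut on target.
GOOD_FLAGS = {"NeedCut", "Suspicious", "Good"}
GOOD_TYPES = {"Calibration", "Pedestal", "Parityscan", "Production"}

def passes_cut(run):
    # No cut is placed on target, so "Home"-target pedestal scans survive;
    # Junk/Test types and Bad-flagged runs are dropped.
    return run["run_flag"] in GOOD_FLAGS and run["run_type"] in GOOD_TYPES

runs = [
    {"run_flag": "Good", "run_type": "Production", "target": "Ca48"},
    {"run_flag": "Bad", "run_type": "Production", "target": "Ca48"},
    {"run_flag": "Good", "run_type": "Junk", "target": "Home"},
]
selected = [r for r in runs if passes_cut(r)]
```

Applied to the cleaned RCDB, this selection is what yields the 1583 total runs quoted above.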

March 23rd and before

WAC Notes March 23 2021