WAC Notes April 27 2021

From PREX Wiki
Revision as of 10:30, 29 April 2021 by Cameronc (Talk | contribs) (April 27th 2021)

RCDB HALOG HAPLOG DocDB Online Prompt BlueJeans Link Runlist spreadsheet, respin 1 spreadsheet


Date and time: April 27, 2021, 11am
Bluejeans ID: 564945377
Meeting link: https://bluejeans.com/564945377
Runlist spreadsheet

April 27th 2021

  • Respin 2 is done:
    • Run 8321 still tries to read the second split CODA data file
    • Not yet confirmed that every single minirun went through the aggregator for all 3 sets of cuts (ErrorFlag==0, include BMOD, BMOD only).
    • Everything else looks good, though a few runs may have run some of the plot-making steps twice simultaneously (due to the ifarm pause and restart over the weekend)?
  • Data quality checks:
    • Check that each run with significant cut changes or fixed problems works as expected
    • Check slug plots by eye at least, and spot check run plots from different parts of the experiment
    • Did every burst/minirun get treated by the aggregator?
    • Redo the BMOD data quality check and compare respin2 BMOD slopes with the respin1 outputs (the main detector pedestals shifted by ~1% or so, so expect the mapfile slopes to be off by about that much)
    • Beam trip analysis for respin2, compare with prompt and respin1 (should be almost perfectly identical to respin1, other than the new runs included or excluded due to RCDB changes)
    • Mul-plots (one big histogram)
    • Another pass of the BPMs vs. BPMs regression with respin2 dataset (and also careful look at BCM DDs)
    • Eigenvector regression again
    • Implement Lagrange regression after updating BMOD data cuts and segments from the respin2 run list
    • Overloaded regression totally fails for slugs 183-185, and really struggles for 186-187
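
The "did every burst/minirun get treated by the aggregator?" check above can be sketched as a set difference per cut. This is only an illustration: the function name, the (run, minirun) tuple layout, and the example minirun numbers are assumptions, not the actual aggregator output format. Only the three cut names come from the notes.

```python
# Hypothetical sketch: find miniruns that never reached the aggregator.
# The three cut names are from the notes; the data layout is an assumption.
CUTS = ("ErrorFlag==0", "include BMOD", "BMOD only")


def missing_miniruns(expected, aggregated):
    """Return {cut: sorted list of (run, minirun) pairs absent from the aggregator}.

    expected   -- dict mapping cut name -> set of (run, minirun) that should exist
    aggregated -- dict mapping cut name -> set of (run, minirun) actually found
    A cut that is absent from `aggregated` is treated as entirely missing.
    """
    gaps = {}
    for cut, wanted in expected.items():
        missing = wanted - aggregated.get(cut, set())
        if missing:
            gaps[cut] = sorted(missing)
    return gaps


# Illustrative input only: three made-up miniruns of run 8321.
expected = {cut: {(8321, 0), (8321, 1), (8321, 2)} for cut in CUTS}
aggregated = {
    "ErrorFlag==0": {(8321, 0), (8321, 1), (8321, 2)},
    "include BMOD": {(8321, 0), (8321, 2)},
}
gaps = missing_miniruns(expected, aggregated)
for cut, pairs in gaps.items():
    print(cut, pairs)
```

Keeping a separate expected set per cut matters because the BMOD-only cut is not expected to contain the same miniruns as the ErrorFlag==0 cut.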

Strange Slurm errors

ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
Binary file ifarmlog_8223.err matches
summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
summary_ifarmlog_8102.err:slurmstepd: error: *** JOB 31175282 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
ifarmlog_8266.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory
ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
ifarmlog_8246.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8264.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8298.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8243.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8215.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8244.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8265.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
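
To decide which runs need resubmitting, grep output like the lines above can be grouped by error kind. The following is a minimal sketch: the category labels and function names are made up for illustration, and the sample text is abbreviated from the lines above. Lines that do not match the pattern (such as the "Binary file ... matches" line) are skipped and would need a separate look, for example with `grep -a`.

```python
import re

# Sample lines abbreviated from the grep output above.
SAMPLE = """\
ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
Binary file ifarmlog_8223.err matches
summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory
ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup.
ifarmlog_8246.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
"""

# Matches "<optional summary_>ifarmlog_<run>.err:slurmstepd: error: <message>".
LINE_RE = re.compile(r"^(?:summary_)?ifarmlog_(\d+)\.err:slurmstepd: error: (.*)$")


def classify(message):
    """Map a slurmstepd message to a coarse, hypothetical category label."""
    if "oom-kill" in message:
        return "oom-kill"
    if "CANCELLED" in message:
        return "cancelled"
    if "died by signal: 9" in message:
        return "signal 9"
    return "other"


def runs_by_error(text):
    """Return {category: [run numbers]} in order of appearance."""
    grouped = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. "Binary file ... matches"; inspect these by hand
        grouped.setdefault(classify(match.group(2)), []).append(int(match.group(1)))
    return grouped


result = runs_by_error(SAMPLE)
for kind, runs in result.items():
    print(kind, runs)
```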

April 20th 2021

Respin 2 is in progress

  • Looks good so far
  • Fixed an issue with the BMOD slopes (the correction had the wrong sign)

March 30th 2021

  • BCM and BPM pedestals got a once-over: http://ace.phys.virginia.edu/HAPPEX/4507
  • Ifarm prompt-install is set up for respin2 (almost finished - still need to pull crex-respin2-alpha and recompile JAPAN).
  • Run lists updated - still need to cross-check with the Python scripts that query the SQL database
  • Other items from before are all addressed (?)

New RCDB/runlist updates

  • Cameron has added the label "Good" to a bunch of worthwhile calibration and pedestal runs that need to be kept around for the future (listed at the bottom)
  • Cameron has changed some runs to improve our coverage of nearly-production conditions and remove "Test" from our set of respin2 runs
    • Change 5907 from Test NeedCut to Production Suspicious (it is a slow target burn-in run on Ca48 at low current).
    • Change 6115 from Test Suspicious to Production Suspicious
    • Change 6525 from Test Good to Production Bad (as there is some fatal blinder/target move issue and the run is very short and has messed up FFB/coil7 response)
    • Change 8367, 8388, and 8389 from Test Good to Junk Suspicious (Carbon beam off-ramp ups)
    • Change 6038-6043 from Test Good to Junk Good (DAQ testing runs)
  • Cameron and Weibin triple-checked two C++ scripts and a Python script after cleaning up the RCDB a bit more, using the following cut:
run_flag = NeedCut,Suspicious,Good, run_type = Calibration,Pedestal,Parityscan,Production, with no cut on target
    • This yields 1583 total runs, including some that are "Home" target (pedestal scans, etc.)
    • 1404 are CREX Production (so 179 are Calibration, Pedestal and Parityscan) with 56 runs being AT only
    • 1392 are CREX Production Good
    • Python version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/WAC/auto_run_list/crex-respin2/CREX-All-run-list.list
    • C++ version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/prex-runlist/crex-respin2/all_nonJunk.list
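
As a rough illustration of the cut above and of cross-checking the Python and C++ lists, the selection can be written as a filter over run records followed by a set comparison. This is a sketch under assumptions: the record layout, function names, and the second comparison list are hypothetical and do not reflect the actual RCDB scripts. The run numbers and labels in the example are the reassignments listed above.

```python
# Accepted values, taken from the cut quoted in the notes (no cut on target).
KEEP_FLAGS = {"NeedCut", "Suspicious", "Good"}
KEEP_TYPES = {"Calibration", "Pedestal", "Parityscan", "Production"}


def select_runs(records):
    """Return sorted run numbers whose run_flag and run_type both pass the cut."""
    return sorted(
        rec["run"]
        for rec in records
        if rec["run_flag"] in KEEP_FLAGS and rec["run_type"] in KEEP_TYPES
    )


def compare_lists(list_a, list_b):
    """Return (runs only in list_a, runs only in list_b), each sorted."""
    set_a, set_b = set(list_a), set(list_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Records use the new labels from the notes; the dict layout is an assumption.
records = [
    {"run": 5907, "run_type": "Production", "run_flag": "Suspicious"},
    {"run": 6115, "run_type": "Production", "run_flag": "Suspicious"},
    {"run": 6525, "run_type": "Production", "run_flag": "Bad"},
    {"run": 8367, "run_type": "Junk", "run_flag": "Suspicious"},
    {"run": 6038, "run_type": "Junk", "run_flag": "Good"},
]
selected = select_runs(records)
print(selected)

# Hypothetical second list standing in for the output of another script.
only_first, only_second = compare_lists(selected, [5907, 6115, 6525])
print(only_first, only_second)
```

Two lists agree when both returned lists are empty; any run that shows up in only one of them points at a labeling or script difference worth checking by hand.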

March 23rd and before

WAC Notes March 23 2021