WAC Notes April 27 2021
From PREX Wiki
Organization
Date and time: April 27, 2021, 11am
Bluejeans ID: 564945377
Meeting link: https://bluejeans.com/564945377
Runlist spreadsheet
April 27th 2021
- Respin 2 is done:
- Run 8321 still tries to read the second split CODA data file
- Not convinced every single minirun went through the aggregator for all 3 sets of cuts (ErrorFlag==0, include BMOD, BMOD only).
- Everything else looks good, though it looks like a few runs may have run some of the plot-making steps 2 times simultaneously (due to the ifarm pause and restart over the weekend)?
- Data quality checks:
- Check that each run with significant cut changes or fixed problems works as expected
- Check slug plots by eye at least, spot check different parts of the experiment's run plots
- Did every burst/minirun get treated by the aggregator?
- Redo BMOD data quality check and compare respin2 bmod slopes with respin1 outputs (especially since the main detector pedestals shifted by ~1% or so, expect mapfile slopes to be wrong by that much)
- Beam trip analysis for respin2, compare with prompt and respin1 (should be almost perfectly identical to respin1, other than the new runs included or excluded due to RCDB changes)
- Mul-plots (one big histogram)
- Another pass of the BPMs vs. BPMs regression with respin2 dataset (and also careful look at BCM DDs)
- Eigenvector regression again
- Implement Lagrange regression after updating BMOD data cuts and segments from respin2 run list
- Overloaded regression totally fails for slugs 183-185, and really struggles for 186-187
- JAPAN regression also seems to break for slugs 183-185 (but postpan is fine?)
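One way to answer the "did every burst/minirun get treated by the aggregator?" question is a simple set comparison. The sketch below is only an illustration: it assumes the expected (run, minirun) pairs and the aggregator outputs found on disk have already been collected into Python lists, and the function name and cut-set labels are hypothetical, not part of the existing aggregator tools.

```python
from itertools import product

# Labels for the three cut sets named in the notes (the strings are illustrative).
CUT_SETS = ("ErrorFlag==0", "include BMOD", "BMOD only")


def missing_aggregator_entries(expected_miniruns, aggregated):
    """Return sorted (run, minirun, cut) triples that have no aggregator output.

    expected_miniruns: iterable of (run, minirun) pairs that should exist.
    aggregated: iterable of (run, minirun, cut) triples actually found.
    """
    found = set(aggregated)
    return sorted(
        (run, mini, cut)
        for (run, mini), cut in product(expected_miniruns, CUT_SETS)
        if (run, mini, cut) not in found
    )


# Hypothetical example: two miniruns of one run, with one cut set missing.
expected = [(8321, 0), (8321, 1)]
aggregated = [(8321, m, c) for m in (0, 1) for c in CUT_SETS]
aggregated.remove((8321, 1, "BMOD only"))
print(missing_aggregator_entries(expected, aggregated))
```

An empty result would mean every minirun was seen for all three cut sets; anything listed is a candidate for re-running the aggregator.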
Strange slurm errors:
ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
Binary file ifarmlog_8223.err matches
summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
summary_ifarmlog_8102.err:slurmstepd: error: *** JOB 31175282 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
ifarmlog_8266.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory
ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
ifarmlog_8246.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8264.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8298.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8243.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8215.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8244.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8265.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
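To decide which runs need resubmission, the grep output above can be grouped by error type. The following is a minimal sketch, assuming one grep line per log entry in the `file:message` form shown; the category names and the `classify` helper are made up for illustration. Signal 9 is SIGKILL, which says only that the job was killed and not why, so those runs still need a look at memory use or scheduling.

```python
import re
from collections import defaultdict

# Patterns for the slurmstepd messages quoted above; category names are illustrative.
PATTERNS = {
    "killed (signal 9)": re.compile(r"died by signal: 9"),
    "oom-kill": re.compile(r"oom-kill event"),
    "cancelled": re.compile(r"\*\*\* JOB \d+ ON \S+ CANCELLED"),
}
LOG_NAME = re.compile(r"^(?P<file>\S*ifarmlog_(?P<run>\d+)\.err):")


def classify(grep_lines):
    """Map each error category to the sorted run numbers whose logs show it."""
    runs = defaultdict(set)
    for line in grep_lines:
        m = LOG_NAME.match(line)
        if not m:
            continue  # skips lines such as "Binary file ... matches"
        category = next(
            (name for name, pat in PATTERNS.items() if pat.search(line)), "other"
        )
        runs[category].add(int(m.group("run")))
    return {cat: sorted(r) for cat, r in runs.items()}


# A few of the lines from the notes.
lines = [
    "ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9",
    "Binary file ifarmlog_8223.err matches",
    "summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***",
    "ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup.",
    "ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory",
]
summary = classify(lines)
for category, run_numbers in summary.items():
    print(category, run_numbers)
```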
April 20th 2021
Respin 2 is going
- Looks good so far
- Fixed an issue with bmod slopes (had the wrong sign for correction)
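A sign error like this is something a slope comparison between two analysis passes can flag automatically. The sketch below is an assumption about how such a check might look, not the existing BMOD tooling: slopes are taken to be plain dictionaries keyed by (detector, monitor), and the key names, the numbers, and the 2% tolerance are placeholders.

```python
def compare_slopes(old, new, rel_tol=0.02):
    """Compare two slope dictionaries keyed by (detector, monitor).

    Returns {key: label} for problem entries, where label is
    "missing", "sign flip", or "shifted" (relative change above rel_tol).
    """
    issues = {}
    for key, old_val in old.items():
        if key not in new:
            issues[key] = "missing"
            continue
        new_val = new[key]
        if old_val * new_val < 0:
            issues[key] = "sign flip"
        elif abs(new_val - old_val) > rel_tol * abs(old_val):
            issues[key] = "shifted"
    return issues


# Placeholder numbers, not real slopes.
old_pass = {("usl", "bpm4aX"): -10.0, ("usr", "bpm4aX"): 12.0, ("usl", "bpm4eY"): 3.0}
new_pass = {("usl", "bpm4aX"): -10.1, ("usr", "bpm4aX"): -12.0, ("usl", "bpm4eY"): 3.3}
print(compare_slopes(old_pass, new_pass))
```

With a tolerance of a few percent, a ~1% change passes quietly while a flipped sign or a larger shift is reported.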
March 30th 2021
- BCM and BPM Pedestals get a once over: http://ace.phys.virginia.edu/HAPPEX/4507
- Ifarm prompt-install is set up for respin2 (almost finished - still need to pull crex-respin2-alpha and recompile JAPAN).
- Run lists updated - still need to cross check with python sql database interacting scripts
- Other items from before are all addressed (?)
New RCDB/runlist updates
- Cameron has added the label "Good" to a bunch of worthwhile calibration and pedestal runs that need to be kept around for the future (listed at the bottom)
- Cameron has changed some runs to improve our coverage of nearly-production conditions and remove "Test" from our set of respin2 runs
- Change 5907 from Test NeedCut to Production Suspicious (it is a slow target burn-in run on Ca48 at low current).
- Change 6115 from Test Suspicious to Production Suspicious
- Change 6525 from Test Good to Production Bad (as there is some fatal blinder/target move issue and the run is very short and has messed up FFB/coil7 response)
- Change 8367, 8388, and 8389 from Test Good to Junk Suspicious (Carbon beam off-ramp ups)
- Change 6038-6043 from Test Good to Junk Good (DAQ testing runs)
- After cleaning up the RCDB a bit more, Cameron and Weibin triple checked 2 C++ scripts and a Python script, all of which use the following cut:
run_flag = NeedCut,Suspicious,Good; run_type = Calibration,Pedestal,Parityscan,Production; no cut on target
- This yields 1583 total runs, including some that are "Home" target (pedestal scans, etc.)
- 1404 are CREX Production (so 179 are Calibration, Pedestal and Parityscan) with 56 runs being AT only
- 1392 are CREX Production Good
- Python version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/WAC/auto_run_list/crex-respin2/CREX-All-run-list.list
- C++ version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/prex-runlist/crex-respin2/all_nonJunk.list
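The selection and the Python vs. C++ cross-check can be sketched in a few lines. This is an illustration only: it assumes run records are already available as dictionaries with hypothetical `run`, `run_type` and `run_flag` fields, and it does not call the RCDB API or read the list files above. The example records use the relabelled runs from these notes.

```python
ALLOWED_FLAGS = {"NeedCut", "Suspicious", "Good"}
ALLOWED_TYPES = {"Calibration", "Pedestal", "Parityscan", "Production"}


def select_runs(records):
    """Return sorted run numbers passing the run_flag and run_type cut (no target cut)."""
    return sorted(
        r["run"]
        for r in records
        if r["run_flag"] in ALLOWED_FLAGS and r["run_type"] in ALLOWED_TYPES
    )


def compare_run_lists(list_a, list_b):
    """Return (only_in_a, only_in_b) as sorted lists of run numbers."""
    a, b = set(list_a), set(list_b)
    return sorted(a - b), sorted(b - a)


# Records reflecting the relabelling described above (field layout is hypothetical).
records = [
    {"run": 5907, "run_type": "Production", "run_flag": "Suspicious"},
    {"run": 6115, "run_type": "Production", "run_flag": "Suspicious"},
    {"run": 6525, "run_type": "Production", "run_flag": "Bad"},
    {"run": 8367, "run_type": "Junk", "run_flag": "Suspicious"},
]
selected = select_runs(records)
print(selected)  # [5907, 6115]
print(compare_run_lists(selected, [5907, 6115, 6525]))  # ([], [6525])
```

Two empty lists from `compare_run_lists` would indicate that the two run lists agree.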