WAC Notes May 4 2021





Date and time: May 4, 2021, 11am
Bluejeans ID: 564945377
Meeting link: https://bluejeans.com/564945377
Runlist spreadsheet

May 4th 2021

  • Respin2 is done; changes from respin1 have been checked and any mistakes corrected in the runlist spreadsheet.
  • Noteworthy features of grand plots:
    • AT detectors look bad in the grand plots (lrb_alldet_burst analysis) because of their blinding and movements for several slugs...
      • We should verify that the AT detectors behave as expected in the grand plots over the slug ranges where they are supposed to be valid (and then rerun the plots with a cut to ignore the blinded, moved, etc. sections)
      • Exclude slugs <= 107 and >= 222 per old notes
      • There shouldn't be any more AT cut problems with the global error flag in place, unless it is a real problem with a singular matrix inversion failure
    • USL raw signal shows a strong indication of a helicity-correlated background for the last 20 or so slugs and warrants investigation (similarly, though less obviously, in US_dd).
      • This appears to correlate directly with the 4eY signal gaining a strong outlier series at the same time, so it is likely real in the beam
      • Note that these BPM outlier points are extremely noticeable in the injector BPM signals
      • Also, this appears to be due precisely to RTPAPOSVCNT being set to the Wien-left values. Was this shift made on purpose by Caryn to negate accumulated position differences? The RTPPITA1CNT signal is also non-trivial for these slugs (newly added to the CODA EPICS list?)
    • All BCMs have a strong outlier for one slug (119) due to low-current running; it needs to be removed from the slug-list/slug-plots
    • Slug 123 is one run only (4 miniruns) with the Right HRS running alone... probably easier on the plots to remove it completely?
    • Slugs 186 and 187 (first back from summer shut down and with known large FFB noise issues) appear to elude regression
      • This may be because regression genuinely fails to remove extra or too-fast noise, because there is a fundamental resolution limit on the BPMs, or because the slopes are imprecise at a level where this large noise dominates
      • Overload regression is almost useless for these 2 slugs
      • SAM regression is even more negatively impacted
  • To do:
    • Update the RCDB information with updated flags (and redo those in aggregator to pick up the new flags)
    • Re-creating the grand run-list (all non-Junk and all "Good Production" separately) is necessary, as well as the accompanying slug-lists (for use in grand plots, etc.)
    • Use this updated "Good Production" list to do offline analyses:
      • Comparison of data (BCM, regressed main detector, etc.) between prompt, respin1, and respin2 (prompt and respin1 files exist already on volatile disk)
      • Post-trip data isolation (using the same outputs from above, and checking for any additional shift in Aq between respin1 and 2, or issues in position diffs or regressed asyms)
      • Hadd together beam modulation slopes into one place
      • Hadd together aggregator files into one place
      • Perform eigenvector analysis (minirunwise and then average over 3 parts of CREX)
      • Beam modulation, lagrange, and regression (overload and standard 5bpm) analyses using the 3-part averaged eigenvector basis
      • Slope comparisons between analyses
      • Beam corrections estimations from these analyses
      • Special aggregator check of Null asym (multiply each multiplet X helicity sign) and of sub-block analyses
      • BPMs vs. BPMs regression (checking for helicity pickup or bad resolution problems)
      • BCM double difference and resolution combinations checking
      • Regression of BCM against BPMs (included in existing slug plots - non-trivial slopes for some runs... warrants investigation)
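The eigenvector-basis steps in the to-do list above (minirunwise eigenvector analysis, then regression in the averaged basis) amount to diagonalizing the monitor covariance matrix and regressing the detector in that rotated basis. A minimal sketch with toy data; the array names (`bpms`, `det`) and the single-detector setup are illustrative assumptions, not the actual JAPAN/aggregator interface:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumption): 10000 events, 5 BPM-like monitors, one detector yield
bpms = rng.normal(size=(10000, 5))
true_slopes = np.array([0.3, -0.1, 0.05, 0.2, -0.4])
det = bpms @ true_slopes + rng.normal(scale=0.01, size=10000)

# Diagonalize the monitor covariance matrix to form the eigenvector basis
cov = np.cov(bpms, rowvar=False)
evals, evecs = np.linalg.eigh(cov)  # columns of evecs are the eigenvectors

# Project the monitors into that basis and regress the detector against it
X = (bpms - bpms.mean(axis=0)) @ evecs
y = det - det.mean()
slopes, *_ = np.linalg.lstsq(X, y, rcond=None)

# Remove the correlated beam motion from the detector
det_corrected = det - X @ slopes
```

Because the eigenvector basis is uncorrelated by construction, the fitted slopes are independent of one another, which is what makes averaging them over the three parts of CREX well defined.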

April 27th 2021

  • Respin 2 is done:
    • Run 8321 still tries to read the second split CODA data file
    • Not convinced every single minirun went through the aggregator for all 3 sets of cuts (ErrorFlag==0, include BMOD, BMOD only).
    • Everything else looks good, though it looks like a few runs may have run some of the plot-making steps twice simultaneously (due to the ifarm pause and restart over the weekend)?
  • Data quality checks:
    • Check that each run with significant cut changes or fixed problems works as expected
    • Check slug plots by eye at least, spot check different parts of the experiment's run plots
    • Did every burst/minirun get treated by the aggregator?
    • Redo BMOD data quality check and compare respin2 bmod slopes with respin1 outputs (especially since the main detector pedestals shifted by ~1% or so, expect mapfile slopes to be wrong by that much)
    • Beam trip analysis for respin2, compare with prompt and respin1 (should be almost perfectly identical to respin1, other than the new runs included or excluded due to RCDB changes)
    • Mul-plots (one big histogram)
    • Another pass of the BPMs vs. BPMs regression with respin2 dataset (and also careful look at BCM DDs)
    • Eigenvector regression again
    • Implement lagrange regression after updating BMOD data cuts and segments from respin2 run list
    • Overloaded regression really struggles for slugs 186-187
    • JAPAN Regression breaks for 183-185 because of a misplaced "0" in prex_maindet_pedestal.7414-7499.map
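The BPMs-vs-BPMs regression check listed above (one monitor regressed against another, with the residual width exposing resolution problems and an anomalous slope exposing helicity pickup) can be sketched as an ordinary least-squares fit. The monitor names and the injected 2.0 slope here are hypothetical toy values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Toy monitors (assumption): bpm_b responds linearly to bpm_a plus noise
bpm_a = rng.normal(size=n)
bpm_b = 2.0 * bpm_a + rng.normal(scale=0.1, size=n)  # 0.1 = assumed resolution

# Fit bpm_b = slope * bpm_a + offset; the residual width estimates the
# combined resolution, and a slope far from expectation would flag pickup
A = np.column_stack([bpm_a, np.ones(n)])
(slope, offset), *_ = np.linalg.lstsq(A, bpm_b, rcond=None)
residual = bpm_b - A @ np.array([slope, offset])
```

In practice the same fit run per slug would let the residual width be tracked across the run periods flagged above (e.g. the FFB-noise slugs 186-187).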

Strange slurm errors

ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
Binary file ifarmlog_8223.err matches
summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
summary_ifarmlog_8102.err:slurmstepd: error: *** JOB 31175282 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***
ifarmlog_8266.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_7424.err:slurmstepd: error: _is_a_lwp: open() /proc/62619/status failed: No such file or directory
ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup. Some of your processes may have been killed by the cgroup out-of- memory handler.
ifarmlog_8246.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8264.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8298.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8243.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8215.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8244.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
ifarmlog_8265.err:slurmstepd: error: get_exit_code task 0 died by signal: 9
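The grep output above mixes three distinct failure modes (signal-9 kills, cancelled jobs, and an OOM kill on run 8321). A small script like the following can tally them; the `classify` helper is hypothetical, but the patterns are taken from the log lines themselves:

```python
from collections import Counter

def classify(line):
    """Coarsely categorize a slurmstepd error line (hypothetical helper)."""
    if "oom-kill" in line:
        return "oom"
    if "died by signal: 9" in line:
        return "signal9"
    if "CANCELLED" in line:
        return "cancelled"
    return "other"

# Sample lines taken from the grep output above
lines = [
    "ifarmlog_8395.err:slurmstepd: error: get_exit_code task 0 died by signal: 9",
    "summary_ifarmlog_8103.err:slurmstepd: error: *** JOB 31175283 ON farm180244 CANCELLED AT 2021-04-23T13:54:11 ***",
    "ifarmlog_8321.err:slurmstepd: error: Detected 1 oom-kill event(s) in step 31186990.batch cgroup.",
]
counts = Counter(classify(l) for l in lines)
```

Note that signal 9 is also what the cgroup OOM handler sends, so the signal-9 entries may themselves be memory kills that simply lack the explicit oom-kill message.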

April 20th 2021

Respin 2 is underway

  • Looks good so far
  • Fixed an issue with bmod slopes (had the wrong sign for correction)

March 30th 2021

  • BCM and BPM Pedestals get a once over: http://ace.phys.virginia.edu/HAPPEX/4507
  • Ifarm prompt-install is set up for respin2 (almost finished - still need to pull crex-respin2-alpha and recompile JAPAN).
  • Run lists updated - still need to cross-check with the python SQL database interaction scripts
  • Other items from before are all addressed (?)

New RCDB/runlist updates

  • Cameron has added the label "Good" to a bunch of worthwhile calibration and pedestal runs that need to be kept around for the future (listed at the bottom)
  • Cameron has changed some runs to improve our coverage of nearly-production conditions and remove "Test" from our set of respin2 runs
    • Change 5907 Test NeedCut to Production Suspicious (it is a slow target burn in run on Ca48 at low current).
    • Change 6115 from Test Suspicious to Production Suspicious
    • Change 6525 from Test Good to Production Bad (as there is some fatal blinder/target move issue and the run is very short and has messed up FFB/coil7 response)
    • Change 8367, 8388, and 8389 from Test Good to Junk Suspicious (Carbon beam off-ramp ups)
    • Change 6038-6043 from Test Good to Junk Good (DAQ testing runs)
  • Cameron and Weibin triple-checked 2 C++ scripts and a python script after cleaning up the RCDB a bit more, and utilized the following cut:
run_flag = NeedCut,Suspicious,Good, run_type = Calibration,Pedestal,Parityscan,Production and not cut on target
    • This yields 1583 total runs, including some that are "Home" target (pedestal scans, etc.)
    • 1404 are CREX Production (so 179 are Calibration, Pedestal and Parityscan) with 56 runs being AT only
    • 1392 are CREX Production Good
    • Python version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/WAC/auto_run_list/crex-respin2/CREX-All-run-list.list
    • C++ version /u/group/halla/parity/software/japan_offline/prompt/prex-prompt/prex-runlist/crex-respin2/all_nonJunk.list
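The cut quoted above (run_flag in NeedCut/Suspicious/Good, run_type in Calibration/Pedestal/Parityscan/Production, deliberately no cut on target) can be expressed as a simple filter. The record layout below is a hypothetical stand-in, not the actual RCDB schema, and the sample runs reuse the reclassifications listed above:

```python
GOOD_FLAGS = {"NeedCut", "Suspicious", "Good"}
GOOD_TYPES = {"Calibration", "Pedestal", "Parityscan", "Production"}

def keep_run(run):
    """Apply the respin2 run-list cut; intentionally no cut on target."""
    return run["run_flag"] in GOOD_FLAGS and run["run_type"] in GOOD_TYPES

# Sample records (hypothetical layout) using reclassified runs from above
runs = [
    {"run": 5907, "run_flag": "Suspicious", "run_type": "Production", "target": "Ca48"},
    {"run": 6525, "run_flag": "Bad", "run_type": "Production", "target": "Ca48"},
    {"run": 6040, "run_flag": "Good", "run_type": "Junk", "target": "Home"},
]
selected = [r["run"] for r in runs if keep_run(r)]
```

Leaving the target uncut is what keeps the "Home" target pedestal scans in the 1583-run total while the Junk-type DAQ tests (6038-6043) still drop out.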

March 23rd and before

WAC Notes March 23 2021