DAQ Testing/20181217
From PREX Wiki
December 17th, 2018 Testers: Cameron
Goals
- Test ET deadtime with the et_monitor C++ executable: perform data-flow stress tests at 30 Hz, and check for accidental deadtime from bridge slowness at 1 kHz
ET Deadtime
See also ET Bridge overview
This is what I did to perform precise and comprehensive tests of the ET deadtime at 30 Hz and 1 kHz:
- Use 30 Hz injector quartet helicity signals
- Modify default chunk and queue sizes for testing:
- set ~/bin/ETbridge_bgr queue to 1 (-q parameter, was 10,000 from previous tests)
- set ET bridge blocking on (remove -nb parameter)
- set ET bridge chunk size to 100 (default value = 100, with -c parameter)
- set (for these tests only) the 32- and 64-bit ET queue (input buffer) lengths down to 30, instead of the 10,000 they were previously increased to, so they fill in 1 second instead of effectively never - set these with the -q parameter in ~/bin/startcoda and ~/bin/start2ndET
- Modify client (~/et-12.0/src/cameronc/client_main.C) settings for testing:
- set the client into blocking mode by editing THaEtClient.C and changing "NONBLOCKING" to "BLOCKING" in the ET station initialization
- set the client chunk size to 1 instead of the default 100 (this makes the client request a fresh set of events once per event, instead of waiting several seconds and confusing the user and the simple online-analyzer code Bob wrote)
- Run the simple online analyzer client using ./etclient [number of events] [time to "wait"] - it will print the event number for each event, complain if event numbers are skipped, and calculate relative rates using scalers in the counting house DAQ, as described in prior testing
- Check on helicity errors using ~/chkdtime/dtchk code on files in /adaq2/data1/apar/parity18_[run number].dat
Blocking Bridge, Blocking Client
- Run 4776 - test the PAN decoder, CODA setup (including vqwk ADCs), and online analyzer client all work
- The client ran at 16 ms per event, which is faster than the 30 Hz (33 ms) flip rate
- Yes, they all work: there are no helicity errors in the tape data, no missed event numbers in the online analyzer, and the vqwk ADCs don't register any buffer read/write errors in the ROC during data collection either
- Run 4777 - test with a slow analyzer (analyzer speed < flip rate) - start with 66 ms per event (~2x slower than flip rate)
- There are no helicity errors in tape data, no missed event numbers in online analyzer, and vqwk ADCs don't error during data collection either
- Once the online analyzer client is killed the backed up events flow through the system unperturbed, even though the entire 64 and 32 bit ET systems had been fully backed all the way up
- Run 4778 - Test extremely slow analyzer (0.1 Hz analysis) to see how many events can be stored in extra buffer (The ROC's memory, according to Paul King, should be ~2k events or so)
- This definitely grinds both 64 and 32 bit ET systems to a complete halt
- No events flow through the ET systems or analyzer once all ET buffers are full, but once the analyzer is killed all of the events produced in the ROC come pouring out
- Data are good from event 2000 (when the analyzer was turned on) to event 7299, where a single vqwk read/write buffer error is registered on the ROC telnet connection output; the helicity checker sees that this is exactly where the helicity sequence is violated and helicity errors begin
- The ROC buffer is on the order of 3k events, but further tests will see what is going on
- Run 4779 - Test to see if it's actually the client's input buffer (utilized explicitly during non-blocking mode)
- Set the "cue" buffer (line 112 of THaEtClient.C) to 30 events instead of the 10,000 it was previously increased to
- Event 1637 gives the vqwk error (seemingly unrelated to online analyzer client running)
- Helicity errors appear
- Run 4780 - Redo 4779 test
- Made it to 2496 events before vqwk error this time
- All but the first 100 events were backed up into the ROC memory (so it is not the client "cue")
- The "normal" vqwk read/write buffer errors are too common for reliable testing of exactly how large the ROC memory is, but assuming ~2k events is probably safe for now
- Run 4781 - Test et bridge input buffer size and also test to see what the helicity and event numbers and total number of events look like when relying on ROC memory for a long time
- Edit the ET bridge code and recompile to have 30 total events instead of 10,000: open ~/et-12.0/bob/et32_2_et64.c, edit the #NUMEVENTS line, recompile with make etbridge, and copy the executable over to ~/bin/etbridge (you may need to make clean, "cp ./orig/libevio.a .", and then remake)
- This still backs up both ETs as before, and ROC 23 still stays active for a few thousand events as before
- Looking at this extreme backed-up ROC behavior, I see in the rcgui that ROC 23 eventually hits a wall and reads a "0" event data-taking rate after long enough
- Interestingly, once the ADC read/write buffer error is reported after a few thousand events (event 2494), the ROC resumes reading data (as reported in the RC gui)
- It resumes, then the RC gui ROC event rate drops to zero again, and the ADC hiccup happens again, resuming ROC event reading again (so it's a cycle)
- Maybe this has to do with a chunk getting grabbed, or maybe the ADCs legitimately have a problem sometimes; it is probably related to starting or stopping writing events (the ADCs very often have the same kind of hiccup when beginning a new run - so often, in fact, that I told the helicity deadtime checking code to skip the first 30 events so that this issue doesn't confound its pattern recognizer)
- Still once you kill the client thousands of events come pouring through the ET system from the ROC
- Events are definitely skipped in these ADC hiccups, and the helicity sequence is broken
Blocking Bridge, non-Blocking Client
- Run 4782 - Test blocking bridge with non-blocking client, and also test whether the blocking bridge chunk size affects the ADC hiccup
- Set non-blocking client "cue" to 30, but ET bridge #NUMEVENTS back up to 10,000
- This should act like a blocking client station, since the cue length is the same as the total number of events in the 64 bit ET
- In fact it does act like a blocking client station
- Run 4783 - Now set the 64-bit ET total number of events to 50, larger than the client cue, for a proper non-blocking client station test (the 64-bit ET chunk is still 100 events here, but that is not a source of any problem I can see - this chunk is just a maximum n_events per request, not a minimum as in the case of the client station chunk)
- Now grand central in 32 bit and 64 bit ET are both happy and see all of the events, no vqwk hiccups
- The non-blocking client analyzer does get bypassed when the analyzer speed is slower than the helicity flip rate, and this means no deadtime in the DAQ, but missed events in analyzer
- Run 4785 (4784 CODA failed) - see how harmful setting the chunk size of the ET bridge is for the ability of the bridge to transfer data at necessary rates from one ET to the other (still at 30 Hz injector supplied helicity information)
- Set ET Bridge chunk size to 1 (with -c parameter)
- This has no new effect; chunk = 1 is sufficient at 30 Hz (the queue is still 30 from earlier)
- But the problem before was actually too small of a queue size (so too many events show up before the analyzer has a chance to use them, causing the queue to fill up and lose events - see prior testing notes)
- Run 4786 - test bridge chunk = 1 still and now bridge queue = 1 too to test transfer rate
- Appears to be fine, 30 Hz is slow enough to not be affected by slow bridge transfer
- Run 4787 - try 1 kHz helicity flip rate from Helicity control board now, using vxworks writeHelBoard command and editing integration time in ~/vxworks/halladaq6.boot script (same bridge chunk and queue = 1 scenario)
- No vqwk read errors, and grand central keeps up with 1 kHz; helicity reads fine, but the vqwks are still glitching too much, even at the very beginning, to get a long helicity sequence going (skip to run 4790 for one that does work, with nothing changed to fix it)
- Run 4788 - try again with no analyzer - helicity works
- Run 4789 - try again with slow analyzer - helicity fails
- Run 4790 - try again with slow analyzer - helicity works now (so it's just being obnoxious...), but does it only work because my analyzer was so slow that the small chunk and queue sizes in the bridge weren't even relevant for data transfer?
- Run 4791 - Try 1 kHz now with q = 1 and chunk = 10
- Early helicity error, try again later
- Run 4792 - Try again to avoid early helicity error
- Still has early error
- Fix helicity deadtime checking code by making it skip over the first 30 events anyway, so it doesn't matter any more whether the initial few events are good or bad
- Now checking the helicity I see no errors, but it is still the slow analyzer that would avoid q and chunk size problems
- Run 4793 - Now try a fast analyzer with queue = 1 and chunk = 10 in ET bridge
- It works, no helicity errors
- Run 4794 - Try again
- No helicity errors, but there are lots of vqwk errors in the serial port readout terminal, even after killing the client - we need to keep track of these vqwk errors or squash them permanently during the experiment, because this would probably be a killer for collecting proper data
- Reboot to see if that fixes it - it does
- Run 4795 - Retry 4794 after rebooting
- Works, no helicity errors, even though the online analyzer is definitely missing events: since the analyzer is simply not able to receive all the events, it skips them, the ET bridge doesn't care that the analyzer isn't receiving them, and there is no backing up in the 32-bit ET system
- So this means we don't need to worry about throttling through the ET bridge: it will happily not send all the events to the analyzer, and even though it is in blocking mode it won't panic that it isn't doing its job well enough
- Having a large enough chunk and queue is still a problem for the analyzer getting all the events (at 1 kHz at least), and I still feel a bit funny about this conclusion, but it is safe for now; we should just use a large enough queue and chunk anyway, and probably not use a blocking bridge since it is possibly dangerous anyway