DAQ Testing/20181217

December 17th, 2018 Testers: Cameron

Goals

  • Test ET deadtime with the et_monitor C++ executable: perform data-flow stress tests at 30 Hz, and check for accidental deadtime from bridge slowness at 1 kHz

ET Deadtime

See also ET Bridge overview

This is what I did to perform precise and comprehensive tests of the ET deadtime at 30 Hz and 1 kHz:

  • Use 30 Hz injector quartet helicity signals
  • Modify default chunk and queue sizes for testing:
    • set ~/bin/ETbridge_bgr queue to 1 (-q parameter, was 10,000 from previous tests)
    • set ET bridge blocking on (remove -nb parameter)
    • set ET bridge chunk size to 100 (default value = 100, with -c parameter)
    • set (for these tests only) the 32- and 64-bit ET queue (input buffer) lengths down to 30, instead of the 10,000 they had previously been increased to, so they fill in ~1 second instead of effectively never; set these with the -q parameter in ~/bin/startcoda and ~/bin/start2ndET
  • Modify client (~/et-12.0/src/cameronc/client_main.C) settings for testing:
    • set the client into blocking mode by editing THaEtClient.C and changing "NONBLOCKING" to "BLOCKING" in the ET station initialization
    • set the client chunk size to 1 instead of the default value of 100 (this makes the client request a fresh set of events once per event, instead of waiting several seconds for a full chunk and confusing the user and the simple online analyzer code Bob wrote)
  • Run the simple online analyzer client using ./etclient [number of events] [time to "wait"]; it will print the event number for each event, complain if event numbers are skipped, and calculate relative rates using scalers in the counting house DAQ, as described in prior testing (a sketch of this station setup and event loop appears after this list)
  • Check on helicity errors using ~/chkdtime/dtchk code on files in /adaq2/data1/apar/parity18_[run number].dat
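
For concreteness, here is a minimal sketch of the client-side setup and event loop described above, using the standard ET C API (blocking station, chunk = 1, 30-event cue). This is not the actual THaEtClient.C / client_main.C code: the ET file name /tmp/et_sys_par64, the station name ANALYZER, and the word offset used to read the CODA event number are placeholder assumptions.

 #include <stdio.h>
 #include <stdlib.h>
 #include "et.h"

 #define CHUNK 1   /* chunk = 1: request fresh events once per event */

 int main(void) {
     et_sys_id     id;
     et_openconfig openconfig;
     et_statconfig sconfig;
     et_stat_id    stat;
     et_att_id     att;
     et_event     *pe[CHUNK];
     int           nread, last = -1;

     /* open the ET system (file name is a placeholder) */
     et_open_config_init(&openconfig);
     if (et_open(&id, "/tmp/et_sys_par64", openconfig) != ET_OK) {
         fprintf(stderr, "cannot open ET system\n");
         exit(1);
     }
     et_open_config_destroy(openconfig);

     /* blocking station: the "NONBLOCKING" -> "BLOCKING" edit above;
        the cue (cf. line 112 of THaEtClient.C) is the input-list
        limit that matters when the station is non-blocking */
     et_station_config_init(&sconfig);
     et_station_config_setblock(sconfig, ET_STATION_BLOCKING);
     et_station_config_setcue(sconfig, 30);
     et_station_create(id, &stat, "ANALYZER", sconfig);
     et_station_config_destroy(sconfig);
     et_station_attach(id, stat, &att);

     while (1) {
         /* blocking get, one event at a time (chunk = 1) */
         if (et_events_get(id, att, pe, ET_SLEEP, NULL, CHUNK, &nread) != ET_OK)
             break;
         for (int i = 0; i < nread; i++) {
             int *data;
             et_event_getdata(pe[i], (void **)&data);
             int evnum = data[4];  /* placeholder word offset for the CODA event number */
             printf("event %d\n", evnum);
             if (last >= 0 && evnum != last + 1)
                 printf("skipped events: %d -> %d\n", last, evnum);
             last = evnum;
         }
         et_events_put(id, att, pe, nread);  /* pass events back downstream */
     }
     et_close(id);
     return 0;
 }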

Blocking Bridge, Blocking Client

  • Run 4776 - test the PAN decoder, CODA setup (including vqwk ADCs), and online analyzer client all work
    • The client ran at 16 ms per event, which is faster than the 30 Hz flip rate
    • Yes, they all work: there are no helicity errors in the data on tape, no missed event numbers in the online analyzer, and the vqwk ADCs don't register any buffer read/write errors in the ROC during data collection either
  • Run 4777 - test with a slow analyzer (analyzer speed < flip rate) - start with 66 ms per event (~2x slower than flip rate)
    • There are no helicity errors in tape data, no missed event numbers in online analyzer, and vqwk ADCs don't error during data collection either
    • Once the online analyzer client is killed, the backed-up events flow through the system unperturbed, even though the entire 64- and 32-bit ET systems had been fully backed all the way up
  • Run 4778 - Test an extremely slow analyzer (0.1 Hz analysis) to see how many events can be stored in the extra buffering (the ROC's memory, which according to Paul King should hold ~2k events or so)
    • This definitely grinds both 64 and 32 bit ET systems to a complete halt
    • No events flow through the ET systems or analyzer once all ET buffers are full, but once the analyzer is killed all of the events produced in the ROC come pouring out
    • Data are good from event 2000 (when the analyzer was turned on) to event 7299, where a single vqwk read/write buffer error is registered in the ROC telnet connection output; the helicity checker sees that this is exactly where the helicity sequence is violated and helicity errors begin
    • That puts the ROC buffer on the order of 3k events, but further tests will pin down what is going on
  • Run 4779 - Test to see if maybe it's actually the client's input buffer (utilized explicitly during non-blocking mode)
    • Set the "cue" (line 112 of THaEtClient.C) to a 30-event buffer instead of the 10,000 it had previously been increased to
    • Event 1637 gives the vqwk error (seemingly unrelated to online analyzer client running)
    • Helicity errors appear as well
  • Run 4780 - Redo 4779 test
    • Made it to 2496 events before vqwk error this time
    • All but the first 100 events were backed up into the ROC memory (so it is not the client "cue")
    • The "normal" vqwk read/write buffer errors are too common for reliable testing of exactly how large the ROC memory is, but assuming ~2k events is probably safe for now
  • Run 4781 - Test et bridge input buffer size and also test to see what the helicity and event numbers and total number of events look like when relying on ROC memory for a long time
    • Edit the ET bridge code to have 30 total events instead of 10,000: open ~/et-12.0/bob/et32_2_et64.c, edit the #NUMEVENTS line, recompile with make etbridge, and copy the executable over to ~/bin/etbridge (you may need to make clean, "cp ./orig/libevio.a .", and then remake); see the sketch after this list for where this total event count enters the ET configuration
    • This still backs up both ETs as before, and ROC 23 still stays active for a few thousand events as before
    • Now looking at the extreme backed-up ROC behavior, I see in the RC GUI that ROC 23 eventually hits a wall and its event data-taking rate reads "0" after long enough
    • Interestingly, once the ADC read/write buffer error is reported after a few thousand events (event 2494), the ROC resumes reading data (as reported in the RC GUI)
    • It resumes, then the ROC event rate in the RC GUI drops to zero again, and the ADC hiccup happens again, resuming ROC event reading once more (so it's a cycle)
    • Maybe this has to do with a chunk getting grabbed, or maybe the ADCs legitimately have an occasional problem, probably related to starting or stopping event writing (the ADCs very often show the same kind of hiccup when beginning a new run; so often, in fact, that I told the helicity deadtime checking code to skip the first 30 events so this issue doesn't confound its pattern recognizer)
    • Still, once you kill the client, thousands of events come pouring through the ET system from the ROC
    • Events are definitely skipped in these ADC hiccups, and the helicity sequence is broken
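
A point worth keeping in mind for all of the buffering numbers above: the total number of events in an ET system is a fixed pool allocated when the system is started, so once every event in the pool is parked in some station's input list, the producer feeding the ET (ultimately the ROC) has nowhere to write and must buffer internally. Below is a minimal sketch of where that total is set when starting an ET system, assuming the standard ET C API; the 30-event count stands in for the #NUMEVENTS edit in et32_2_et64.c, and the event size and file name are placeholders.

 #include <stdio.h>
 #include <stdlib.h>
 #include "et.h"

 int main(void) {
     et_sys_id    id;
     et_sysconfig sconfig;

     et_system_config_init(&sconfig);
     /* total event pool: once all 30 events are sitting in station
        input lists, new data must wait (blocking) or bypass/drop
        (non-blocking), and the ROC starts buffering internally */
     et_system_config_setevents(sconfig, 30);     /* was 10,000 */
     et_system_config_setsize(sconfig, 4096);     /* bytes per event (placeholder) */
     et_system_config_setfile(sconfig, "/tmp/et_sys_par64"); /* placeholder */

     if (et_system_start(&id, sconfig) != ET_OK) {
         fprintf(stderr, "cannot start ET system\n");
         exit(1);
     }
     /* ... system runs until et_system_close(id) ... */
     return 0;
 }

With only 30 events in each ET, essentially the whole few-thousand-event grace period seen in these runs has to live in the ROC, which is consistent with the ~2-3k figures above.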

Blocking Bridge, Non-Blocking Client

  • Run 4782 - Testing blocking bridge and non-blocking client, and also testing whether the blocking bridge chunk size affects the ADC hiccup
    • Set non-blocking client "cue" to 30, but ET bridge #NUMEVENTS back up to 10,000
    • This should act like a blocking client station, since the cue length is the same as the total number of events in the 64 bit ET
    • In fact it does act like a blocking client station
  • Run 4783 - now set the 64-bit ET total number of events to 50, larger than the client cue, for a proper non-blocking client station test (the 64-bit ET chunk is still 100 events here, but that is not a source of any problem I can see; this chunk is just a maximum n_events per request, not a minimum as in the case of the client station chunk)
    • Now the grand central stations in the 32- and 64-bit ETs are both happy and see all of the events, with no vqwk hiccups
    • The non-blocking client analyzer does get bypassed when the analyzer is slower than the helicity flip rate; this means no deadtime in the DAQ, but missed events in the analyzer
  • Run 4785 (4784 CODA failed) - see how harmful a small ET bridge chunk size is for the bridge's ability to transfer data at the necessary rate from one ET to the other (still at 30 Hz, with injector-supplied helicity information)
    • Set ET Bridge chunk size to 1 (with -c parameter)
    • This has no new effect; chunk = 1 is sufficient at 30 Hz (the queue is still 30 from earlier)
    • But the problem seen before was actually too small a queue size (too many events show up before the analyzer has a chance to use them, causing the queue to fill up and lose events - see prior testing notes); a simplified sketch of the bridge's chunked transfer loop is below
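
For orientation, here is a simplified sketch of the kind of chunked transfer loop the ET bridge runs, assuming the standard ET C API; the real ~/et-12.0/bob/et32_2_et64.c does more than this, and the function name bridge_loop and the fixed event size are placeholders. The -c chunk parameter bounds how many events move per request, while the -q queue and total-event settings of the two ET systems determine how much can back up before the blocking calls stall.

 #include <string.h>
 #include "et.h"

 #define CHUNK 1  /* the -c parameter: max events moved per request */

 /* assumes both ET systems are already open with stations attached:
    from_id/from_att on the 32-bit ET, to_id/to_att on the 64-bit ET */
 void bridge_loop(et_sys_id from_id, et_att_id from_att,
                  et_sys_id to_id,   et_att_id to_att) {
     et_event *in[CHUNK], *out[CHUNK];
     int       nread, nnew, i;

     while (1) {
         /* blocking get of up to CHUNK events from the 32-bit ET */
         if (et_events_get(from_id, from_att, in, ET_SLEEP,
                           NULL, CHUNK, &nread) != ET_OK)
             break;

         /* grab the same number of fresh events in the 64-bit ET;
            with a blocking bridge this call is where backpressure
            appears once the 64-bit ET's event pool is exhausted */
         if (et_events_new(to_id, to_att, out, ET_SLEEP,
                           NULL, 4096 /* placeholder size */,
                           nread, &nnew) != ET_OK)
             break;

         for (i = 0; i < nnew; i++) {
             void  *src, *dst;
             size_t len;
             et_event_getdata(in[i], &src);
             et_event_getlength(in[i], &len);
             if (len > 4096) len = 4096;  /* guard for the placeholder size */
             et_event_getdata(out[i], &dst);
             memcpy(dst, src, len);       /* copy the event payload */
             et_event_setlength(out[i], len);
         }

         et_events_put(to_id, to_att, out, nnew);      /* into the 64-bit ET */
         et_events_put(from_id, from_att, in, nread);  /* recycle 32-bit events */
     }
 }

When the bridge blocks in et_events_new() because the destination pool is full, the 32-bit ET fills next and the ROC then buffers internally, which matches the behavior seen in the runs above.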