Difference between revisions of "Alarm Documentation"
Revision as of 21:28, 14 July 2019
- 1 Parity Alarms
- 1.1 Kinds of EPICs Alarms
- 1.2 Kinds of JAPAN (Online Analysis) Alarms
- 1.3 CODA Data Taking Alarms
- 2 How The Parity Alarm Handler Works
- 3 Parity Alarm Details
- 4 Hall A and Target Alarm Handlers
Kinds of EPICs Alarms
Injector + Parity Quality Beam (PQB)
Hall A Beamline + Monitors
HRSs and Detectors
Kinds of JAPAN (Online Analysis) Alarms
Detector Saturation Problems
Missed Trigger Problems
CODA Data Taking Alarms
Run Number sanity check
Time since last CODA run start
How The Parity Alarm Handler Works
The Experimental Alarm Handler is a Python GUI. The purpose of this alarm handler is to interface with the various data collection, analysis, and display programs that are in use during the experiment. We expect that this GUI should be run in the background at all times, and that whenever an alarm is activated a sound will play on speakers and the GUI will display a red indicator for the alarm that went off.
The user is responsible for finding the tripped alarm in question and using this GUI to find out the nature of the alarm and assess the situation.
Upon opening the Alarm Handler, the user is greeted with the front page, featuring under all-clear circumstances a happy green thumbs up, and in alarming circumstances a red alarm indicator.
The main page features several columns of alarm status indicators. Each entry shows the name, category, and type of the alarm, along with a status indicator and a button whose text is the value of the analysis/tracked quantity.
The key features are the status indicator and the button. Clicking the button brings the alarm handler's focus onto that alarm. The button also has a right-click context menu (which, contrary to normal context menu behavior, vanishes unless the mouse button is held down), whose options are "Information" (opens an information panel for the alarm), "Acknowledge", and "Silence"; these act on the alarm whose button was clicked, which doesn't have to be the one currently in focus. The status indicator has 4 states: OK, Alarmed, Acknowledged (cooling down), and Silenced. The status indicator also serves as a button for acknowledging or un-silencing an alarm.
Alarm Status Indicator/Button
The Status Indicator is intended to instantly convey the relevant information of the given alarm.
- Red: indicates that the alarm has been tripped, and the box will display the alarm status (High, Low, Exactly, etc.) in this case.
- While Red, the status indicator serves as a button for acknowledging the alarm. When clicked, the alarm will be acknowledged and a cooldown period will begin, during which the alarm status won't affect the Global Alarm status and the Orange cooldown indicator will be shown.
- Green: indicates that the alarm is ok.
- While Green, the status indicator serves as a button for refreshing the screen (but not the alarm statuses). This isn't particularly functional, but it may be useful in some circumstances.
- Yellow: indicates that the alarm has been silenced by the user and cannot generate further alarms until unsilenced.
- While Yellow, the status indicator will display in text the underlying alarm status that has been silenced, and it serves as a button for un-silencing. When clicked, the alarm will resume its normal alarm checking routine.
- Orange: indicates that the alarm has been acknowledged and that a cooldown period has begun. During this cooldown period the alarm status will not affect the Global Alarm status, primarily for the convenience of the user.
- While Orange, the status indicator will also display the approximate number of seconds until the cooldown is complete and the alarm is un-acknowledged. Also the indicator will serve as a button to manually un-acknowledge the alarm and turn back on its alarm-checking routine.
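The four indicator states can be summarized as a small lookup. The following is only a sketch and not the handler's actual code: the function name indicator_color is hypothetical, the "Cooldown" prefix test assumes the User Notify Status format described under Optional Alarm Parameters, and giving Silenced priority over the other states is an assumption.

```python
def indicator_color(silence_status, notify_status):
    """Pick the indicator color for one alarm (hypothetical helper).

    silence_status: the "User Silence Status" parameter ("Alert" or "Silenced").
    notify_status:  the "User Notify Status" parameter ("OK", a latched
                    status such as "High", or "Cooldown" plus seconds).
    """
    if silence_status == "Silenced":
        return "yellow"   # silenced by the user (assumed to take priority)
    if notify_status.startswith("Cooldown"):
        return "orange"   # acknowledged, cooldown still running
    if notify_status != "OK":
        return "red"      # latched, unacknowledged alarm
    return "green"        # nothing to report
```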
Alarm Information Panel
Another of the right-click context menu options is "Information", which displays that alarm's information. The alarm parameters will be displayed in a white background box to the right of the existing GUI. This parameter display can be removed/deactivated by clicking the "Reset GUI" button in the control buttons bar at the top right.
Along the top of the Alarm Handler are 4 control buttons. These allow the user some control over the internal parameters of the program.
- Alarm Status/Find Alarm: indicates the current alarm status through its color.
- It can be either "OK" == Green, "Silenced" or "Paused" == Yellow, or "Alarmed" == Red.
- Clicking on this button will bring the most recently triggered alarm into focus.
- Alarm Checker/Turn Off (On): displays the current status of the alarm loop - when it is Yellow that means the alarm loop is paused, otherwise the loop is active.
- Clicking on this button will toggle the alarm loop status, and the lower line of text indicates what you will do by clicking it.
- Note that the alarm loop is discretized into steps of N seconds, where N is user defined (default = 10 seconds).
- Silencer/Turn On (Off): silences the Global Alarm status entirely, effectively deactivating the GUI's alarms. The alarm checking loop continues to run and the displayed values continue to update.
- When silenced this button and the Alarm Status button will both turn yellow as an indicator to the user.
- Reset GUI: serves as a refresh button for the GUI. It does not affect the alarm loop, but it will re-poll the GUI's underlying text-file memory storage and reset all values to match what is contained within.
Please note that it is not safe to edit the alarm parameters, either through the pop-up dialog boxes in expert mode or in the underlying text file, while the alarm loop is active, as those changes will be overwritten by the alarm status storing stage of the alarm loop itself. Similarly, it is not safe to run two instances of the Alarm Handler without pointing them to separate alarm.csv memory files, as their reads and writes would clash.
Parity Alarm Details
The underlying data format for the Alarm Handler is a text file CSV that encodes the alarm's parameters along with some organizational hierarchical book-keeping. An example is shown below.
The purpose of including the organizational hierarchy is two-fold (though it could be done differently and the backend is rudimentary/unnecessarily complicated).
- First, it is designed to enable organizing similar kinds of alarms into groups, and to simplify responding to, editing, and generating new groups of alarms in the Expert Alarm Handler page.
- Second, it is utilized (externally) to store the channel map information for the JAPAN PREXII/CREX analysis alarm handler, and can be used similarly for other extensions as well.
The alarm parameters (which live in the 5th column of the 2D CSV file) are the information and control points for the Alarm Handler's alarm loop routine. The alarm loop will parse the list of parameters as it goes through its chain of possible kinds of alarm analyses to perform.
After the alarm loop routine has been performed and the information has been updated it will be stored back in the original alarm memory file, overwriting the previous alarm data. User actions, such as silencing or acknowledging an alarm, will also trigger the write process, allowing those parameter changes to be stored and kept track of as well.
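As a rough illustration of the write-back step, the sketch below overwrites an alarm memory file with a fresh set of 5-column rows. The function name write_alarm_file is hypothetical. Writing to a temporary file and renaming it over the original is a technique chosen for this example (so that a reader never sees a half-written file); it is not a statement about how the Alarm Handler itself writes.

```python
import csv
import os
import tempfile


def write_alarm_file(path, rows):
    """Overwrite the alarm memory file at `path` with `rows`.

    rows: list of 5-item lists (Kind, Channel, Type, parameter name, value).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents next to the target file first...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    # ...then swap it into place, replacing the previous alarm data.
    os.replace(tmp_path, path)
```

This does not make it safe to run two handlers against one file: each instance would still overwrite the other's stored statuses.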
Types of Alarms
The alarm categorization can be optimally utilized at 1st order (the first column) by separating different kinds of alarms into groups of either similar "Kinds" of value checks (Magnet Currents, analysis measurement read-backs, DAQ status, etc.) or similar "Kinds" of techniques of checking them (EPICS, JAPAN analysis, System Calls, etc.).
Categorization at 2nd order can be separated into "Channels", where the categories of things to be checked in each "Kind" can be listed (a list of beam line monitors, for example).
Categorization at 3rd order should lastly be separated into "Types", which will be the "Alarm" definitions themselves. As it is simply a tier of alarm categorization, each Type/Alarm gets exactly 1 parameter to be checked and compared to expectation parameters.
- It is likely that one "Kind"->"Channel" could have multiple different "Types" of Alarms for it, either for different running conditions (in which case the inactive alarm could be silenced temporarily) or for comparing different types of checks (for example, widths, yields, or asymmetry means for a single detector, or multiple diagnostic parameters for a beamline magnet).
- A feature has been added which optimizes this multi-conditional alarm limit idea, using up to two layers of "Case" conditions to define which set of alarm limits is used.
An example of what the alarms look like in the CSV file used to generate the alarms is given here:
Hall A,Beam,Beam Energy,Alarm Status,OK
Hall A,Beam,Beam Energy,HighHigh,960
Hall A,Beam,Beam Energy,High,951
Hall A,Beam,Beam Energy,Value,950.397
Hall A,Beam,Beam Energy,Low,949
Hall A,Beam,Beam Energy,LowLow,940
Hall A,Beam,Beam Energy,Alarm Type,EPICS
Hall A,Beam,Beam Energy,Variable Name,HALLA:p
Hall A,Beam,Beam Energy,User Silence Status,Alert
Hall A,Beam,Beam Energy,User Notify Status,OK
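To make the layout concrete, the example rows above can be read into one dictionary of parameters per alarm. This is a minimal sketch using Python's standard csv module. The function name load_alarms and the choice of a (Kind, Channel, Type) tuple as the key are assumptions made for illustration, not the handler's internal data structure.

```python
import csv
import io

EXAMPLE = """\
Hall A,Beam,Beam Energy,Alarm Status,OK
Hall A,Beam,Beam Energy,HighHigh,960
Hall A,Beam,Beam Energy,High,951
Hall A,Beam,Beam Energy,Value,950.397
Hall A,Beam,Beam Energy,Low,949
Hall A,Beam,Beam Energy,LowLow,940
Hall A,Beam,Beam Energy,Alarm Type,EPICS
Hall A,Beam,Beam Energy,Variable Name,HALLA:p
Hall A,Beam,Beam Energy,User Silence Status,Alert
Hall A,Beam,Beam Energy,User Notify Status,OK
"""


def load_alarms(handle):
    """Group CSV rows into {(kind, channel, type): {parameter: value}}."""
    alarms = {}
    for row in csv.reader(handle):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        kind, channel, alarm_type, name, value = row
        alarms.setdefault((kind, channel, alarm_type), {})[name] = value
    return alarms


alarms = load_alarms(io.StringIO(EXAMPLE))
beam_energy = alarms[("Hall A", "Beam", "Beam Energy")]
print(beam_energy["Variable Name"], beam_energy["Value"])
```

All values are kept as strings here; converting limits and values to numbers is left to whatever code performs the checks.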
Parity Alarm Functionality
The default needed parameters in order to function as an alarm are:
- Value: stores the most recently obtained raw value. It is updated on each pass of the alarm loop and is the quantity that gets compared to the alarm limits.
- Alarm Status: where the resultant status generated by the alarm looping routine is stored. This is the information that is used to determine the local and global alarm statuses and is used to communicate the situation to the user.
- An alarm status of "OK" means everything is fine; otherwise the alarm status takes on the name of the violated limit (so if an alarm is low, the alarm status will read "Low", or if it fails the "Exactly" check then it will read "Exactly").
- Technically the Alarm Status parameter is optional. If it is left out, the alarm handler will not perform any alarm routine at all, and this "alarm" functions as a simple value indicator. This may be useful for difficult to pin down parameters, or for things that are nice to know (like injector slow control settings and things for checklist checking) but that don't need alarms put on them.
- Alarm Type: the flag that tells the alarm loop routine which mode to use for obtaining the fresh raw values. This parameter is currently limited to "EPICS" and "External".
- External refers to using an additional alarm CSV file to read the alarm information from. External files should not contain any user parameters, as this Alarm Handler does not edit the external file and always updates its internal values with what it finds in the externals. The only external file currently in use is from the Online Analysis instance of JAPAN running on apar@adaq1.
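The "External" mode can be pictured as a one-way copy from the external table into the handler's own table. The sketch below assumes both tables have already been loaded as dictionaries keyed by (Kind, Channel, Type), each holding a dictionary of parameters. The function name update_from_external and that data layout are hypothetical.

```python
def update_from_external(internal, external):
    """Refresh "External" alarms from an external table (hypothetical helper).

    Both arguments map (kind, channel, type) -> {parameter name: value}.
    The external table is only read, never modified.
    """
    for key, params in internal.items():
        if params.get("Alarm Type") != "External":
            continue  # EPICS alarms get their values elsewhere
        # Whatever the external file says always wins over the stored value.
        for name, value in external.get(key, {}).items():
            params[name] = value
    return internal
```

Because every parameter found in the external file overwrites the internal one, a user parameter placed there (such as User Silence Status) would undo the user's choice on every pass, which is why external files should not contain them.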
Kinds of Alarm Limit Checks
Currently implemented in the Alarm Handler are several forms of value checks. The code allows for comparing a value against other values and determining if the raw value in question is outside of bounds. The supported limit parameters are:
- Low: checks that the value isn't lower than this number.
- High: checks that the value isn't higher than this number.
- LowLow: works identically to Low, but gives the user additional information about how significant the violation was.
- HighHigh: the same as LowLow, but for high values. Note that the alarm status will latch, meaning the most recent non-"OK" alarm status will persist until the user acknowledges it, so having distinctions like this can be useful.
- Exactly: checks whether the raw value == the given number exactly (the alarm status will read "Exactly" if this isn't true, because the value is not exactly this number). Because Python is highly extensible, this "Exactly" value can actually be any object at all and is not limited to doubles (EPICS string outputs are particularly useful for this).
- Difference High or Low: This kind of alarm reads a "Difference Reference Value" from the EPICS variable named in "Difference Reference Variable Name", and then compares the difference between the alarm's own "Value" and that reference to the High or Low value given here.
- Case and Double Case: These are reference EPICS variables that are checked to establish which of the Low, High, Exactly, etc. alarm limits should be used. The return value of the EPICS call to "Case Variable Name" goes into "Case Value", and that string is appended to the Low, High, Exactly, etc. name to select the limit.
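The basic limit checks in the list above can be sketched as a single function over one alarm's parameters. This is an illustration only: the name check_limits is hypothetical, testing the more severe limits first is an assumption, and the "Case Value" string is assumed to be appended to the limit name with no separator. Difference and Double Case checks are left out.

```python
def check_limits(params):
    """Return "OK" or the name of the violated limit for one alarm."""
    suffix = params.get("Case Value", "")  # assumed: appended with no separator
    value = params.get("Value")

    # "Exactly" compares as text, so it also works for EPICS string outputs.
    exactly = params.get("Exactly" + suffix)
    if exactly is not None and str(value) != str(exactly):
        return "Exactly"

    try:
        number = float(value)
    except (TypeError, ValueError):
        return "OK"  # non-numeric values can only fail the "Exactly" check

    # More severe limits are tested first so that they win (an assumption).
    checks = [
        ("LowLow", lambda v, lim: v < lim),
        ("HighHigh", lambda v, lim: v > lim),
        ("Low", lambda v, lim: v < lim),
        ("High", lambda v, lim: v > lim),
    ]
    for name, violated in checks:
        limit = params.get(name + suffix)
        if limit is not None and violated(number, float(limit)):
            return name
    return "OK"
```

With the Beam Energy limits from the CSV example (LowLow 940, Low 949, High 951, HighHigh 960), a Value of 950.397 gives "OK" and a Value of 952 gives "High".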
Additionally, the JAPAN alarm handler analysis has two more parameters, Ring-Length and Tolerance. Ring-Length is the number of events over which the analyzer remembers the alarm violation status. Tolerance is the number of violations within that window that it takes to set the alarm status to alarmed, or to cool down and reset to "OK".
In CODA/RCND/RCDB type alarms there is a parameter to store the current Parity DAQ run number, which is used to obtain RCDB data about the current CODA run. The time since last CODA run start is computed by the alarm handler itself and is in Epoch time (which can be hard to read, but is easy to convert).
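The Ring-Length and Tolerance idea can be sketched with a fixed-size buffer of recent per-event results. This is not the JAPAN implementation: the class name RingAlarm is hypothetical, and treating "at least Tolerance violations in the ring" as the alarm condition (with anything less counting as "OK") is an assumed reading of the description.

```python
from collections import deque


class RingAlarm:
    """Track the last `ring_length` events and alarm on repeated violations."""

    def __init__(self, ring_length, tolerance):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.ring = deque(maxlen=ring_length)
        self.tolerance = tolerance

    def update(self, violated):
        """Record one event and return "Alarmed" or "OK"."""
        self.ring.append(bool(violated))
        if sum(self.ring) >= self.tolerance:
            return "Alarmed"
        return "OK"
```

A single bad event therefore does not trip the alarm when Tolerance is 2 or more, and the status returns to "OK" once enough old violations have fallen out of the ring.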
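Converting Epoch times is straightforward with the Python standard library. The two helpers below are a sketch with hypothetical names (seconds_since, epoch_to_text). Displaying the time in UTC is a choice made for this example, and how the run start time is obtained from RCDB is not shown.

```python
import time
from datetime import datetime, timezone


def seconds_since(start_epoch, now=None):
    """Seconds elapsed since `start_epoch` (both in UNIX Epoch seconds)."""
    if now is None:
        now = time.time()
    return now - start_epoch


def epoch_to_text(epoch):
    """Render a UNIX Epoch time as a human-readable UTC timestamp."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")
```

The result of seconds_since could then be compared against a High limit like any other alarm value.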
Optional Alarm Parameters
Lastly, there are several parameters which are entirely optional, but which give significant control over the alarm reporting to the user:
- User Silence Status: gives the status ("Alert" or "Silenced") that the user has set for a given alarm. Without this parameter to serve as status storage, an individual alarm's silence status will not persist from one pass of the loop to the next.
- User Notify Status: used to store (similarly to silence status) whether or not the user has recently acknowledged the alarm state.
- If the user has acknowledged it, this parameter will take on a "Cooldown" status, which is the word Cooldown plus the number of seconds remaining in the cooldown.
- If the user has not acknowledged the status, this parameter will latch onto the most recent non-"OK" alarm status and remember it until the alarm is acknowledged. This alarm acknowledge status is exactly the parameter used to determine the alarm indicator and Global Alarm status, so it actually is necessary for proper functioning of the Alarm Handler (though that can be changed if really needed).
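The latch-and-cooldown behavior of User Notify Status can be sketched as a small state update run once per loop pass. The function names acknowledge and update_notify are hypothetical, and the exact text format ("Cooldown", a space, then the seconds) is an assumption; the description above only says it is the word Cooldown plus the remaining seconds.

```python
def acknowledge(cooldown_seconds):
    """Notify status right after the user acknowledges an alarm."""
    return f"Cooldown {cooldown_seconds:g}"


def update_notify(notify, alarm_status, elapsed_seconds):
    """Advance the User Notify Status by one pass of the alarm loop."""
    if notify.startswith("Cooldown"):
        remaining = float(notify.split()[1]) - elapsed_seconds
        if remaining > 0:
            return f"Cooldown {remaining:g}"  # still cooling down
        notify = "OK"  # cooldown finished: the alarm is un-acknowledged
    if alarm_status != "OK":
        return alarm_status  # latch the most recent non-"OK" status
    return notify  # keep whatever was latched before (or "OK")
```

Note that a latched status survives later "OK" readings: update_notify("High", "OK", 10) still returns "High" until the user acknowledges it.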
Parity Alarms History and Settings
A latched alarm state will be stored in the Alarm History of the active alarm handler, which is also stored in a text file.
- This text file can be backed up (and simultaneously cleared out) by clicking on the alarm history tab's save and clear button.
- The alarm history stores the alarm value and the UNIX Epoch time at which the alarm occurs.
- The alarm history backup files also contain in their names the Epoch time at which they were saved.
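A history file with these properties could be handled along the following lines. This is a sketch only: the function names, the one-line-per-alarm "time,name,value" layout, and the backup naming pattern (Epoch time inserted before the file extension) are all assumptions, not the handler's actual file format.

```python
import os
import shutil
import time


def record_alarm(history_path, name, value, now=None):
    """Append one latched alarm with its UNIX Epoch time to the history."""
    stamp = int(time.time() if now is None else now)
    with open(history_path, "a") as handle:
        handle.write(f"{stamp},{name},{value}\n")


def save_and_clear(history_path, now=None):
    """Back up the history under an Epoch-stamped name, then empty it."""
    stamp = int(time.time() if now is None else now)
    root, ext = os.path.splitext(history_path)
    backup_path = f"{root}_{stamp}{ext}"
    shutil.copyfile(history_path, backup_path)
    open(history_path, "w").close()  # truncate the live history file
    return backup_path
```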
Parity Alarm Settings can be controlled from the parity alarm settings tab.
- Right clicking on one of the settings will allow the user to change the value of that setting.
- Many of the settings will require an alarm handler reboot to actually change the behavior of the alarm handler.
- Most of the interesting parameters, like refresh time, cooldown time, and the files used to store history and alarm information, are configurable here.