Correlation Alarms

An Alarm corresponds to a row in the main dashboard GUI.

The alarm lifecycle consists of phases, enumerated in AlarmPhase.

|#yellow|PENDING|

start

:start pending timer;
->pending timer expired;

|#pink|FINALIZED|

if (down?) is (yes) then
  #lightgreen:notification\n(if CRITICAL);
else (no)
  if (short\nlived?) is (no) then
    #lightgreen:notification\n(if CRITICAL);
  else (yes)
    :close;
    end
  endif
endif


:coalesce alarms,
process blacklists;
repeat :alarm monitoring;
repeat while (up?) is (no) not (yes)
:clear;

:start kill timer;
->kill timer expired;
:close;

end
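
The finalization branch of the diagram above can be sketched in Python as follows. This is a minimal illustration, not the dashboard's implementation: the function name `finalize_actions`, its boolean parameters, and the action strings are all hypothetical.

```python
def finalize_actions(is_down, short_lived, is_critical):
    """Return the steps taken when the pending timer expires (hypothetical helper)."""
    # An alarm that is already up again and was only briefly down is closed outright.
    if not is_down and short_lived:
        return ["close"]
    actions = []
    # Both remaining branches send a notification, but only for CRITICAL alarms.
    if is_critical:
        actions.append("notify")
    # Afterwards the alarm is coalesced, blacklists are processed, and monitoring starts.
    actions += ["coalesce", "monitor"]
    return actions
```

Under these assumptions, a short-lived alarm never reaches the coalescing or monitoring steps.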

class dashboard.correlation.alarm.Alarm(db_id, uuid, endpoints, assessor, phase, severity, dirty, published, description, previous_state=None, final_severity=None, alarm_group=None, devoured=None, publish_update=<function publish_alarm_update>)
COALESCE_WINDOW_HOURS = 24

Similar alarms that occur within the coalescing window may be grouped together.

FLOOD_SLACK_TIME = 300

Do not accept new endpoints created more than this number of seconds after the earliest endpoint in this alarm.

KILL_PERIOD = 30

After the alarm is finalized, discard it once it has been up for this amount of time.

PENDING_PERIOD = 60

the period during which the alarm description can change

SHORT_LIVED_ALARM_THRESHOLD = 120

if the alarm has been down for less than this value when finalizing, automatically close it

apply_changes(severity=None, description=None, state=None, session=None)
begin_lifecycle(session)

Begin the Alarm's lifecycle by initializing its phase handler. This method must be called exactly once per Alarm, after the Alarm has been created and the correct initial phase has been set.

Parameters:

session – a SQLAlchemy Session object

can_accept_endpoint(endpoint)

Check if this alarm’s state allows the endpoint to be added.

Namely:

  • it’s within FLOOD_SLACK_TIME

  • if the alarm isn’t PENDING, then the severity doesn’t increase and the description doesn’t change

This is only called after the endpoint has been correlated with one or more endpoints already present in this alarm.

Returns:

True if this alarm can accept the endpoint
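
The two conditions listed above can be sketched as a standalone function. This is only an illustration under stated assumptions: the function `can_accept` and all of its parameters are hypothetical, severities are assumed to be comparable integers, timestamps are assumed to be in seconds, and `new_severity` and `new_description` stand for the values the alarm would have after adding the endpoint.

```python
FLOOD_SLACK_TIME = 300  # seconds, matching the class constant documented above


def can_accept(phase, earliest_created, endpoint_created,
               severity, description, new_severity, new_description):
    """Hypothetical stand-in for the checks described for can_accept_endpoint."""
    # Condition 1: the endpoint must fall within the flood slack window.
    if endpoint_created - earliest_created > FLOOD_SLACK_TIME:
        return False
    # Condition 2: outside PENDING, the alarm must not escalate or be reworded.
    if phase != "PENDING":
        if new_severity > severity or new_description != description:
            return False
    return True
```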

cleanup_after_being_devoured(session)
clear_current_timer(name=None)

Stops and removes the current timer, if it exists

Parameters:

name – Optional, only stop the current timer if its timer_name matches name
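
One way to picture the optional name filter is the sketch below, built on the standard library's `threading.Timer`. The class `TimerOwner` is hypothetical, and storing `timer_name` on the owner rather than on the timer object is an assumption made for brevity.

```python
import threading


class TimerOwner:
    """Hypothetical holder of a single named timer (not the real Alarm class)."""

    def __init__(self):
        self.timer = None
        self.timer_name = None

    def restart_timer(self, name, timeout_seconds, handler):
        # Drop whatever timer is currently running, then start a fresh one.
        self.clear_current_timer()
        self.timer = threading.Timer(timeout_seconds, handler)
        self.timer.daemon = True
        self.timer_name = name
        self.timer.start()

    def clear_current_timer(self, name=None):
        if self.timer is None:
            return
        # With a name given, leave a differently named timer untouched.
        if name is not None and name != self.timer_name:
            return
        self.timer.cancel()
        self.timer = None
        self.timer_name = None
```

In this sketch, calling `clear_current_timer("kill")` while a "pending" timer is running leaves that timer in place.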

property contacts
property dashboard_service
property description
devour_alarms(alarms_to_devour)

Take over all endpoints in alarms_to_devour, and free all resources from those alarms (e.g. stop timers, etc.).

Parameters:

alarms_to_devour
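
A minimal sketch of the takeover described above, assuming endpoints are kept in a set. The class `MiniAlarm` and the flag `was_devoured` are hypothetical, and releasing resources is reduced here to emptying the devoured alarm's endpoint set.

```python
class MiniAlarm:
    """Hypothetical stand-in for Alarm, holding only a set of endpoints."""

    def __init__(self, endpoints):
        self.endpoints = set(endpoints)
        self.was_devoured = False

    def devour_alarms(self, alarms_to_devour):
        for other in alarms_to_devour:
            if other is self:
                continue  # an alarm cannot devour itself
            # Take over the other alarm's endpoints ...
            self.endpoints |= other.endpoints
            # ... and leave it empty and marked as devoured.
            other.endpoints.clear()
            other.was_devoured = True
```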

discard_endpoint(endpoint)

Discard endpoint from self.endpoints.

Parameters:

endpoint

Returns:

True iff the endpoint was found and removed
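
The return contract can be illustrated with a free function over a plain set. This is a sketch only: how the real class stores `self.endpoints` is not documented here, so the set is an assumption.

```python
def discard_endpoint(endpoints, endpoint):
    """Remove endpoint from the set; report whether it was actually present."""
    if endpoint in endpoints:
        endpoints.remove(endpoint)
        return True
    return False
```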

dumpd()

Create a json-serializable dict that can be used to re-create this object.

Returns:

a json-serializable dict
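
The round trip this method enables might look like the sketch below. The class `MiniAlarm`, its reduced field list, and the `from_dict` constructor are hypothetical; the real class has more fields and documents `from_cache` for re-creation.

```python
import json


class MiniAlarm:
    """Hypothetical stand-in for Alarm with only four plain fields."""

    def __init__(self, uuid, phase, severity, description):
        self.uuid = uuid
        self.phase = phase
        self.severity = severity
        self.description = description

    def dumpd(self):
        # Only built-in types, so json.dumps accepts the result directly.
        return {
            "uuid": self.uuid,
            "phase": self.phase,
            "severity": self.severity,
            "description": self.description,
        }

    @classmethod
    def from_dict(cls, cached):
        # Rebuild the object from a dict produced by dumpd().
        return cls(**cached)


original = MiniAlarm("a1", "PENDING", "CRITICAL", "link down")
text = json.dumps(original.dumpd())
restored = MiniAlarm.from_dict(json.loads(text))
```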

property duration

Generalised implementation; override it if there is a better field to sort on, e.g. use systemUpTime when the traps always come from the same device.

WARNING: this method will fail in KILL_ME state

Returns:

the time difference in seconds between the first trap and the current time, or the last trap if the alarm is closed
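
The return value described above amounts to the following calculation. The function `duration_seconds` and its parameters are hypothetical; timestamps are assumed to be numbers of seconds.

```python
def duration_seconds(first_trap, last_trap, closed, now):
    """Seconds from the first trap to now, or to the last trap once closed."""
    end = last_trap if closed else now
    return end - first_trap
```

For an open alarm the result keeps growing with `now`; for a closed one it is fixed by the last trap.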

classmethod from_cache(endpoint_or_cached_dict)
classmethod from_endpoint(endpoint_or_cached_dict)
handle(**kwargs)
property init_time
initialize_phase(new_phase, session)
kill_alarm()
learn_from_db(session)

Workaround for the first time we reload the cache after the introduction of phase (DBOARD3-242).

this method should only be called when loading the cache

property locations
notify_endpoint_state_changed()
property phase
phase_handler
property project
publish_state()
property published
release_all()

Release any resources that will prevent this object from being garbage collected.

This method leaves this object in a corrupt state.

property resource_ids
restart_kill_timer()
restart_pending_timer()
restart_timer(name, timeout_seconds, handler)
save_if_dirty(session)
property services
set_published()
property severity
property state
timer = None
update_alarm()