Failover Agent


The failover agent provides a generic high-availability option for VOLTTRON. When the primary platform becomes inactive, the secondary platform starts an installed agent.

Standard Failover

The agent implements two behavior patterns. In the default configuration, the secondary instance asks Volttron Central to verify that the primary instance is down, which helps avoid a split-brain scenario. If neither Volttron Central nor the other failover instance is reachable, the failover agent stops the agent it is managing. These states are shown in the tables below.

Primary Behavior

                  VC Up    VC Down
Secondary Up      start    start
Secondary Down    start    stop

Secondary Behavior

                VC Up                             VC Down
Primary Up      stop                              stop
Primary Down    Verify with VC before starting    stop
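
The two tables can be read as a pair of small decision functions. The following is a minimal Python sketch of that logic only; the names Action, primary_action, and secondary_action are hypothetical and are not taken from the failover agent's source.

```python
from enum import Enum


class Action(Enum):
    """What the failover agent does with its managed agent (hypothetical names)."""
    START = "start"
    STOP = "stop"
    VERIFY = "verify with VC before starting"


def primary_action(secondary_up: bool, vc_up: bool) -> Action:
    # Primary table: stop only when neither the secondary nor VC is reachable.
    if secondary_up or vc_up:
        return Action.START
    return Action.STOP


def secondary_action(primary_up: bool, vc_up: bool) -> Action:
    # Secondary table: stay stopped while the primary is up, or when VC
    # cannot be reached to confirm that the primary is really down.
    if primary_up or not vc_up:
        return Action.STOP
    return Action.VERIFY
```

The secondary never starts its managed agent without confirmation from Volttron Central, which is what guards against the split-brain case.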

Simple Failover

There is also a simple configuration available that does not involve coordination with Volttron Central. The secondary agent will start its managed agent if it believes the primary to be inactive. The simple primary always has its managed agent started.
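
The simple behavior can be sketched the same way. This is an illustration under assumptions, not the agent's implementation: the function name simple_action is hypothetical, and the sketch assumes the secondary stops its managed agent while the primary appears active, which the text implies but does not state.

```python
def simple_action(agent_id: str, primary_active: bool) -> str:
    """Return "start" or "stop" for the managed agent under simple behavior."""
    if agent_id == "primary":
        # The simple primary always keeps its managed agent started.
        return "start"
    if agent_id == "secondary":
        # Assumption: the secondary stops its managed agent while it
        # believes the primary is active, and starts it otherwise.
        return "stop" if primary_active else "start"
    raise ValueError(f"agent_id must be 'primary' or 'secondary', got {agent_id!r}")
```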


Failover behavior is set in the failover agent’s configuration file. Example primary and secondary configuration files are shown below.

{                                           |    {
    "agent_id": "primary",                  |        "agent_id": "secondary",
    "simple_behavior": true,                |        "simple_behavior": true,
    "remote_vip": "tcp://",                 |        "remote_vip": "tcp://",
    "remote_serverkey": "",                 |        "remote_serverkey": "",
    "agent_vip_identity": "platform.driver",|        "agent_vip_identity": "platform.driver",
    "heartbeat_period": 10,                 |        "heartbeat_period": 10,
    "timeout": 120                          |        "timeout": 120
}                                           |    }
  • agent_id - primary or secondary
  • simple_behavior - Switch to turn on or off simple behavior. Both instances should match.
  • remote_vip - Address where the remote failover instance can be reached.
  • remote_serverkey - The public key of the platform where the remote failover instance lives.
  • agent_vip_identity - The VIP identity of the agent that we want to manage.
  • heartbeat_period - Send a heartbeat message to the remote failover instance with this period. Measured in seconds.
  • timeout - Consider a platform inactive if a heartbeat has not been received for timeout seconds.
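
To make the heartbeat_period and timeout settings concrete, here is a small Python sketch that loads a configuration like the ones above and applies the timeout rule. It is not the agent's own code: load_failover_config, remote_is_inactive, and REQUIRED_KEYS are hypothetical names, and treating every listed key as required is an assumption made for the example.

```python
import json

# Assumption: every key in the example configuration is treated as required.
REQUIRED_KEYS = {
    "agent_id",
    "simple_behavior",
    "remote_vip",
    "remote_serverkey",
    "agent_vip_identity",
    "heartbeat_period",
    "timeout",
}


def load_failover_config(text: str) -> dict:
    """Parse a failover configuration from JSON text and run basic checks."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing configuration keys: {sorted(missing)}")
    if config["agent_id"] not in ("primary", "secondary"):
        raise ValueError("agent_id must be 'primary' or 'secondary'")
    return config


def remote_is_inactive(last_heartbeat: float, now: float, timeout: float) -> bool:
    """True once no heartbeat has been received for `timeout` seconds."""
    return now - last_heartbeat >= timeout
```

With the example values, a heartbeat is sent every 10 seconds and the remote platform is considered inactive after 120 seconds of silence, so roughly twelve consecutive heartbeats must be missed before failover is triggered.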