Message Bus Debugging Specification
NOTE: This is a planning document, created prior to implementation of the VOLTTRON Message Debugger. It describes the tool’s general goals, but it’s not always accurate about specifics of the ultimate implementation. For a description of Message Debugging as implemented, with advice on how to configure and use it, please see Message-Debugging.
Description
VOLTTRON agents send messages to each other on the VOLTTRON message bus. It can be useful to examine the contents of this message stream while debugging and troubleshooting agents and drivers.
In satisfaction of this specification, a new Message Monitor capability will be implemented allowing VOLTTRON agent/driver developers to monitor the message stream, filter it for an interesting set of messages, and display the contents and characteristics of each message.
Some elements below are central to this effort (required), while others are useful improvements (optional) that may be implemented if time permits.
Feature: Capture Messages and Display a Message Summary
When enabled, the Message Monitor will capture details about a stream of routed messages. On demand, it will display a message summary, either in real time as the messages are routed, or retrospectively.
A summary view will convey the high level interactions occurring between VOLTTRON agents as conversations that may be expanded for more detail. A simple RPC call that involves 4 message send/recv segments will be displayed as a single object that can be expanded. In this way, the message viewer will provide a higher-level view of message bus activity than might be gleaned from verbose logs using grep.
Pub/sub interactions will be summarized at the topic level with high-level statistics such as the number of subscribers, # of messages published during the capture period, etc. Drilling into the interaction might show the last message published with the ability to drill deeper into individual messages. A diff display would show how the published data is changing.
Summary view
- 11:09:31.0831 RPC set_point charge.control platform.driver
| - params: ('set_load', 10) return: True
- 11:09:31.5235 Pub/Sub devices/my_device platform.driver 2 subscribers
| - Subscriber: charge.control
| - Last message 11:09:31.1104:
[
{
'Heartbeat': True,
'PowerState': 0,
'temperature': 50.0,
'ValveState': 0
},
...
]
| - Diff to 11:09:21.5431:
'temperature': 48.7,
The summary’s contents and format will vary by message subsystem.
RPC request/response pairs will be displayed on a single line:
(volttron) d1:volttron myname$ msmon —agent='(Agent1,Agent2)'
Agent1 Agent2
2016-11-22T11:09:31.083121+00:00 rpc: devices/my_topic; 2340972387; sent 2016-11-22T11:09:31.277933+00:00 responded: 0.194 sec
2016-11-22T11:09:32.005938+00:00 rpc: devices/my_topic; 2340972388; sent 2016-11-22T11:09:32.282193+00:00 responded: 0.277 sec
2016-11-22T11:09:33.081873+00:00 rpc: devices/my_topic; 2340972389; sent 2016-11-22T11:09:33.271199+00:00 responded: 0.190 sec
2016-11-22T11:09:34.049139+00:00 rpc: devices/my_topic; 2340972390; sent 2016-11-22T11:09:34.285393+00:00 responded: 0.236 sec
2016-11-22T11:09:35.053183+00:00 rpc: devices/my_topic; 2340972391; sent 2016-11-22T11:09:35.279317+00:00 responded: 0.226 sec
2016-11-22T11:09:36.133948+00:00 rpc: devices/my_topic; 2340972392; sent 2016-11-22T11:09:36.133003+00:00 dequeued
When PubSub messages are displayed, each message’s summary will include its count of subscribers:
(volttron) d1:volttron myname$ msmon —agent=(Agent1)
Agent1
2016-11-22T11:09:31.083121+00:00 pubsub: devices/my_topic; 2340972487; sent; 2 subs
2016-11-22T11:09:32.005938+00:00 pubsub: devices/my_topic; 2340972488; sent; 2 subs
2016-11-22T11:09:33.081873+00:00 pubsub: devices/my_topic; 2340972489; sent; 2 subs
2016-11-22T11:09:34.049139+00:00 pubsub: devices/my_topic; 2340972490; sent; 2 subs
2016-11-22T11:09:35.053183+00:00 pubsub: devices/my_topic; 2340972491; sent; 2 subs
While streaming output of a message summary, a defined keystroke sequence will “pause” the output, and another keystroke sequence will “resume” displaying the stream.
Feature: Capture and Display Message Details
The Message Monitor will capture a variety of details about each message, including:
Sending agent ID
Receiving agent ID
User ID
Message ID
Subsystem
Topic
Message data
Message lifecycle timestamps, in UTC (when sent, dequeued, responded)
Message status (sent, responded, error, timeout)
Message size
Other message properties TBD (e.g., queue depth?)
On demand, it will display these details for a single message ID:
(volttron)d1:volttron myname$ msmon --id='2340972390'
2016-11-22T11:09:31.053183+00:00 (Agent1)
INFO:
Subsystem: 'pubsub',
Sender: 'Agent1',
Topic: 'devices/my_topic',
ID: '2340972390',
Sent: '2016-11-22T11:09:31.004986+00:00',
Message:
[
{
'Heartbeat': True,
'PowerState': 0,
'temperature': 50.0,
'ValveState': 0
},
{
'Heartbeat':
{
'units': 'On/Off',
'type': 'integer'
},
'PowerState':
{
'units': '1/0',
'type': 'integer'
},
'temperature':
{
'units': 'Fahrenheit',
'type': 'integer'
},
'ValveState':
{
'units': '1/0',
'type': 'integer'
}
}
]
A VOLTTRON message ID is not unique to a single message. A group of messages in a “conversation” may share a common ID, for instance during RPC request/response exchanges. When detailed display of all messages for a single message ID is requested, they will be displayed in chronological order.
Feature: Display Message Statistics
Statistics about the message stream will also be available on demand:
Number of messages sent, by agent, subsystem, topic
Number of messages received, by agent, subsystem, topic
Feature: Filter the Message Stream
The Message Monitor will be able to filter the message stream display to show only those messages that match a given set of criteria:
Sending agent ID(s)
Receiving agent ID(s)
User ID(s)
Subsystem(s)
Topic - Specific topic(s)
Topic - Prefix(es)
Specific data value(s)
Sampling start/stop time
Other filters TBD
User Interface: Linux Command Line
A Linux command-line interface will enable the following user actions:
Enable message tracing
Disable message tracing
Define message filters
Define verbosity of displayed-message output
Display message stream
Begin recording messages
Stop recording messages
Display recorded messages
Play back (re-send) recorded messages
Feature (not implemented): Watch Most Recent
Optionally, the Message Monitor can be asked to “watch” a specific data element. In that case, it will display the value of that element in the most recent message matching the filters currently in effect. As the data to be displayed changes, the display will be updated in place without scrolling (similar to “top” output):
(volttron) d1:volttron myname$ msmon —agent='(Agent1)' --watch='temperature'
Agent1
2016-11-22T11:09:31.053183+00:00 pubsub: my_topic; 2340972487; sent; 2 subs; temperature=50
Feature (not implemented): Regular Expression Support
It could help for the Message Monitor’s filtering logic to support regular expressions. Regex support has also been requested (Issue #207) when identifying a subscribed pub/sub topic during VOLTTRON message routing.
Optionally, regex support will be implemented in Message Monitor filtering criteria, and also (configurably) during VOLTTRON topic matching.
Feature (not implemented): Message Stream Record and Playback
The Message Monitor will be able to “record” and “play back” a message sequence:
Capture a set of messages as a single “recording”
Inspect the contents of the “recording”
“Play back” the recording – re-send the recording’s messsage sequence in VOLTTRON
Feature (not implemented): On-the-fly Message Inspection and Modification
VOLTTRON message inspection and modification, on-the-fly, may be supported from the command line. The syntax and implementation would be similar to pdb (Python Debugger), and might be written as an extension to pdb.
Capabilities:
Drill-down inspection of message contents.
Set a breakpoint based on message properties, halting upon routing a matching message.
While halted on a breakpoint, alter a message’s contents.
Feature (not implemented): PyCharm Debugging Plugin
VOLTTRON message debugging may also be published as a PyCharm plugin. The plugin would form a more user-friendly interface for the same set of capabilities described above – on-the-fly message inspection and modification, with the ability to set a breakpoint based on message properties.
User Interface (not implemented): PCAP/Wireshark
Optionally, we may elect to render the message trace as a stream of PCAP data, thereby exploiting Wireshark’s filtering and display capabilities. This would be in accord with the enhancement suggested in VOLTTRON Issue #260.
User Interface (not implemented): Volttron Central Dashboard Widget
Optionally, the Message Monitor will be integrated as a new Volttron Central dashboard widget, supporting each of the following:
Enable/Disable the monitor
Filter messages
Configure message display details
Record/playback messages
User Interface (not implemented): Graphical Display of Message Sequence
Optionally, the Volttron Central dashboard widget will provide graphical display of message sequences, allowing enhanced visualization of request/response patterns.
Grabbing Messages Off the Bus
This tool depends on reading and storing all messages that pass through the VIP router. The Router class
already has hooks that allow for the capturing of messages at various points in the routing workflow. The
BaseRouter abstract class defines issue(self, topic, frames, extra)
. This method is called from BaseRouter.route
and BaseRouter._send
during the routing of messasges. The topic
parameter (not to be confused with a
message topic found in frames
) identifies the point or state in the routing worflow at which the issue was called.
The defined topics
are: INCOMING, OUTGOING, ERROR and UNROUTABLE. Most messages will result in two calls, one
with the INCOMING topic as the message enters the router and one with the OUTGOING topic as the message is
sent on to its destination. Messages without a recipient are intended for the router itself and do not result
in an OUTGOING call to issue
.
Router.issue
contains the concrete implementation of the method. It does two things:
It writes the topic, frames and optional extra parameters to the logger using the FramesFormatter.
It invokes
self._tracker.hit(topic, frames, extra)
. The Tracker class collects statistics by topic and counts the messages within a topic by peer, user and subsystem.
The issue method can be modified to optionally publish the issue
messages to an in-process ZMQ address
that the message-viewing tool will subscribe to. This will minimize changes to core VOLTTRON code and minimize
the impact of processing these messages for debugging.
Message Processor
The message processor will subscribe to messages coming out of the Router.issue() method and process these messages based on the current message viewer configuration. Messages will be written to a SQLite db since this is packaged with Python and currently used by other VOLTTRON agents.
Message Viewer
The message viewer will display messages from the SQLite db. We need to consider whether it should also subscribe to receiving messages in real-time. The viewer will be responsible for displaying message statistics and will provide a command line interface to filter and display messages.
Message Db Schema
message(id, created_on, issue_topic, extras, sender, recipient, user_id, msg_id, subsystem, data)
msg_id will be used to associate pairs of incoming/outgoing messages.
Note
data will be a jsonified list of frames, alternatively we could add a message_data table with one row per frame.
A session table will track the start and end of a debug session and, at the end of a session, record statistics on the messages in the session.
session(id, created_on, name, start_time, end_time, num_messages)
The command line tool will allow users to delete old sessions and select a session for review/playback.