    There are two main hubs of data flow in this system:

    The first is local to the embedded system and is implemented by the
ipc_server daemon (source located in the commhub/ directory).

    The second is a WAN-based synchronization service responsible for
striking the best practical balance between always-ready operation and
eventual consistency between the database state stored on the bus (the
client) and the master database state maintained by the garage (the
server).

---------------------------- Interprocess Communication Hub -----------------

    This module provides majordomo-like, subscription-based interprocess
communication between local modules using a UNIX domain socket in
SOCK_SEQPACKET mode. The server functions as a smart router/switch,
allowing each module to address any or all other modules with any of the
following addressing modes:

  1) Mailbox name (delivers to every process that has subscribed to the
     named mailbox).
  2) "_BROADCAST" (delivers to all connected clients).
  3) ">nnn" (delivers the message to the client with PID == nnn).
  4) ":module" (delivers to all clients registered with the supplied
     module name).

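The routing decision above can be sketched as a classifier over the
destination string. This is a minimal sketch; the enum names are
illustrative, not the actual constants from the commhub/ headers:

```c
#include <ctype.h>
#include <string.h>

/* Addressing modes supported by the hub (names are illustrative). */
enum addr_mode {
    ADDR_MAILBOX,    /* plain name: every subscriber of that mailbox  */
    ADDR_BROADCAST,  /* "_BROADCAST": all connected clients           */
    ADDR_PID,        /* ">nnn": the client whose PID == nnn           */
    ADDR_MODULE      /* ":module": clients registered under that name */
};

/* Classify a destination string the way the router would. */
enum addr_mode classify_destination(const char *dest)
{
    if (strcmp(dest, "_BROADCAST") == 0)
        return ADDR_BROADCAST;
    if (dest[0] == '>' && isdigit((unsigned char)dest[1]))
        return ADDR_PID;
    if (dest[0] == ':')
        return ADDR_MODULE;
    return ADDR_MAILBOX;   /* anything else is a plain mailbox name */
}
```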
    The server also manages a number of "special" mailboxes that perform
specific communication-flow management functions. These mailboxes are as
follows (the names below are #define'd string constants that resolve to
the actual special mailbox names):

  1) MAILBOX_SUBSCRIBE (subscribes the sending client to the mailbox
     named in the payload).
  2) MAILBOX_UNSUBSCRIBE (unsubscribes the sending client from the
     specified mailbox).
  3) MAILBOX_BROADCAST (sends the message to all connected clients).
  4) MAILBOX_WIRETAP (payload == "ON" selects wiretap mode; anything
     else deselects it).

    When a client selects wiretap mode, it will receive an unmodified copy
of each and every message from that point forward (until wiretap mode is
deselected). This includes all special management messages and all PID- or
module-addressed messages. This allows for easy message-flow debugging
(including debugging the communication hub itself) and for robust,
on-the-fly selectable logging in the field without any need to recompile
or even interrupt the system. We can attach to a running system in any
state (including failure states) and transparently observe data flow
between modules.

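On the server side, the wiretap rule amounts to a per-client flag keyed
off the payload. A minimal sketch, with illustrative struct and function
names (the real client bookkeeping lives in the ipc_server sources):

```c
#include <string.h>

/* Per-connection state the router keeps for each client (sketch). */
struct client {
    int fd;
    int wiretap;  /* nonzero: gets an unmodified copy of every message */
};

/* Handle a message sent to the MAILBOX_WIRETAP special mailbox:
 * payload "ON" selects wiretap mode, anything else deselects it. */
void handle_wiretap_request(struct client *c, const char *payload)
{
    c->wiretap = (strcmp(payload, "ON") == 0);
}
```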
    The server listens on a socket located at /tmp/commhub. Clients
connect using the connect_to_message_server() function, which returns a
file descriptor representing the connection to the IPC server. This
function makes the socket connection and registers the client process
with the server by both PID and module name.

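A minimal sketch of the connection step, assuming only what is stated
above (the registration message format is defined in the commhub/ headers
and is not shown, so the function is suffixed _sketch):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define COMMHUB_SOCKET_PATH "/tmp/commhub"

/* Sketch of what connect_to_message_server() does under the hood; the
 * real version also sends a registration message carrying the client's
 * PID and module name after connecting. */
int connect_to_message_server_sketch(const char *module_name)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, COMMHUB_SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    /* ...registration message with PID + module_name would go here... */
    (void)module_name;
    return fd;
}
```

SOCK_SEQPACKET is the piece doing the heavy lifting here: unlike
SOCK_STREAM, it preserves message boundaries, so one send() arrives as
exactly one recv() on the far side.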
    Once a client is registered, messages are sent to the server by
declaring a message structure, populating it with the prepare_message()
function, and then dispatching it to the server with the send_message()
function.

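The call sequence might look like the following. The struct layout, size
limits, and function signature here are assumptions made for illustration
(suffixed _sketch); the real definitions live in the commhub/ headers:

```c
#include <string.h>

#define MAX_ADDR    64    /* illustrative limits, not the real ones */
#define MAX_PAYLOAD 1024

/* Assumed shape of a hub message: a destination string in one of the
 * four addressing modes, plus an opaque payload. */
struct message {
    char   dest[MAX_ADDR];  /* mailbox, "_BROADCAST", ">pid", or ":module" */
    size_t payload_len;
    char   payload[MAX_PAYLOAD];
};

/* Sketch of prepare_message(): validate and populate the structure. */
int prepare_message_sketch(struct message *m, const char *dest,
                           const void *payload, size_t len)
{
    if (strlen(dest) >= MAX_ADDR || len > MAX_PAYLOAD)
        return -1;
    strcpy(m->dest, dest);
    memcpy(m->payload, payload, len);
    m->payload_len = len;
    return 0;             /* send_message(fd, m) would dispatch it */
}
```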
    Incoming messages can be retrieved with the get_message() function
after a call to poll() indicates POLLIN on the fd for the connection to
the server. If there are no other file descriptors to poll, or a quick
non-blocking check is desirable, a poll wrapper called
message_socket_status() is provided. It takes the fd associated with the
connection and a mask of MSG_SEND | MSG_RECV, and reports whether a
message can currently be sent and/or whether a message is waiting to be
received on that connection.

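A sketch of that wrapper follows. The MSG_SEND/MSG_RECV bit values are
illustrative placeholders (the real constants come from the commhub/
headers); the mapping onto poll() is the point being shown:

```c
#include <poll.h>

#define MSG_SEND 0x1  /* a message can be sent without blocking  */
#define MSG_RECV 0x2  /* a message is waiting to be received     */

/* Non-blocking status check: returns the subset of `mask` that is
 * currently ready, or -1 on poll() error. */
int message_socket_status_sketch(int fd, int mask)
{
    struct pollfd pfd = { .fd = fd, .events = 0, .revents = 0 };
    int result = 0;

    if (mask & MSG_SEND) pfd.events |= POLLOUT;
    if (mask & MSG_RECV) pfd.events |= POLLIN;

    if (poll(&pfd, 1, 0) < 0)   /* timeout 0: return immediately */
        return -1;

    if (pfd.revents & POLLOUT) result |= MSG_SEND;
    if (pfd.revents & POLLIN)  result |= MSG_RECV;
    return result;
}
```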
    An important feature of this architecture is that it allows the
modules communicating through the hub to be started, stopped, and
replaced independently and transparently. If the modules are designed
with sensible opaque interfaces, this allows a remarkable amount of
flexibility in implementation and testing, as modules can be fashioned to
function as test jigs for other individual modules or for complex
subsystems of modules. Test jigs can even be built to emulate fault
conditions in order to test system recovery and failure workarounds.

    Modules can also be built to hide hardware-servicing routines. For
instance, a single module could be responsible for communicating with
every peripheral that lives in the passenger interface unit (RFID,
magstripe, cash vault, and passenger-facing display). This module would
simply multiplex and demultiplex the serial communications with those
peripherals, creating mailboxes to receive any data that needs to be
transmitted to the peripherals and sending any input received from them
to interface-specified mailboxes (one per peripheral and/or one per
message type). The beauty of this abstraction layer is two-fold. First,
it allows things like rebooting the PIU to happen without the need to
interrupt or reload any other process (minimizing collateral damage from
unnecessary dependencies). Second, it pays off when a hardware change
causes a peripheral to migrate from one subsystem to another: in that
case, only the modules that directly service I/O to those peripherals
need to be aware of the change. All other modules continue to use the
agreed-upon message-passing interface, and the fact that (for instance)
the cash-vault relay has moved from the Passenger Interface Unit to the
Driver Interface Unit will not matter to any other module that makes use
of that service.

    This design was heavily influenced by the elegance of the Erlang ERTS
framework. My goal here is to make a very lightweight and portable
C-based system that provides a similar kind of modularization and
abstraction, so that we get maximum ongoing implementation flexibility
and robust testing similar to that provided by ERTS, without incurring
the overhead (and hassle) of maintaining our own cross-compiled ARM port
of ERTS, and without compelling all the other developers to learn Erlang
(which, on such a tight development schedule, would be madness).


-------------- Client (bus) to Server (garage) Synchronization System -------

    This module consists of two components. A server component in the
garage is responsible for maintaining a master database that is
considered the 'absolute truth' and always contains the canonical system
state. A client component lives on the bus and is intermittently
connected to the garage (as limited by network availability). The goal of
these two components is to maintain synchronization with as great a
degree of accuracy as is practical while taking the following constraints
into account:

    1) Connectivity is spotty. A client may lose touch with the server at
any time, and therefore must ALWAYS be in a state where it can function
autonomously with a reasonable degree of predictability (it must not
confuse or thwart the drivers or riders). This means there is a
possibility of accepting a fare (for instance) on a pass that was used up
during the communication outage, or other such conditions. We must always
err in favor of permitting 'normal' system functions: it is acceptable to
occasionally give out what turns out to be a free ride on a recently
expired pass; it is NEVER acceptable to reject a fare on a valid pass.

    2) Due to the multitude of busses and the intermittent nature of the
connectivity, each client bus can only make relative declarations ("Rider
X has used 1 ride, decrement their pass"), and the burden of aggregating
and calculating the resulting system state lies with the server. Each bus
may individually update its own state, but it must always allow an update
from the server to override any local changes. An example scenario:

  2a) Rider-X boards Bus-Y with an n-ride pass with 7 rides left.
  2b) Bus-Y decrements its LOCAL count for that pass to 6 rides.
  2c) Bus-Y transmits a message to the server: "Rider-X's pass has used
      1 ride."
  2d) Rider-X' (Rider-X's evil twin) boards Bus-Z with another copy of
      the same pass.
  2e) Bus-Z decrements its LOCAL count for that pass to 6.
  2f) Bus-Z transmits a message to the server: "Rider-X's pass has used
      1 ride."
  2g) The server receives both decrement messages and transmits to ALL
      busses the message "Rider-X's pass now has 5 rides left."
  2h) Bus-Y overwrites its LOCAL copy of Rider-X's pass count with the
      new one from the server (5).
  2i) Bus-Z overwrites its LOCAL copy of Rider-X's pass count with the
      new one from the server (5).

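The server-side rule in this scenario (relative events in, absolute
counts out) can be sketched as follows; the types are illustrative:

```c
/* Canonical pass state, owned by the server's master database. */
struct pass_record {
    int rides_left;
};

/* Apply one serialized "used n rides" event and return the new
 * absolute count, which the server then broadcasts to ALL busses. */
int apply_ride_event(struct pass_record *p, int rides_used)
{
    p->rides_left -= rides_used;
    if (p->rides_left < 0)
        p->rides_left = 0;  /* duplicate-pass abuse can overdraw; clamp */
    return p->rides_left;
}
```

Walking the scenario through this sketch: the pass starts at 7, the
Bus-Y event yields 6, and the Bus-Z event yields 5, which is the absolute
value both busses then adopt.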
    3) For this method to work, the server must serialize all incoming
events transmitted from the individual busses and apply each change as a
transaction in the master database. This is a related but distinct
process from the next step.

    4) An incrementing serial number must be kept for each change in the
master database. Each time a client bus checks in with the server, it
should supply the serial number of the last successfully integrated
transaction (a transaction counts as successfully integrated by the
client once the local database has been updated and the changes have been
successfully sync'd to secondary storage). The server must then supply
that client with a batch of all changes with a serial number greater than
the supplied key. The client will then integrate those changes,
synchronize to secondary storage, and advance its own key. In this manner
a client can go an arbitrary amount of time without checking in and then
receive a batch of all changes that accumulated in the interim.

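The client side of that check-in can be sketched as below. The change
type and function name are assumptions for illustration; the key point is
that the stored serial only advances after each change is durably applied:

```c
#include <stddef.h>

/* One change from the server's batch (payload elided in this sketch). */
struct change {
    long serial;
};

/* Integrate a batch of changes whose serials exceed last_integrated,
 * returning the new last-integrated serial. The real implementation
 * would sync each applied change to secondary storage before advancing. */
long integrate_batch(long last_integrated,
                     const struct change *batch, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (batch[i].serial <= last_integrated)
            continue;              /* already integrated; skip */
        /* ...apply change to local DB, then sync to secondary storage... */
        last_integrated = batch[i].serial;
    }
    return last_integrated;
}
```

Note that skipping already-seen serials makes re-delivery of a batch
harmless, which matters when connectivity drops mid-transfer.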
    5) Either end (server or client) always has the right to request a
full synchronization, in which the client database is wiped and replaced
wholesale by the server copy, and the serial number is updated to the
newest in the transmitted 'full' copy. This can be invoked if a client
hasn't received incremental updates in some time, or it can be used as a
troubleshooting measure when a suspected transmission or storage error
leaves the client with an apparently incomplete or incorrect database.