Okay, this is a little rough but here goes. There are two main hubs of data flow in this system. The first is local to the embedded system and is implemented by the ipc_server daemon (source located in the commhub/ directory). The second is a WAN-based synchronization service responsible for striking the best practical balance between always-ready operation and eventual consistency between the database state stored on the bus (the client) and the master database state maintained by the garage (the server).

---------------------------------- Interprocess Communication Hub -----------

This module provides majordomo-like, subscription-based interprocess communication between local modules using a UNIX domain socket in SOCK_SEQPACKET mode. The server functions as a smart router/switch, allowing each module to address any or all other modules with any of the following addressing modes:

1) Mailbox Name (delivers to every process that has subscribed to the named mailbox).
2) "_BROADCAST" (delivers to all connected clients).
3) ">nnn" (delivers the message to the client with PID == nnn).
4) ":module" (delivers to all clients registered with the supplied module name).

The server also manages a number of "special" mailboxes that execute specific communication flow management functions. These are #define'd string constants that resolve to the actual special mailbox names:

1) MAILBOX_SUBSCRIBE (subscribes the sending client to the mailbox named in the payload).
2) MAILBOX_UNSUBSCRIBE (unsubscribes the sending client from the specified mailbox).
3) MAILBOX_BROADCAST (sends the message to all connected clients).
4) MAILBOX_WIRETAP (payload == "ON" to select wiretap mode, anything else to deselect). When a client selects wiretap mode it will receive an unmodified copy of each and every message from that point forward (until wiretap mode is deselected).
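The four addressing modes above can be distinguished purely from the shape of the destination string. Here is a minimal sketch of how that classification might look; the enum names and classify_destination() are invented for illustration and are not taken from the actual commhub source:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical illustration of the four addressing modes described above.
   The type and function names here are assumptions for this sketch. */
typedef enum {
    ADDR_BROADCAST,  /* "_BROADCAST": deliver to all connected clients     */
    ADDR_PID,        /* ">nnn":       deliver to the client with PID nnn   */
    ADDR_MODULE,     /* ":module":    deliver to clients with that module  */
    ADDR_MAILBOX     /* anything else: deliver to subscribers of a mailbox */
} addr_mode_t;

static addr_mode_t classify_destination(const char *dest)
{
    if (strcmp(dest, "_BROADCAST") == 0)
        return ADDR_BROADCAST;
    if (dest[0] == '>' && isdigit((unsigned char)dest[1]))
        return ADDR_PID;
    if (dest[0] == ':')
        return ADDR_MODULE;
    return ADDR_MAILBOX;
}
```

The real server would presumably also handle the special management mailboxes as a separate case before this dispatch.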
This includes all special management messages and PID- or module-addressed messages. This allows for easy message flow debugging (including debugging of the communication hub itself) and for robust, on-the-fly selectable logging in the field without any need to recompile or even interrupt the system. We can attach to a running system in any state (including failure states) and transparently observe data flow between modules.

The server listens on a socket located at /tmp/commhub. Clients connect using the connect_to_message_server() function, which returns a file descriptor representing the connection to the IPC server. This function makes the socket connection and registers the client process with the server by both PID and module name. Once a client is registered, messages can be sent to the server by defining a message structure, populating it with the prepare_message() function, and then calling the send_message() function to dispatch the message to the server. Incoming messages can be retrieved with the get_message() function after a call to poll() indicates POLLIN on the fd for the connection to the server. If there are no other file descriptors to poll, or a quick non-blocking check is desirable, there is a provided poll wrapper called message_socket_status() which can be passed the fd associated with the connection and a mask of MSG_SEND | MSG_RECV; it reports whether a message can be sent and/or whether there is a message waiting to be received on that connection at the present moment.

An important feature of this architecture is that it allows the modules communicating through the hub to be started, stopped, and replaced independently and transparently. If the modules are designed with sensible opaque interfaces, this allows for a remarkable amount of flexibility in implementation and testing, as modules can be fashioned to function as test jigs for other individual modules or for complex subsystems of modules.
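The choice of SOCK_SEQPACKET matters here: unlike SOCK_STREAM, it preserves record boundaries, so every recv() on the connection yields exactly one whole message and no framing protocol is needed on top. A standalone demonstration of that property (using socketpair() instead of the real /tmp/commhub socket; seqpacket_roundtrip() is just a name for this sketch):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* SOCK_SEQPACKET preserves record boundaries, which is why the hub can
   treat every read as exactly one whole message. This demonstration
   uses socketpair(); the actual server listens on /tmp/commhub. */
static ssize_t seqpacket_roundtrip(char *out, size_t outlen)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0)
        return -1;

    /* Two separate sends become two separate records... */
    send(fds[0], "hello", 5, 0);
    send(fds[0], "world", 5, 0);

    /* ...so a recv() with a large buffer still returns only the first
       record, never a concatenation of the two. */
    ssize_t n = recv(fds[1], out, outlen, 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With SOCK_STREAM the same recv() could legally return all ten bytes at once, which is exactly the framing headache SOCK_SEQPACKET avoids.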
Test jigs can even be built to emulate fault conditions to test system recovery and failure workarounds. Modules can also be built to hide hardware-servicing routines. For instance, there could be a module responsible for communicating with every peripheral that lives in the passenger interface unit (RFID, Magstripe, Cash Vault, and Passenger-Facing Display). This module can simply multiplex and demultiplex the serial communications with those peripherals, create mailboxes to receive any data that needs to be transmitted to those peripherals, and send any input received from those peripherals to interface-specified mailboxes (one per peripheral and/or one per message type).

The beauty of this abstraction layer is two-fold. First, it allows things like rebooting the PIU to happen without the need to interrupt or reload any other process (minimizing collateral damage from unnecessary dependencies). The second place it pays off is the case where a hardware change causes a peripheral to migrate from one subsystem to another. In that case, only the modules that directly service I/O to those peripherals need to be aware of the change; all other modules will continue to use the agreed-upon message passing interface, and the fact that (for instance) the cash-vault relay has been moved from the Passenger Interface Unit to the Driver Interface Unit will not matter to any of the other modules that need to make use of that service. This design was heavily influenced by the elegance of the Erlang ERTS framework.
My goal here is to make a very lightweight and portable C-based system that provides a similar type of modularization and abstraction, giving us maximum ongoing implementation flexibility and robust testing similar to that provided by the ERTS framework, without incurring the overhead (and hassle) of maintaining our own cross-compiled ARM port of ERTS, and without compelling all the other developers to learn Erlang (which, with such a tight development schedule, would be madness).

-------- Client (bus) to Server (garage) Synchronization System ---------

This module actually consists of two components: a server component in the garage, responsible for maintaining a master database that is considered the 'absolute truth' and always contains the canonical system state, and a second component which lives on the bus (the client) and is intermittently connected to the garage (as limited by network availability). The goal of these two modules is to maintain synchronization with as great a degree of accuracy as is practical while taking the following constraints into account:

1) Connectivity is spotty. A client may lose touch with the server at any time, and therefore must ALWAYS be in a state where it can function autonomously with a reasonable degree of predictability (it must not confuse or thwart the drivers or riders). This means there is a possibility of accepting a fare (for instance) on a pass that may have been used up during the communication outage, or other such conditions. We must always be sure to err in favor of permitting 'normal' system functions. It is acceptable to occasionally give out what turns out to be a free ride on a recently expired pass; it is not acceptable to EVER reject a fare on a valid pass.
2) Due to the multitude of busses and the intermittent nature of the connectivity, each client bus can only make relative declarations ("Rider X has used 1 ride, decrement their pass"), and the burden of aggregating and calculating the resulting system state lies on the server. Each bus may individually update its own state, but it must always allow an update from the server to override any local changes. An example scenario:

2a) Rider-X boards Bus-Y with an n-Ride pass with 7 rides left.
2b) Bus-Y decrements its LOCAL count for that pass to 6 rides.
2c) Bus-Y transmits a message to the server: "Rider-X's pass has used 1 ride."
2d) Rider-X' (Rider-X's evil twin) gets on Bus-Z with another copy of the same pass.
2e) Bus-Z decrements its LOCAL count for that pass to 6.
2f) Bus-Z transmits a message to the server: "Rider-X's pass has used 1 ride."
2g) The server receives both decrement messages and transmits to ALL busses the message "Rider-X's pass now has 5 rides left."
2h) Bus-Y overwrites its LOCAL copy of Rider-X's pass count with the new one from the server (5).
2i) Bus-Z overwrites its LOCAL copy of Rider-X's pass count with the new one from the server (5).

3) For this method to work, the server must serialize all incoming events that have been transmitted from the individual busses and apply each change as a transaction in the master database. This is a related but distinct process from the next step.

4) An incrementing serial number must be kept for each change in the master database. Each time a client bus checks in with the server, it should supply the serial number of the last successfully integrated transaction (a transaction is counted as successfully integrated by the client once the local database has been updated and the changes have been successfully sync'd to secondary storage). The server must then supply that client with a batch of all changes with a serial number greater than the supplied key.
The client will then integrate those changes, synchronize to secondary storage, and then advance its own key. In this manner, a client can go an arbitrary amount of time without checking in and then receive a batch of all the messages that accumulated in the meantime.

5) Either end (server or client) always has the right to request a full synchronization, where the client database is wiped and replaced wholesale by the server copy, and the serial number is updated to the newest one in the transmitted 'full' copy. This can be invoked if a client hasn't received incremental updates in some time, or it can be used as a troubleshooting measure if there is a suspected transmission or storage error whereby the client appears to have an incomplete or incorrect database.
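The evil-twin scenario in item 2 can be sketched as a toy simulation. The struct and function names below are invented for illustration; the real system stores passes in a database and carries these updates as messages through the sync layer:

```c
#include <assert.h>

/* Toy model of the scenario in item 2: one canonical count at the
   garage, plus each bus's speculative local copy. */
typedef struct {
    int server_rides;  /* canonical count held by the garage */
    int bus_y_rides;   /* Bus-Y's local copy                 */
    int bus_z_rides;   /* Bus-Z's local copy                 */
} pass_state_t;

/* A bus reports a RELATIVE change ("used 1 ride"); the server applies
   each report as a serialized transaction against the master count. */
static void server_apply_ride_used(pass_state_t *p)
{
    p->server_rides -= 1;
}

/* The server then pushes the ABSOLUTE count to all busses, and each
   bus overwrites its local copy, discarding its own speculation. */
static void server_broadcast(pass_state_t *p)
{
    p->bus_y_rides = p->server_rides;
    p->bus_z_rides = p->server_rides;
}

static pass_state_t run_scenario(void)
{
    pass_state_t p = { 7, 7, 7 };  /* 2a: pass has 7 rides left        */
    p.bus_y_rides -= 1;            /* 2b: Bus-Y decrements locally (6) */
    server_apply_ride_used(&p);    /* 2c: "used 1 ride" reaches server */
    p.bus_z_rides -= 1;            /* 2e: Bus-Z decrements locally (6) */
    server_apply_ride_used(&p);    /* 2f: second report reaches server */
    server_broadcast(&p);          /* 2g-2i: server pushes 5 to all    */
    return p;
}
```

Note that neither bus ever computes "5" on its own; the converged count only exists because the server aggregates the two relative reports and its broadcast overrides both local copies.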
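The serial-number catch-up from item 4 amounts to a filter over the ordered change log. A minimal sketch, assuming an in-memory log for illustration (change_t and fetch_changes_after() are invented names; the real server would pull these rows from the master database):

```c
#include <assert.h>
#include <stddef.h>

/* One entry in the master database's ordered change log. */
typedef struct {
    long serial;       /* monotonically increasing per transaction  */
    const char *desc;  /* placeholder for the actual change payload */
} change_t;

/* Return, via 'out', every change with a serial number strictly
   greater than the client's last successfully integrated serial.
   'out' must have room for at least n pointers. */
static size_t fetch_changes_after(const change_t *log, size_t n,
                                  long last_integrated,
                                  const change_t **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (log[i].serial > last_integrated)
            out[count++] = &log[i];
    return count;
}
```

Because the comparison is strictly greater-than, a client that crashes after integrating but before checking in simply receives an empty batch on its next check-in rather than reapplying old changes.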