- Okay, this is a little rough but here goes. There are two main hubs
- of data flow involved in this system:
- The first is local to the embedded system and is implemented by the
- ipc_server daemon (source located in the commhub/ directory).
- The second is a WAN-based synchronization service responsible for
- balancing always-available local operation against eventual consistency
- between the database state stored on the bus (the client) and the master
- database state maintained by the garage (the server).
- ---------------------------------- Interprocess Communication Hub -----------
- This module provides majordomo-like subscription based interprocess
- communication between local modules using a UNIX domain socket in
- SOCK_SEQPACKET mode. This server functions as a smart router/switch
- allowing each module to address any/all other modules with any
- of the following addressing modes:
- 1) Mailbox Name (this will deliver to every process that has subscribed to the named mailbox).
- 2) "_BROADCAST" (this will deliver to all connected clients)
- 3) ">nnn" (this will deliver the message to the client with PID == nnn)
- 4) ":module" (this will deliver to all clients registered with the supplied module name).
- The server also manages a number of "special" mailboxes that execute
- specific communication flow management functions. These mailboxes are as
- follows (these are #define'd string constants that resolve to the actual
- special mailbox names):
- 1) MAILBOX_SUBSCRIBE (subscribes the sending client to the mailbox named in the payload)
- 2) MAILBOX_UNSUBSCRIBE (unsubscribes the sending client from the specified mailbox)
- 3) MAILBOX_BROADCAST (sends the message to all connected clients)
- 4) MAILBOX_WIRETAP (payload == "ON" to select wiretap mode, anything else to deselect).
- When a client selects wiretap mode it will receive an unmodified copy of
- each and every message from that point forward (until wiretap mode is
- deselected). This includes all special management messages and PID- or
- module-addressed messages. This allows for easy message flow debugging
- (including debugging the communication hub itself) and for robust on-the-fly
- selectable logging in the field without any need to recompile or even
- interrupt the system. This allows us to attach to a running system in any
- state (including failure states) and transparently observe data flow between
- modules.
- The server listens on a socket located at /tmp/commhub. Clients connect
- using the connect_to_message_server() function which returns a file
- descriptor that represents the connection to the IPC server. This function
- makes the socket connection and registers the client process with the server
- by both PID and module name.
- Once a client is registered, messages can be sent to the server by
- defining a message structure and populating it with the prepare_message()
- function call and then calling the send_message() function to dispatch the
- message to the server.
- Incoming messages can be retrieved with the get_message() function after
- a call to poll() indicates POLLIN on the fd for the connection to the
- server. If there are no other file descriptors to poll, or a quick
- non-blocking check is desirable, there is a provided poll wrapper called
- message_socket_status() which can be passed the fd associated with the
- connection and a mask of MSG_SEND | MSG_RECV which will specify whether a
- message can be sent and/or whether there is a message waiting to be received
- on that connection at the present moment.
- An important feature of this architecture is that it allows for the
- modules communicating through the hub to be started, stopped, and replaced
- independently and transparently. If the modules are designed with sensible
- opaque interfaces this allows for a remarkable amount of flexibility in
- implementation and testing as modules can be fashioned to function as test
- jigs to test other individual modules or complex subsystems of modules.
- Test jigs can even be built to emulate fault conditions to test system
- recovery and failure-workaround.
- Modules can also be built to hide hardware-servicing routines. For
- instance, there could be a module that was responsible for communicating
- with every peripheral that lives in the passenger interface unit (RFID,
- Magstripe, Cash Vault, and Passenger Facing Display). This module can
- simply multiplex and demultiplex the serial communications with those
- peripherals and create mailboxes to receive any data that needs to be
- transmitted to those peripherals and send any input received from those
- peripherals to some interface-specified mailboxes (one per peripheral and/or
- one per message type). The beauty of this abstraction layer is two-fold.
- First, it allows things like rebooting the PIU to happen without the need to
- interrupt or reload any other process (minimizing collateral damage from
- unnecessary dependencies). The second place this pays off is in the case
- where a hardware change causes a peripheral to migrate from one subsystem to
- another: In this case, only the modules that directly service I/O to those
- peripherals need to be aware of this change... all other modules will
- continue to use the agreed upon message passing interface and the fact that
- the (for instance) cash-vault relay has been moved from the Passenger
- Interface Unit to the Driver Interface Unit will not matter to any of the
- other modules that need to make use of that service.
- This design was heavily influenced by the elegance of the Erlang
- ERTS framework. My goal here is to make a very lightweight and portable
- C-based system that provides a similar type of modularization and
- abstraction such that we can have maximum ongoing implementation flexibility
- and robust testing similar to that provided by the ERTS framework without
- incurring the overhead (and hassle) of maintaining our own cross-compiled
- ARM port of ERTS (and without compelling all the other developers to learn
- Erlang (which with such a tight development schedule would be madness)).
- -------- Client (bus) to Server (garage) Synchronization System ---------
- This module actually consists of a server component in the garage which
- is responsible for maintaining a master database which is considered the
- 'absolute truth' and always contains the canonical system state. There is a
- second component which lives on the bus (the client) which is intermittently
- connected to the garage (as limited by network availability). The goal of
- these two modules is to maintain synchronization with as great a degree of
- accuracy as is practical while taking the following constraints into
- account:
- 1) Connectivity is spotty. A client may lose touch with the server at
- any time, and therefore must ALWAYS be in a state where it can function
- autonomously with a reasonable degree of predictability (it must not confuse
- or thwart the drivers or riders). This means there is a possibility of
- accepting a fare (for instance) on a pass that may have been used up during
- the communication outage, or other such conditions. We must always be sure
- to err in favor of permitting 'normal' system functions. It is acceptable
- to occasionally give out what turns out to be a free ride on a recently
- expired pass; it is not acceptable to EVER reject a fare on a valid pass.
- 2) Due to the multitude of busses and the intermittent nature of the
- connectivity, each client bus can only make relative declarations (Rider X
- has used 1 ride, decrement their pass) and the burden of aggregating and
- calculating the resulting system state lies on the server. Each bus may
- individually update its own state, but it must always allow an update from
- the server to override any local changes. An example scenario:
- 2a) Rider-X boards Bus-Y with an n-Ride pass with 7 rides left.
- 2b) Bus-Y decrements its LOCAL count for that pass to 6 rides.
- 2c) Bus-Y transmits a message to the server "Rider-X's pass has used 1 ride"
- 2d) Rider-X' (Rider-X's evil twin) gets on Bus-Z with another copy of the same pass.
- 2e) Bus-Z decrements its LOCAL count for that pass to 6.
- 2f) Bus-Z transmits a message to the server "Rider-X's pass has used 1 ride"
- 2g) The server receives both decrement messages and transmits to ALL
- busses the message "Rider-X's pass now has 5 rides left"
- 2h) Bus-Y overwrites its LOCAL copy of Rider-X's pass count with the new
- one from the server (5).
- 2i) Bus-Z overwrites its LOCAL copy of Rider-X's pass count with the new
- one from the server (5).
- 3) For this method to work, the server must serialize all incoming events
- that have been transmitted from the individual busses and apply each change
- as a transaction in the master database. This is a related but distinct
- process from the next step.
- 4) An incrementing serial number must be kept for each change in the
- master database. Each time a client bus checks in with the server it should
- supply the serial number of the last successfully integrated transaction (a
- transaction is counted as successfully integrated by the client once the
- local database has been updated and the changes have been successfully
- sync'd to secondary storage). The server must then supply that client with
- a batch of all changes with a serial number greater than the supplied key.
- The client will then integrate those changes, synchronize to secondary
- storage, and then advance its own key to the newest serial received. In
- this manner a client can go an arbitrary amount of time without checking in
- and then receive a batch of all changes that accumulated in the meantime.
- 5) Either end (server or client) always has the right to request a full
- synchronization where the client database is wiped and replaced wholesale by
- the server copy, and the serial number is updated to the newest in the
- transmitted 'full' copy. This can be invoked if a client hasn't received
- incremental updates in some time, or it can be used as a troubleshooting
- measure if there is a suspected transmission or storage error whereby the
- client appears to have an incomplete or incorrect database.