
Okay, this is a little rough, but here goes. There are two main hubs
of data flow involved in this system:
The first is local to the embedded system and is implemented by the
ipc_server daemon (source located in the commhub/ directory).
The second is a WAN-based synchronization service responsible for
striking the best practical balance between keeping each bus always
ready to operate and maintaining eventual consistency between the
database state stored on the bus (the client) and the master database
state maintained by the garage (the server).
---------------------------------- Interprocess Communication Hub -----------
This module provides majordomo-like, subscription-based interprocess
communication between local modules using a UNIX domain socket in
SOCK_SEQPACKET mode. The server functions as a smart router/switch,
allowing each module to address any/all other modules with any
of the following addressing modes:
1) Mailbox Name (delivers to every process that has subscribed to the named mailbox).
2) "_BROADCAST" (delivers to all connected clients).
3) ">nnn" (delivers the message to the client with PID == nnn).
4) ":module" (delivers to all clients registered with the supplied module name).
The server also manages a number of "special" mailboxes that execute
specific communication-flow management functions. These mailboxes are as
follows (these are #define'd string constants that resolve to the actual
special mailbox names):
1) MAILBOX_SUBSCRIBE (subscribes the sending client to the mailbox named in the payload).
2) MAILBOX_UNSUBSCRIBE (unsubscribes the sending client from the mailbox named in the payload).
3) MAILBOX_BROADCAST (sends the message to all connected clients).
4) MAILBOX_WIRETAP (payload == "ON" to select wiretap mode, anything else to deselect).
When a client selects wiretap mode it will receive an unmodified copy of
each and every message from that point forward (until wiretap mode is
deselected). This includes all special management messages and PID- or
module-addressed messages. This allows for easy message-flow debugging
(including debugging the communication hub itself) and for robust, on-the-fly
selectable logging in the field without any need to recompile or even
interrupt the system. It lets us attach to a running system in any
state (including failure states) and transparently observe data flow between
modules.
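As a concrete sketch, a client might drive these special mailboxes like
this (prepare_message() and send_message() are described below; their
parameter order, and the struct message type, are assumptions on my part):

    /* Subscribe to a mailbox, then turn on wiretap mode. Assumed
       signatures: prepare_message(msg, dest, payload, len) and
       send_message(fd, msg); hub_fd is an already-connected socket. */
    struct message msg;

    prepare_message(&msg, MAILBOX_SUBSCRIBE, "fare_events", 11);
    send_message(hub_fd, &msg);

    prepare_message(&msg, MAILBOX_WIRETAP, "ON", 2);
    send_message(hub_fd, &msg);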
The server listens on a socket located at /tmp/commhub. Clients connect
using the connect_to_message_server() function, which returns a file
descriptor that represents the connection to the IPC server. This function
makes the socket connection and registers the client process with the server
by both PID and module name.
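A minimal connection sketch (the document only says registration happens
by PID and module name, so the parameter list here is an assumption):

    /* Connect to the hub at /tmp/commhub and register this process. */
    int hub_fd = connect_to_message_server("fare_logic");
    if (hub_fd < 0) {
        /* hub unavailable: log it, then retry or fail fast */
    }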
Once a client is registered, messages can be sent to the server by
defining a message structure, populating it with the prepare_message()
function, and then calling the send_message() function to dispatch the
message to the server.
Incoming messages can be retrieved with the get_message() function after
a call to poll() indicates POLLIN on the fd for the connection to the
server. If there are no other file descriptors to poll, or a quick
non-blocking check is desirable, there is a provided poll wrapper called
message_socket_status(), which takes the fd associated with the
connection and a mask of MSG_SEND | MSG_RECV and reports whether a
message can be sent and/or whether a message is waiting to be received
on that connection at the present moment.
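Putting the receive side together (poll() is standard POSIX; get_message()
and message_socket_status() are the functions named above, with assumed
signatures):

    #include <poll.h>

    /* Block until the hub connection is readable, then fetch one message. */
    struct pollfd pfd = { .fd = hub_fd, .events = POLLIN };
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        struct message msg;
        get_message(hub_fd, &msg);
    }

    /* Or the quick non-blocking check via the provided wrapper: */
    int status = message_socket_status(hub_fd, MSG_SEND | MSG_RECV);
    if (status & MSG_RECV) {
        /* a message is waiting on this connection */
    }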
An important feature of this architecture is that it allows the
modules communicating through the hub to be started, stopped, and replaced
independently and transparently. If the modules are designed with sensible
opaque interfaces, this allows for a remarkable amount of flexibility in
implementation and testing, as modules can be fashioned to function as test
jigs to exercise other individual modules or complex subsystems of modules.
Test jigs can even be built to emulate fault conditions to test system
recovery and failure workarounds.
Modules can also be built to hide hardware-servicing routines. For
instance, there could be a module responsible for communicating
with every peripheral that lives in the passenger interface unit (RFID,
Magstripe, Cash Vault, and Passenger-Facing Display). This module can
simply multiplex and demultiplex the serial communications with those
peripherals, create mailboxes to receive any data that needs to be
transmitted to those peripherals, and send any input received from those
peripherals to some interface-specified mailboxes (one per peripheral and/or
one per message type). The beauty of this abstraction layer is two-fold.
First, it allows things like rebooting the PIU to happen without the need to
interrupt or reload any other process (minimizing collateral damage from
unnecessary dependencies). The second place this pays off is the case
where a hardware change causes a peripheral to migrate from one subsystem to
another: only the modules that directly service I/O to those
peripherals need to be aware of the change... all other modules will
continue to use the agreed-upon message-passing interface, and the fact that
the (for instance) cash-vault relay has moved from the Passenger
Interface Unit to the Driver Interface Unit will not matter to any of the
other modules that need to make use of that service.
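A sketch of the shape such a module might take on its inbound side (the
mailbox names, buffer handling, and serial fd are all invented for
illustration):

    /* Forward bytes read from the RFID reader's serial line to a
       per-peripheral mailbox; no other module touches the hardware. */
    char buf[256];
    ssize_t n = read(rfid_serial_fd, buf, sizeof buf);
    if (n > 0) {
        struct message msg;
        prepare_message(&msg, "piu_rfid_in", buf, (size_t)n);
        send_message(hub_fd, &msg);
    }
    /* In the outbound direction the module subscribes to (for example)
       "piu_display_out" and writes each payload received there to the
       passenger-facing display's serial line. */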
This design was heavily influenced by the elegance of the Erlang
ERTS framework. My goal here is to make a very lightweight and portable
C-based system that provides a similar type of modularization and
abstraction, such that we can have maximum ongoing implementation flexibility
and robust testing similar to that provided by the ERTS framework, without
incurring the overhead (and hassle) of maintaining our own cross-compiled
ARM port of ERTS (and without compelling all the other developers to learn
Erlang (which with such a tight development schedule would be madness)).
-------- Client (bus) to Server (garage) Synchronization System ---------
This module actually consists of a server component in the garage, which
is responsible for maintaining a master database that is considered the
'absolute truth' and always contains the canonical system state, and a
second component that lives on the bus (the client) and is intermittently
connected to the garage (as limited by network availability). The goal of
these two modules is to maintain synchronization with as great a degree of
accuracy as is practical while taking the following constraints into
account:
1) Connectivity is spotty. A client may lose touch with the server at
any time, and therefore must ALWAYS be in a state where it can function
autonomously with a reasonable degree of predictability (it must not confuse
or thwart the drivers or riders). This means there is a possibility of
accepting a fare (for instance) on a pass that was used up during
a communication outage, or other such conditions. We must always be sure
to err in favor of permitting 'normal' system functions. It is acceptable
to occasionally give out what turns out to be a free ride on a recently
expired pass; it is not acceptable to EVER reject a fare on a valid pass.
2) Due to the multitude of busses and the intermittent nature of the
connectivity, each client bus can only make relative declarations ("Rider X
has used 1 ride, decrement their pass"), and the burden of aggregating and
calculating the resulting system state lies on the server. Each bus may
individually update its own state, but it must always allow an update from
the server to override any local changes. An example scenario (a code
sketch of this rule follows the scenario):
2a) Rider-X boards Bus-Y with an n-Ride pass with 7 rides left.
2b) Bus-Y decrements its LOCAL count for that pass to 6 rides.
2c) Bus-Y transmits a message to the server: "Rider-X's pass has used 1 ride."
2d) Rider-X' (Rider-X's evil twin) boards Bus-Z with another copy of the same pass.
2e) Bus-Z decrements its LOCAL count for that pass to 6.
2f) Bus-Z transmits a message to the server: "Rider-X's pass has used 1 ride."
2g) The server receives both decrement messages and transmits to ALL
busses the message "Rider-X's pass now has 5 rides left."
2h) Bus-Y overwrites its LOCAL copy of Rider-X's pass count with the new
one from the server (5).
2i) Bus-Z overwrites its LOCAL copy of Rider-X's pass count with the new
one from the server (5).
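A sketch of the client-side rule the scenario illustrates (the types and
helper names are illustrative, not from the codebase):

    /* Relative, local-only change: decrement and queue the delta for the
       server; the local count is never treated as authoritative. */
    void on_ride_used(struct pass *p)
    {
        p->rides_left--;
        queue_event_for_server(p->id, -1);  /* "used 1 ride" */
    }

    /* A server-pushed absolute value always overrides the local copy. */
    void on_server_update(struct pass *p, int authoritative_count)
    {
        p->rides_left = authoritative_count;
    }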
3) For this method to work, the server must serialize all incoming events
that have been transmitted from the individual busses and apply each change
as a transaction in the master database. This is a related but distinct
process from the next step.
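That serialization step might reduce to something like this (every name
below is an illustrative assumption):

    /* Drain events from all busses through one queue so that each change
       is applied as a single, ordered transaction in the master database. */
    void server_apply_loop(struct event_queue *q)
    {
        struct bus_event ev;
        while (dequeue_event(q, &ev)) {
            begin_transaction();
            apply_delta(ev.pass_id, ev.delta);
            commit_transaction();   /* one event == one transaction */
        }
    }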
4) An incrementing serial number must be kept for each change in the
master database. Each time a client bus checks in with the server it should
supply the serial number of the last successfully integrated transaction (a
transaction is counted as successfully integrated by the client once the
local database has been updated and the changes have been successfully
sync'd to secondary storage). The server must then supply that client with
a batch of all changes with a serial number greater than the supplied key.
The client will then integrate those changes, synchronize to secondary
storage, and advance its own key accordingly. In this manner a client can go
an arbitrary amount of time without checking in and then receive a batch of
all the changes that accumulated in the meantime.
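A sketch of the client side of that check-in exchange (the message framing
and helper names are assumptions; only the ordering rules come from the
text above):

    /* Report the last integrated serial, then apply the returned batch.
       A change counts as integrated only after it reaches secondary
       storage, so the key is advanced last. */
    uint64_t last = read_last_serial_from_flash();
    send_checkin(server_fd, last);

    struct db_change ch;
    while (recv_change(server_fd, &ch)) {       /* serials > last, in order  */
        apply_transaction(&ch);                 /* update local database     */
        sync_database_to_flash();               /* commit to secondary store */
        write_last_serial_to_flash(ch.serial);  /* only now advance the key  */
    }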
5) Either end (server or client) always has the right to request a full
synchronization, where the client database is wiped and replaced wholesale by
the server copy, and the serial number is updated to the newest in the
transmitted 'full' copy. This can be invoked if a client hasn't received
incremental updates in some time, or it can be used as a troubleshooting
measure if there is a suspected transmission or storage error whereby the
client appears to have an incomplete or incorrect database.
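And the full-sync fallback, under the same invented helper names:

    /* Wipe-and-replace recovery path: adopt the server snapshot wholesale
       and jump the local key to the snapshot's newest serial. */
    struct db_snapshot snap;
    request_full_snapshot(server_fd, &snap);
    replace_local_database(&snap);
    sync_database_to_flash();
    write_last_serial_to_flash(snap.newest_serial);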