korrelan, you're used to highly parallel systems, you got a comment on this?
I've only just noticed this
In theory, when your architecture is up and running, any machine can exchange information or functions with any other machine. There has to be some kind of synchronisation between the various elements.
I use a RAM disk on a ‘master’ machine to control my cluster. Each of my six four-core nodes runs a standard benchmark (for load balancing; some are doing other tasks/ vision/ etc.) and then flags the master that it’s ready for its first chunk. The master splits the data load into blocks appropriate to each node’s resources/ performance and allocates them to the nodes. A flag system on the master is updated by each node when it’s finished processing its data. When all nodes have reported that they have finished, the time cycles for the task are re-calculated and adjusted appropriately and the next block is sent. Through load balancing, optimum throughput is eventually achieved.
I have evening access to a 1500+ core cluster and the above method works great. Think of it as a botnet designed for parallel processing. Data transmission and local storage/ network speeds can make a big difference to a node’s performance, so load balancing is key. Parallel processing across nodes only becomes efficient over a certain workload; using fewer nodes can sometimes be more efficient when network latency is a governing factor in overall system performance.
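A rough sketch of how I read the master-side loop above, purely illustrative (the node IDs, benchmark scores and dispatch mechanism are made up, not korrelan's actual code): split the workload into blocks proportional to each node's benchmark, hand them out, and spin on the flag table until everyone reports done before re-balancing.

```python
# Hypothetical master-side loop: block sizes follow benchmark scores,
# a flag table tracks which nodes have finished their chunk.
import time

node_scores = {"node1": 1.0, "node2": 0.8, "node3": 1.3}   # assumed benchmark results
done_flags = {n: False for n in node_scores}                # the master's flag system

def split_workload(items, scores):
    """Give each node a block proportional to its benchmark score."""
    total = sum(scores.values())
    blocks, start = {}, 0
    for node, score in scores.items():
        size = round(len(items) * score / total)
        blocks[node] = items[start:start + size]
        start += size
    if start < len(items):            # rounding remainder goes to the last node
        blocks[node] += items[start:]
    return blocks

def dispatch(node, block):
    """Stand-in for writing the block to the RAM disk / socket for this node."""
    print(f"sending {len(block)} items to {node}")
    done_flags[node] = True           # in reality the node sets this when finished

def run_cycle(items):
    for node, block in split_workload(items, node_scores).items():
        done_flags[node] = False
        dispatch(node, block)
    while not all(done_flags.values()):   # wait for every node to flag completion
        time.sleep(0.01)
    # ...here per-node timings would be measured and node_scores adjusted.
```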
You could use a similar method; each small application would register through a TCP pipe with the master machine. You could either check on each overall program cycle for the synchronising of data transfer/ processing, or each node could notify the master approximately how many cycles it requires to complete a task; this would drastically cut down on delay loops waiting for data.
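Something like this on the node side, as a sketch only (the address, port and message format are assumptions, not a defined protocol): register over TCP and tell the master up front how many cycles the next chunk will take, so it can schedule rather than poll.

```python
# Hypothetical node-side registration: connect to the master, announce
# ourselves and our cycle estimate, read back the master's reply.
import socket, json

MASTER_ADDR = ("192.168.0.10", 5000)   # made-up master address

def register_and_report(node_id, estimated_cycles):
    with socket.create_connection(MASTER_ADDR) as sock:
        msg = {"type": "register", "node": node_id,
               "cycles_for_next_chunk": estimated_cycles}
        sock.sendall((json.dumps(msg) + "\n").encode())
        reply = sock.makefile().readline()   # master acknowledges / assigns a chunk
        return json.loads(reply)

# e.g. register_and_report("node3", 120)  # "I need roughly 120 cycles per chunk"
```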
A second similar method…
I've recently had to write a server-less network system. Each node can exchange data with any other node with no central server, so if a node gets destroyed the data within the network remains intact and the network loses no data/ functionality. All nodes seek other nodes in range and exchange encrypted data packets from any node to any other node. Once a node enters the arena/ range it’s instantly updated with all network telemetry/ data. Once a node has received a packet it’s flagged as read; this again spreads across the network, instructing nodes to delete that packet and stop transmitting it. Basically all nodes know all information at all times, as quickly as possible, and every node has a complete picture of the state of the entire network. Also, if a node stops responding, etc., all other nodes are notified on a priority basis. It was a pain to write.
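A toy version of that idea as I understand it (my reading, not korrelan's code, and the encryption/radio-range parts are left out): every node gossips its packet set to peers in range, and once a packet is flagged as read that flag gossips too, so the whole network eventually deletes it and stops re-transmitting.

```python
# Minimal gossip sketch: packets and read-flags both spread node to node.
class Node:
    def __init__(self, name):
        self.name = name
        self.packets = {}        # packet_id -> payload still in circulation
        self.read_flags = set()  # packet ids known to have been read

    def receive(self, packet_id, payload):
        if packet_id not in self.read_flags:
            self.packets[packet_id] = payload

    def mark_read(self, packet_id):
        self.read_flags.add(packet_id)
        self.packets.pop(packet_id, None)   # delete locally, stop re-sending

    def gossip(self, peer):
        """Exchange everything with one peer in range (encryption omitted)."""
        for pid, payload in list(self.packets.items()):
            peer.receive(pid, payload)
        for pid in self.read_flags:          # read-flags spread just like packets
            peer.mark_read(pid)

# a = Node("a"); b = Node("b")
# a.receive("p1", "telemetry"); a.gossip(b)   # b now holds p1
# b.mark_read("p1"); b.gossip(a)              # a deletes p1 and stops sending it
```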
So the two methods are basically the same except one is serverless… and much more complicated to code.
I suppose you could have a hybrid of the two methods, but keeping track of where the master synchronisation flag stack is as more nodes join and leave collections would be a nightmare.
Anyway… whether it is a single master or a globally shared metric, something somewhere has to keep track of available functions/ resources, otherwise you are going to hit massive bottlenecks and waste a lot of processor cycles; one frozen node could bring the whole lot down.
Edit: Also, once my nodes have been allocated their quota of cortical columns, only the transient data between columns and a little extra for neurogenesis, etc. is transmitted over the network to other nodes. This usually forms a packet of a predictable size, ish.
Perhaps if you were to locally sync the nodes to a master time pulse and then break any data down into regular-sized packets (10K), and if each node only exchanged data at this set size but sent the full file size in the first packet, other nodes could calculate how many system cycles would be required by each node to do a set job.
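Roughly like this (the framing with a 4-byte size header is my own assumption, the 10K figure is from above): chop any payload into fixed 10 KB packets and carry the total size in the first one, so a receiving node knows from the very first packet how many packets/cycles the whole job will need.

```python
# Sketch of the fixed-size packet idea with the total length up front.
import math, struct

PACKET_SIZE = 10 * 1024   # the 10K block size from the post

def to_packets(data: bytes):
    header = struct.pack(">I", len(data))        # first packet starts with total size
    stream = header + data
    return [stream[i:i + PACKET_SIZE] for i in range(0, len(stream), PACKET_SIZE)]

def packets_expected(first_packet: bytes):
    total = struct.unpack(">I", first_packet[:4])[0]
    return math.ceil((total + 4) / PACKET_SIZE)  # packets the sender will push in total

# packets = to_packets(b"x" * 50000)
# packets_expected(packets[0])   # -> 5, known as soon as the first packet arrives
```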
Hmmmm… Gonna need some more thought