That won’t work in all cases. For instance, if you get messages from devices which can be reimaged, they may have clock skew for a period of time before they’re synchronized again.
But in any case, you can't rely on the order of message ingress to your system to represent anything meaningful either. You would have to ensure that the key defining the order has some hard logical ordering purpose for which time is not relevant or useful.
The order of message ingress can still be meaningful even if device clocks are skewed or jump due to rebooting, reimaging, network time sync, frequency drift, etc.
A hard logical order arises from interactions. E.g. if the device receives a message, does something locally, goes through a clock change, and then sends a message dependent on one it fetched earlier, that's a logical order despite an out-of-order clock.
Or if a device gets a message, processes it, and sends something to another device, which processes it too and then sends another message back to the original source, there's a logical order but with three different clocks involved. Even if the clocks are synchronised, there will be some drift and the messages may be processed fast enough that the drift puts their timestamps out of order.
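A minimal sketch of that point, using Lamport logical clocks (one well-known way to capture this kind of logical order; the device names and skew values here are made up for illustration):

```python
# Two devices exchange messages. Device b's wall clock is skewed
# 5 seconds behind, so the reply's timestamp comes out *earlier* than
# the original message, but the Lamport logical clock still orders
# the messages causally.

class Device:
    def __init__(self, name, clock_skew):
        self.name = name
        self.skew = clock_skew   # seconds of wall-clock drift (assumed)
        self.lclock = 0          # Lamport logical clock

    def wall_time(self, true_time):
        return true_time + self.skew

    def send(self, true_time):
        self.lclock += 1
        return {"from": self.name,
                "wall_ts": self.wall_time(true_time),
                "lamport": self.lclock}

    def recv(self, msg):
        # Lamport rule: jump ahead of the sender's logical time.
        self.lclock = max(self.lclock, msg["lamport"]) + 1

a = Device("a", clock_skew=0.0)
b = Device("b", clock_skew=-5.0)

m1 = a.send(true_time=100.0)   # a -> b
b.recv(m1)
m2 = b.send(true_time=100.1)   # b replies 0.1s later in real time

# Wall clocks claim the reply happened *before* the original message...
assert m2["wall_ts"] < m1["wall_ts"]
# ...but the logical clocks still give the correct causal order.
assert m2["lamport"] > m1["lamport"]
```

The drift only has to exceed the processing latency (here 5s vs 0.1s, but the same happens with milliseconds of drift and sub-millisecond processing) for the timestamps to invert.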
That guarantee/assumption can never really be made.
Message A, on event a' from system a, might be sent to system b, effecting event b' and thus causing message B to be sent by <insert medium here>, and consumed and correlated by consumer software MC on hardware mc.
However, system B might take longer to flush its hardware/software buffer and the message arrives at mc before message A, for example.
I've encountered this many times. That data has no meaning in itself, only in the metadata around it.
> Even if the clocks are synchronised, there will be some drift and the messages may be processed fast enough that the drift puts their timestamps out of order.
If you are consuming from sources whose clock accuracy you cannot control, then you must inherently either reduce your reliance on that accuracy (many Windows systems have horrendous clock discipline in their software implementation) or find a way of ensuring accuracy, e.g. close-proximity NTP or close-proximity PTP, etc.
I think you and I are agreeing, but it's not obvious ;-)
> However, system B might take longer to flush its hardware/software buffer and the message arrives at mc before message A, for example.
There are two message As in your system, the one sent to system b, and the one consumed by hardware mc. Let's call them Ab and Amc.
In that situation, message B is a consequence of message Ab which can be tracked, and at system mc (depending on semantics) it might be necessary to use that tracking to process message Amc before message B, at least logically.
For example, message Ab/Amc might be "add new user Foo to our database with standard policy", and system B might react with "ok, my job is to set the policy for new users; I'll tell mc to add permission Bar to user Foo, then activate Foo".
That works out fine as long as the logical order relation is maintained, and system mc, the database, processes message Amc first, regardless of arrival order.
Dependency tracking can ensure that (without clocks or timestamps), even if messages are transmitted independently. For example, message B can carry a header that says it comes logically after message A.
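A rough sketch of what that dependency tracking could look like at the consumer, assuming each message carries a hypothetical "after" header listing the message IDs it logically follows (the message names Amc and B match the scenario above; everything else is illustrative):

```python
# A consumer that holds back any message whose declared dependencies
# haven't been delivered yet, so causal order is respected regardless
# of arrival order. No clocks or timestamps involved.

def deliver_in_causal_order(arrivals):
    delivered, held = [], []
    seen = set()

    def try_deliver(msg):
        if all(dep in seen for dep in msg.get("after", [])):
            delivered.append(msg["id"])
            seen.add(msg["id"])
            return True
        return False

    for msg in arrivals:
        if try_deliver(msg):
            # Delivering one message may unblock held ones.
            progress = True
            while progress:
                progress = False
                for h in held[:]:
                    if try_deliver(h):
                        held.remove(h)
                        progress = True
        else:
            held.append(msg)
    return delivered

# Message B declares it comes logically after message Amc, so even
# though B arrives first, the consumer processes Amc before B.
arrivals = [
    {"id": "B", "after": ["Amc"]},   # arrived first due to buffering
    {"id": "Amc", "after": []},
]
print(deliver_in_causal_order(arrivals))  # -> ['Amc', 'B']
```

A real implementation would also need to handle dependencies that never arrive (timeouts, again), which is where the pure logical-order picture gets messy.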
The pubsub queue can also guarantee that order (without clocks or timestamps or dependency tracking), provided all messages go through the same pubsub system, and Ab+Amc are fanned-out by the pubsub system rather than sent independently by A to each destination. All bets are off if Ab and Amc take other routes.
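The pubsub guarantee amounts to the broker doing the fan-out itself, so every subscriber's queue is a prefix of one global publish sequence. A toy sketch of that, with hypothetical names:

```python
# A single broker fans out every publish in one global order, so all
# subscribers observe A before B without clocks, timestamps, or
# dependency headers. This only holds because fan-out happens inside
# the broker; if A sent to b and mc independently, the guarantee is gone.

class Broker:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        self.subscribers[name] = []

    def publish(self, msg):
        # Fan-out in publish order: every subscriber's queue is a
        # prefix of the same global sequence.
        for queue in self.subscribers.values():
            queue.append(msg)

broker = Broker()
broker.subscribe("b")
broker.subscribe("mc")

broker.publish("A")   # a publishes once; broker fans out to b and mc
broker.publish("B")   # b reacts to A and publishes B

assert broker.subscribers["mc"] == ["A", "B"]  # mc sees A before B
```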
> If you are consuming from sources whose clock accuracy you cannot control, then you must inherently either reduce your reliance on that accuracy (many Windows systems have horrendous clock discipline in their software implementation) or find a way of ensuring accuracy, e.g. close-proximity NTP or close-proximity PTP, etc.
If you think Windows is bad, try just about any cloud VM, which has a virtual clock and is stalled all the time in the guest, including just after the guest reads its virtual clock and before using the value :-)
I prefer systems which rely on logical ordering guarantees as much as possible, so clock drift doesn't matter.
When you rely on a global clock order to ensure correct behaviour, you have to slow down some operations to accommodate clock variance across the system, as well as network latency variance (because it affects clock synchronisation).
If you rely on logical order, then there's no need for time delays; no upper speed limit. Instead you have to keep track of dependencies, or have implicit causal dependencies. And it's more robust on real systems because clock drift and jumps don't matter.
In practice you need some clock dependence for timeouts anyway, and there are protocol advantages when you can depend on a well synchronised clock. So I prefer to mix the two for different layers of the protocol, to get advantages of both. Unfortunately for correct behaviour, timeouts are often the "edge case" that isn't well tested or correctly implemented at endpoints.