Internal Network Failure causing all requests to fail [MAINTENANCE SCHEDULED]
We've upgraded our network orchestrator server and have scheduled the upgrade of our SDN software for Tuesday, December 10, 2024, starting at 11am GMT. The site may be unavailable for up to 4 hours during this window.
We're upgrading the server our network orchestrator is running on, and there may be minor outages. - Cyan 06/11/2024 14:05
For those interested: We traced the outage to an out-of-memory event on our network orchestrator system, which triggered an extremely rare failure where a key rotation did not write the new keys correctly. The bad keys were then pushed out to every node, which broke our internal network because no node could talk to any other. We will upgrade our network orchestrator node with additional memory to prevent this failure from happening again, and we will look into upgrading our SDN software to a newer version, which we believe catches this error before it can cause problems. - Cyan 06/11/2024 13:42
We had a long outage due to an issue with our internal network: all nodes stopped communicating with each other. We have fixed this, but are still investigating the root cause internally to make sure it doesn't happen again. - Cyan 06/11/2024 08:55 GMT
Incident started - 06/11/2024 04:09 GMT