[Community-Discuss] Afrinic Services DOWN

Nishal Goburdhan nishal at controlfreak.co.za
Sat Jun 1 19:00:58 UTC 2019

On 31 May 2019, at 18:46, francis asiboh via Community-Discuss wrote:

> Dear board members and all members of the community
> All Afrinic services (Whois, RPKI, Afrinic.net, etc..) were down 
> yesterday

you didn’t mention DNS.  if you did, it would have invalidated the 
“all” in your sentence.

> the 30th of May 2019 for a very long time. Board, where is the
> Disaster Recovery Strategy in this particular kind of incident ?

the first thing you learn about a disaster recovery plan, is knowing 
_when_ to deploy it  (ie. what is classed as a disaster).   afrinic 
suffered from an outage of services in JNB for 3h53m.  nothing more.  it 
was not a disaster by any stretch of the imagination.  being hyperbolic, 
doesn’t help anyone.

at worst, your mail _to_ afrinic might have been slightly delayed (heck, 
you wouldn’t even get the 4h smtp warning!).  if you were doing 
validation, your RPKI cache would not have had the _most up to date_ 
ROAs  (but validation would have *still* worked!), and you wouldn’t 
have been able to make a few DNS or WHOIS updates.
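
to make the "validation would have still worked" point concrete, here is a rough sketch of route origin validation in the spirit of RFC 6811, run against a locally cached set of validated ROA payloads.  it is not afrinic's or any validator's actual code;  the `VRP` type, the `validate` function, and the documentation prefixes and ASNs are all made up for illustration, and it ignores things like AS0 and the eventual expiry of cached objects.

```python
import ipaddress
from typing import NamedTuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class VRP(NamedTuple):
    """One validated ROA payload (hypothetical type, for illustration only)."""
    prefix: Network
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, cache: list[VRP]) -> str:
    """Classify a route as 'valid', 'invalid' or 'not-found' using only the local cache."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in cache:
        # a VRP covers the route if the route sits inside the VRP prefix
        if route.version == vrp.prefix.version and route.subnet_of(vrp.prefix):
            covered = True
            # it matches if the length is allowed and the origin AS agrees
            if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
                return "valid"
    return "invalid" if covered else "not-found"


# the cache as it stood at the last successful fetch (documentation values)
cache = [VRP(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]

print(validate("192.0.2.0/24", 64500, cache))
print(validate("192.0.2.0/24", 64501, cache))
print(validate("198.51.100.0/24", 64500, cache))
```

nothing in `validate` talks to the repository;  it only reads the cache.  so while the publication point is unreachable, the answers simply reflect the last fetch, which is the "not the most up to date ROAs" caveat above.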

meanwhile, the internet, still carried on ..

> Since now, no Root Cause Analysis were sent for transparency to the
> community.

this is a good thing to ask for;  and i expect that this will be made 
available, as was the RFO after the last incident.
however, any RFO also includes a “how are we going to make sure that 
this is not going to happen again” part.  that bit actually takes time 
for analysis, planning, and, sign-off.  if anything, *this* is the part 
that you want afrinic to actually spend time and effort on, rather than 
simply whipping out a half-assed response to an outage.
so, it’s reasonable to believe that *this* part of the report, isn’t 
necessarily complete.  yet.

however, if you prefer to fit into the current climate ..
.. maybe the BoD is still debating whether to release ..

> I am surprised that the Infrastructure Unit Manager, Mr Cedric MBEYET
> turned a deaf ear and did not learnt his lesson from the last incident
> where RPKI services were down.

if you read the incident report from last time, you _should_ be able 
to figure out that the previous incident occurred in mauritius, and is 
in no way related to thursday’s incident in johannesburg.  which is in 
a different city.  thousands of kilometres away.  the golden hint 
_should_ have been that, last time around, it was a certificate renewal 
that failed;  which wouldn’t impact afrinic’s website, or whois 
services, or ..  all of which were unavailable this time.

unless, you have direct evidence that they are related.  in which case, 
for transparency, you should release that, eh?

> Instead of taking care of Afrinic services, the current board of
> directors is busy hiding public document from its members.

the BoD is not meant to run afrinic operations;  they have enough to do.  
whilst i share your distaste for their actions, in reading some of the 
shocking revelations that are emerging in other threads, this, at least, 
is not something you should be attributing to the BoD.

read the RFO when it’s released;  and, if you think there’s 
something that’s glaringly poorly done, then, feel free to point it 
out.  posturing, without data, is just poor form.

network engineer  (retired)

ps.  btw, even with almost 4h of outage, depending on what you’re 
measuring, afrinic are still on track for 99.9% uptime, even if 
there’s a second 4h outage this year.
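
for anyone who wants to check that arithmetic, here is a quick sketch.  it assumes a 365-day year, counts every hour of the year equally, and takes the second outage to be exactly 4h;  the function names are mine, and "depending on what you’re measuring" still applies.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def uptime_percent(outage_hours, period_hours=HOURS_PER_YEAR):
    """Uptime over the period, in percent, given a list of outage durations in hours."""
    return 100.0 * (1 - sum(outage_hours) / period_hours)


def allowed_downtime_hours(target_percent, period_hours=HOURS_PER_YEAR):
    """Total downtime (hours) that still meets the target uptime percentage."""
    return period_hours * (1 - target_percent / 100.0)


thursday = 3 + 53 / 60   # the 3h53m JNB outage
hypothetical = 4.0       # a second, hypothetical 4h outage

print(f"budget at 99.9%: {allowed_downtime_hours(99.9):.2f} h per year")
print(f"uptime with both outages: {uptime_percent([thursday, hypothetical]):.3f} %")
```

the 99.9% budget works out to roughly 8.76h a year, so two outages of about 4h each squeeze in just under it.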

pps.  i happen to know that afrinic *does* have at least one copy of a 
disaster recovery plan.  i am still somewhat familiar with the contents 
of this, and, given the nature of what is involved in activating that 
(and then, reverting), in cedrick’s shoes, i would have made the same 
call for a 4h outage.  you’re free to disagree.
