[Community-Discuss] Afrinic Services DOWN

Gregoire Ehoumi gregoire.ehoumi at yahoo.fr
Sat Jun 8 13:26:41 UTC 2019


Nishal,

I have read your responses to the last AFRINIC outage several times and
still cannot understand the rationale for what you were explaining and
defending.

AFRINIC, as a Regional Internet Registry, has committed to a level of
technical expertise, as described in section 5 of ICP-2:

=====
5) Technical expertise

The new RIR must be technically capable of providing the required
allocation and registration services to the community in its region.
Specific technical requirements include provisioning by the RIR of:

- production grade global Internet connectivity, in order to provide
  access to all services offered and for exchange of registry data to
  and from the other RIR-whois database server(s);
- DNS servers to support Reverse DNS delegation;
- suitable internal infrastructure for operational purposes; and
- enough technically capable staff to ensure appropriate service levels
  to the LIRs, and to the Internet community.
=====

This commitment was carried over into the AFRINIC Service Level
Commitment (SLC), where, in section 3, AFRINIC commits itself to 99.8%
service and network availability (*).

I am teaching you nothing with the references above; as a former CTO of
AFRINIC, you are supposed to know them. They may, however, be useful to
the community. Given this commitment, how are we to understand that
AFRINIC suffered a downtime of almost 4 hours, which you tried to play
down?

The published post-mortem report (**) said:

" Upon further investigation we determined that all AFRINIC's equipment
at our main data centre in Johannesburg had lost power "

So these are the questions that come to mind:

- Had the whole datacenter lost all power sources?
- If not, how did AFRINIC lose all power sources to its equipment?
- Why did it take 4 hours to restore power?

The report further said:

" It is unfortunate that this incident happened while the AFRINIC
infrastructure enhancement plan is still in his implementation phase "

This sounds like an acceptance of non-readiness to meet the service
level. I urge AFRINIC to take this commitment as seriously as it should
and to deploy the effort and resources it requires. Let's respect the
separation of roles and responsibilities and take responsibilities
seriously. AFRINIC is 15 years old; Africa should no longer accept such
breaches.

--
Gregoire

(*) https://www.afrinic.net/commitment
(**) https://lists.afrinic.net/pipermail/announce/2019/002063.html

------ Original message ------
From: Nishal Goburdhan
Date: Sat, Jun 1, 2019 3:03 PM
To: asiboh.francis at elude.in;
Cc: community-discuss at afrinic.net;
Subject: Re: [Community-Discuss] Afrinic Services DOWN

On 31 May 2019, at 18:46, francis asiboh via Community-Discuss wrote:

> Dear board members and all members of the community
> All Afrinic services (Whois, RPKI, Afrinic.net, etc..) were down 
> yesterday

you didn't mention DNS.  if you did, it would have invalidated the 
"all" in your sentence.


> the 30th of May 2019 for a very long time. Board, where is the 
> Disaster
> Recovery Strategy in this particular kind of incident ?

the first thing you learn about a disaster recovery plan, is knowing 
_when_ to deploy it  (ie. what is classed as a disaster).   afrinic 
suffered from an outage of services in JNB for 3h53m.  nothing more.  it 
was not a disaster by any stretch of the imagination.  being hyperbolic, 
doesn't help anyone.

at worst, your mail _to_ afrinic might have been slightly delayed (heck, 
you wouldn't even get the 4h smtp warning!).  if you were doing 
validation, your RPKI cache would not have had the _most up to date_ 
ROAs  (but validation would have *still* worked!), and you wouldn't 
have been able to make a few DNS or WHOIS updates.

meanwhile, the internet, still carried on ..


> Until now, no Root Cause Analysis has been sent to the community
> for transparency.

this is a good thing to ask for;  and i expect that this will be made 
available, as was the RFO after the last incident.
however, any RFO also includes a "how are we going to make sure that 
this is not going to happen again" part.  that bit actually takes time 
for analysis, planning, and, sign-off.  if anything, *this* is the part 
that you want afrinic to actually spend time and effort on, rather than 
simply whipping out a half-assed response to an outage.
so, it's reasonable to believe that *this* part of the report, isn't 
necessarily complete.  yet.

however, if you prefer to fit into the current climate ..

. maybe the BoD is still debating whether to release ..



> I am surprised that the Infrastructure Unit Manager, Mr Cedric MBEYET
> turned a deaf ear and did not learn his lesson from the last incident
> where RPKI services were down.

if you read the outage incident from the last time, you _should_ be able 
to figure out that an incident that occurred previously, occurred in 
mauritius, and, is, in no way, related to thursday's incident in 
johannesburg.  which is in a different city.  thousands of kilometres 
away.  the golden hint _should_ have been, that last time around, it was 
a certificate renewal that failed;  which won't impact afrinic's 
website, or whois services, or .. that were not available now.


unless, you have direct evidence that they are related.  in which case, 
for transparency, you should release that, eh?



> Instead of taking care of Afrinic services, the current board of 
> directors
> is busy hiding public document from its members.

the BoD is not meant to run afrinic operations;  they have enough to do. 
  whilst i share your distaste for their actions, in reading some of the 
shocking revelations that are emerging in other threads, this, at least, 
is not something you should be equating to the BoD.

read the RFO when it's released;  and, if you think there's 
something that's glaringly poorly done, then, feel free to point it 
out.  posturing, without data, is just poor form.

--n.
network engineer  (retired)

ps.  btw, even with almost 4h of outage, depending on what you're 
measuring, afrinic are still on track for 99.9% uptime, even if 
there's a second 4h outage this year.
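
the arithmetic behind that figure can be sketched as follows.  this is 
only an illustration, under two assumptions that are not stated in the 
thread: availability is measured over a 365-day calendar year, and every 
minute of the outage counts as full downtime for the service being 
measured.  the function names are made up for this example.

```python
# Assumption: the measurement window is one non-leap calendar year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_pct(downtime_minutes: float) -> float:
    """Availability over the year, as a percentage, for a given total downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


def downtime_budget_minutes(target_pct: float) -> float:
    """Total downtime per year that still meets the given availability target."""
    return MINUTES_PER_YEAR * (1.0 - target_pct / 100.0)


if __name__ == "__main__":
    jnb_outage = 3 * 60 + 53  # the 3h53m outage mentioned in this thread

    print(f"one 3h53m outage : {availability_pct(jnb_outage):.3f}%")
    print(f"two 4h outages   : {availability_pct(2 * 4 * 60):.3f}%")
    print(f"budget at 99.9%  : {downtime_budget_minutes(99.9) / 60:.2f} h/year")
    print(f"budget at 99.8%  : {downtime_budget_minutes(99.8) / 60:.2f} h/year")
```

under those assumptions, a single 3h53m outage leaves roughly 99.956% 
for the year, two 4h outages leave roughly 99.909%, and the yearly 
downtime budget is about 8.76h at 99.9% and 17.52h at the SLC's 99.8%.  
a shorter measurement window (per month, say) would give a very 
different answer, which is why "depending on what you're measuring" 
matters.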

pps.  i happen to know that afrinic *does* have at least one copy of a 
disaster recovery plan.  i am somewhat still familiar with the contents 
of this, and, given the nature of what is involved in activating that 
(and then, reverting) in cedric's shoes, i would have made the same 
call for a 4h outage.  you're free to disagree.
