[Community-Discuss] 06 April 2019 RPKI incident - Postmortem report

Saul Stein saul at enetworks.co.za
Wed Apr 10 09:55:46 UTC 2019


Agreed.



There is a bigger issue at stake here: I have yet to see any evidence that 
AFRINIC takes RPKI seriously.

The last time I had an issue, when no ROAs could be added, deleted, etc., it 
was admitted that the problem had been known about for over two weeks, with 
nothing posted on the announce list and no fix! After escalation to the CEO 
and others, it was fixed in a couple of hours!



RPKI is serious and needs to be taken seriously. We can’t keep having issues 
with it. It is like customs at immigration being offline!



Cheers

Saul



From: Mark Tinka [mailto:mark.tinka at seacom.mu]
Sent: 10 April 2019 08:32 AM
To: community-discuss at afrinic.net
Subject: Re: [Community-Discuss] 06 April 2019 RPKI incident - Postmortem 
report



Thanks, Cedrick.

A question that is, perhaps, obvious... are you able to take the human 
component out of this? If 2 reminders were not enough to get the humans to 
act, I'm not sure the current methodology is sustainable.

Mark.

On 8/Apr/19 17:46, Cedrick Adrien Mbeyet wrote:

Dear AFRINIC community,



Please find below the postmortem report on the incident that happened on 
06 April 2019.



The AFRINIC RPKI engine has an offline part that has to be renewed on a 
monthly basis. The process is known and documented, and automated reminders 
are set. The system is set to send 2 reminders each month, one 15 days prior 
to the expiry date and the second one 7 days before expiry. In the second 
half of March, the monitoring system sent a reminder to perform the offline 
refresh, but this was not acted upon.
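
As an illustration of the reminder schedule described above, here is a 
minimal Python sketch that derives the two reminder times from an expiry 
time. The names REMINDER_OFFSETS_DAYS and reminder_times are hypothetical, 
and this is not the actual AFRINIC tooling; only the 15-day and 7-day 
offsets and the approximate expiry time are taken from the report.

```python
from datetime import datetime, timedelta, timezone

# Offsets from the report: one reminder 15 days before expiry, one 7 days before.
# The constant and function names are assumptions made for this sketch.
REMINDER_OFFSETS_DAYS = (15, 7)


def reminder_times(expiry):
    """Return the reminder timestamps for a given expiry, earliest first."""
    return [expiry - timedelta(days=days) for days in REMINDER_OFFSETS_DAYS]


# The report gives the expiry as around 07:24 UTC on 06 April 2019.
expiry = datetime(2019, 4, 6, 7, 24, tzinfo=timezone.utc)

for when in reminder_times(expiry):
    print(when.isoformat())
```

With that expiry, the two reminders fall on 22 March and 30 March 2019.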





On Saturday 06 April 2019, the Certificate Revocation List (CRL) and the 
manifest file of the AFRINIC RPKI repository expired (around 07:24 AM UTC). 
Our monitoring system picked this up. The immediate action was to generate 
new certificates and a new manifest file and upload them to the RPKI engine 
system.



The failure was the result of human error. No changes were made to the 
system, but we have added steps to the existing process to ensure that this 
does not happen again. We do acknowledge that it is unacceptable to have 
such a failure with critical infrastructure, and the necessary has been done 
in this regard.





We do apologize for the inconvenience caused and thank you for your patience 
in this regard.

-- 
_______________________________________________________________
Cedrick Adrien Mbeyet
Infrastructure Unit Manager, AFRINIC Ltd.
t:  +230 403 5100 / 403 5115 | f: +230 466 6758 | tt: @afrinic | w: 
www.afrinic.net <http://www.afrinic.net>
facebook.com/afrinic | flickr.com/afrinic | youtube.com/afrinicmedia
______________________________________________________



