Benefits: Salary (based on experience) + benefits
We are looking for an exceptional Site Reliability Engineer with experience in building software and leading a team. If you're highly skilled, have an analytical, troubleshooting mindset, and want to join a team creating groundbreaking tech for development, we'd love to talk.
Praekelt uses technology to design and build human-centered systems that transform businesses at scale. Our products aim to reach everyone in Africa and push the boundaries of digital user experience and technology. We rely on deep synthesis and collaboration between our strategy, design, and engineering teams to produce delightful and unexpected results.
We believe in the power of co-creation. Our clients are active participants in the product development process. We partner with them to make purpose-driven, relevant, and functional experiences that achieve measurable business objectives.
You will primarily be responsible for:
- Uptime of critical sites and systems
- Maintenance of servers (security patches, backups, and keeping services in line with current best practices)
- Deployment of new products
- Troubleshooting of performance issues
- Working with configuration automation software such as Puppet, Salt, or Ansible (we currently use Salt on our legacy systems)
- Providing and maintaining the resources developers need to do their jobs. This includes continuous integration systems (Travis CI), software deployment, basic troubleshooting of code (but not fixing it), and the creation and management of software repositories (GitHub, PPAs, and PyPI).
- Ensuring servers are promptly patched against security exploits, managing secure access to servers and repositories for partners and internal staff, and securing interconnections between systems (IPsec and OpenVPN).
- Ensuring our servers are configured in a documented and repeatable way. Performing firmware updates to hardware where advised by vendors. Replacing failed components. Ensuring the overall architecture is appropriate to the requirements of projects, is easily maintainable in the long term, and provides appropriate levels of redundancy. Ensuring all servers have at least 7 working daily backups.
- Providing after-hours uptime assurance while on standby, and ensuring monitoring is correctly configured on all servers. Although we can't guarantee recovery time for every failure, since the cause may be outside our control, we endeavour to begin investigating any issue that makes a production service critically unavailable within 20 minutes.
- General support (problems, password changes, etc.) for office infrastructure, Gmail, Slack, and Jira.
- Site load testing, unit testing, disaster recovery testing, and system-level quality assurance, including backend performance, deployment sanity, security, scalability, and stability.
- Advising on and/or contributing to new or emerging technologies that might be relevant to Praekelt.
- Working well within cross-functional teams to produce world-class products and programmes that empower end-users.
Requirements:
- An honours degree in Computer Science or Engineering (or equivalent experience).
- Excellent knowledge of Linux (we standardise on Ubuntu LTS with occasional exposure to RHEL/OEL).
- Knowledge of database administration and maintenance (primarily PostgreSQL, occasional MySQL).
- Clustered services and high-availability techniques (HTTP load balancers, VRRP, DNS failover services, etc.).
- Some knowledge of networking (iproute2, iptables, IPsec, OSPF).
- An in-depth understanding of common internet protocols (HTTP, DNS).
- Experience in cloud computing (GCP and AWS).
- Configuration management (Salt or an equivalent DevOps stack).
- Some knowledge of development tools and revision control. We make extensive use of Git and GitHub for everything. The majority of our software is written in Python (asyncio and Django) and Elixir.
- Extensive knowledge of Docker and Kubernetes.
- Knowledge of Helm, Terraform and Vault.
- Experience with monitoring and observability tools such as Prometheus, Datadog, and New Relic.
- The ability to remain calm under pressure.
- A passion for the digital industry: an avid consumer of digital media who is always in the know about the latest trends, technologies, and platforms.
- The ability to think strategically, act quickly, multi-task and work collaboratively in an environment that values creativity and flexibility.
Services we currently use:
- Docker, Docker Hub, Kubernetes, and Rancher
- GitHub and Travis CI
- Prometheus, Datadog, New Relic, and Sentry
- Memcached, RabbitMQ, Redis, and Nginx
- AWS and GCP
- PostgreSQL and MySQL (AWS RDS and Google Cloud SQL)
- Citrix XenServer and Ubuntu
Praekelt is committed to creating a fully inclusive and diverse environment that embraces difference and cultivates inclusivity. We actively encourage applicants of all races, ethnicities, religions, ages, genders, sexual orientations, and abilities.