site reliability engineer jobs

This is a remote position.

Interested in helping us change the world of payments forever? The Stellar Development Foundation (SDF) is looking for a talented, experienced, and hands-on Site Reliability Engineer to join our team. In this role, you will be ensuring the reliability of our services, building infrastructure to enable our team's production and testing environments, and greasing the rails of our systems to ensure they're robust, efficient, and easy to deploy.

You will:

Maintain, improve, scale and secure our AWS infrastructure and Ubuntu Linux systems.
Assist our development teams in running, packaging, deploying and troubleshooting applications.
Work with developers on streamlining deployment processes with Jenkins and other tooling.
Maintain, monitor and improve our Kubernetes clusters.
Work with development teams on migrating applications to Kubernetes.
Be responsible for maintenance and improvements to multiple internal services, for example Kubernetes, Prometheus, ELK and LDAP.
Monitor, triage and respond to alerts in our 24/7/365 environment.
Participate in design and code reviews, and ensure that the foundation for our services is best in class.
Evaluate new technologies, and design and implement them as appropriate.
Identify automation opportunities and implement them by building custom solutions or using off-the-shelf ones.

Requirements:

You have 3+ years of experience working in cloud-based systems operations as a Linux systems administrator, SRE or DevOps engineer.
You’re very comfortable with the Linux command line.
You're a natural at troubleshooting and debugging - no issue is impossible to solve.
You have a good understanding of computer networking, TCP/UDP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
You have experience supporting production workloads and are familiar with monitoring concepts and tooling. You’re able to take part in an on-call rotation.
You're proficient in at least one scripting language and you are familiar with a few (Ruby, Perl, Python, Bash, etc.).
You have first-hand experience with configuration management tools (Puppet, Chef, etc.), preferably Puppet.
You're always willing to do what it takes to help your teammates - especially in stressful situations.
You're enthusiastic about working in a small, growing team. You are open and empathetic, and you care about putting the best ideas forward in a collaborative and helpful manner.
You can work independently and are able to deliver results without supervision.

Nice to have:

Familiarity with Docker and Kubernetes
Experience with Prometheus and Grafana
Experience with AWS
Ability to understand Go, C++ and TypeScript source code
Experience with CI pipelines and Jenkins
Deb or RPM packaging experience

Why work for us

You’ll have a lot of autonomy in the team
You’ll work with Kubernetes in production and we’ll help you get up to speed if needed
You will be able to make a visible impact quickly and will have a strong influence on the team’s direction, tooling, processes and technology choices
You will work on many open source projects that aim to improve financial inclusion on a global scale
