Senior Site Reliability Engineer


The Wikimedia Foundation is looking for a Senior Site Reliability Engineer to join our team, reporting to the manager of the Search Platform team. As the Senior Site Reliability Engineer, you will be responsible for operating the platforms and services of the Search Platform team. In this role, you will maintain, upgrade and scale our Elasticsearch clusters, our RDF Graph endpoint and their supporting data infrastructure.

The Search Platform team is dedicated to “helping people easily discover knowledge on Wikipedia and its sister projects by providing tools and infrastructure for casual readers and expert users with precise needs, while maintaining a strong emphasis on privacy.” We are a multi-skilled team composed of software engineers, a computational linguist, and a site reliability engineer. We are a fully remote team spread across time zones from UTC-8 to UTC+1, taking care of Search and the Wikidata Query Service. When the world is not locked down by a pandemic, we see each other in person two or three times per year.

You are responsible for:

  • Deployment, scaling, monitoring, provisioning, and support of our Search and SPARQL endpoints
  • Developing and maintaining automation tools and processes
  • Providing guidance and expertise to the team on productionizing and operating our applications
  • Configuration management and deployment tools
  • Ensuring the continuous improvement and evolution of services on our platform
  • Monitoring of systems and services, optimization of performance and resource utilization
  • Incident response, diagnosis and follow-up on system outages or alerts
  • Assisting in software updates

Skills and Experience:

  • 5+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience with shell and any scripting languages used in an SRE context (Python, Go, Bash, Ruby, etc.; we primarily use Python)
  • Comfortable with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack)
  • Good understanding of Linux systems
  • Experience in automating tasks and processes, identifying process gaps, and finding automation opportunities
  • Open to supporting JVM-based applications
  • Strong English language skills and the ability to work independently as an effective part of a globally distributed team
  • B.S. or M.S. in Computer Science or a related field, or equivalent work experience

Qualities that are important to us:

  • Be passionate about free culture / open source

Additionally, we’d love it if you have:

  • Experience with search
  • Experience with event streaming platforms (Kafka or similar)
  • Experience with Linux kernel tuning for high traffic loads
  • Developing/contributing to Free and Open Source software, or being part of an open-source community

The Wikimedia Foundation is… 

…the nonprofit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.
