Site Reliability Engineer

If you have ever watched a television show or live event, or enjoyed a movie on VOD on your phone or tablet, this experience was likely brought to you through an ATEME solution.

We are ATEME (PARIS: ATEME). We are the video delivery leader helping leading content providers, service providers and pure streaming players boost their engagement, acquire new viewers, and create new sources of revenue. Leveraging our continuous investment in R&D and innovation, we empower our customers to deliver a high quality of experience on any screen.

Delivering video experiences also has an impact on our world. That’s why our multiple award-winning engineering teams design efficient and flexible solutions that cut waste, with no compromise on quality, so that viewers can enjoy their unique experiences – and the world we live in – well into the future.

Thanks to a strong CSR policy that reinforces our mission to “Make the entertainment and video experience captivating, greener, and accessible to everyone,” we strive every day to build a better and more sustainable future for our planet, our people, and our ecosystem.

At ATEME, we value innovation, collaboration, empowerment, agility, and everyone’s contributions. We offer cross-cultural enrichment thanks to employees of 30 different nationalities. We consider the globe our playground and we facilitate international mobility, especially between our offices in France, Sao Paulo, Denver, New York and Singapore.

Be part of our team and join our fantastic journey!

Position overview:

ATEME is looking for a Site Reliability Engineer (SRE) to join our team in Belgrade, Serbia, working on-site at a major broadcast customer. This role supports the customer directly on their premises in Belgrade, ensuring the reliability and performance of ATEME’s video delivery solutions.

Core Responsibilities:

• Support the evolution of the platforms in collaboration with development teams and clients, within an agile framework.

• Maintain and improve existing infrastructures, ensuring issues are promptly resolved and systems are properly documented.

• Ensure service continuity and guarantee a high level of system availability.

• Participate in infrastructure automation, monitoring, and observability enhancements.

• Deploy new projects, upgrades, and technologies in production and staging environments.

Use Case–Specific Responsibilities:

• Perform technical monitoring, system supervision, and Level 2 troubleshooting, with escalation when standard procedures are insufficient.

• Handle operational calls from the customer.

• Coordinate and manage change requests, ensuring smooth and controlled rollouts.

• Provide monthly downtime and system health reporting.

• Conduct regular backups, restore validations, disaster recovery (DR), and redundancy tests.

• Perform system security scans, assess vulnerabilities, and apply mitigation measures.

• Manage pre-production staging environments on the main site and POP sites.

Engineering & Reliability Focus:

• Design and execute test plans for every system upgrade.

• Create upgrade procedures aimed at minimizing operational impact, including scripts, Method of Procedure (MOP), and rollback strategies.

• Enhance and evolve the system of alerts, alarms, and observability metrics.

• Calculate and report uptime, service-level compliance, and reliability metrics.

Requirements:

Technical Skills:

• Basic knowledge of Linux/Unix system administration and networking fundamentals.

• Interest in learning infrastructure automation tools (e.g., Ansible, Terraform, Helm, Helmfile).

• Familiarity with monitoring and observability concepts and tools (e.g., Prometheus, Grafana).

• Awareness of backup, restore, and disaster recovery principles.

• Familiarity with at least one cloud provider (AWS, GCP, Azure, Huawei) or hybrid infrastructures.

• Some scripting experience (Bash, Python, or equivalent) for simple automation and troubleshooting tasks.

• Understanding of basic system and application security best practices.

Soft Skills:

• Eagerness to learn and develop troubleshooting and problem-solving skills.

• Good communication and collaboration skills to work within a team.

• Ability to document tasks and procedures in a clear and structured way.

• Motivated, proactive, and curious, with an interest in reliability and continuous improvement.

• Ability to communicate effectively in English (speaking and writing).

Location: On-site in Belgrade, Serbia

EQUAL EMPLOYMENT OPPORTUNITY

ATEME SA and all its subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.
