We are looking for a Site Reliability Engineer to work with one of our leading clients.
Job Description:
Demonstrated experience working in multi-cloud environments, with a strong focus on Google Cloud Platform (GCP).
Expertise with GKE, IaC, Kubernetes, and Terraform is essential.
Proven track record in designing, building, and maintaining core services and infrastructure.
Experience in the Financial Services or Banking sectors is highly desirable.
Skills:
Confidence in independently troubleshooting complex system issues.
Ability to operate with a high level of autonomy and responsibility in fast-paced environments with evolving objectives.
Strong problem-solving abilities, particularly in identifying and addressing data quality issues.
A demonstrated commitment to continuous learning and driving improvements across teams.
Excellent communication and collaboration skills, with strong attention to detail.
Description for Internal Candidates:
Security - ensure high levels of security by design, and architect a platform that supports monthly patching and vulnerability management in line with company-approved information security policies and procedures.
Lifecycle Support - support management of IT assets to ensure they are fully supported, including planning upgrades or replacements prior to end of life, to avoid increased risk or service interruption.
Availability - achieve SLAs by building and maintaining services with no single points of failure, identifying weak or failing components for replacement before they cause incidents.
Capacity Support - configure alerts and monitor infrastructure usage over time to ensure we are always one step ahead of demand.
Incident Support - configure and respond to monitoring alerts for issues with any devices, supporting incidents and escalating when required.
Problem Resolution - provide recommendations to avoid future incidents, including timely delivery of agreed solutions.
Configuration and Assets - maintain configuration repositories, including network diagrams, the IT asset management system, and agreed documentation.
Change Management - support the wider project and change programme; design, deliver, and document agreed improvements following governance processes and industry best practices.
Releases - ensure all changes are released into controlled environments following agreed and repeatable processes, including roll-back to a known working state.