We are seeking a Senior SRE & DevOps Engineer (L2) with deep expertise in Azure PaaS, AKS, and cloud infrastructure. This role is responsible for designing, automating, and operating production-grade platforms, and for driving reliability, scalability, security, and cost efficiency at scale.
Key Responsibilities
Platform Reliability & SRE Practices
• Define, implement, and enforce SLIs, SLOs, SLAs, and error budgets.
• Architect highly available, resilient, fault-tolerant Azure platforms.
• Lead incident management, on-call strategy, and blameless postmortems.
• Drive toil reduction through automation and self-healing systems.
• Perform capacity planning, load testing, and resilience testing.
DevOps & Platform Engineering
• Design and maintain enterprise-grade CI/CD pipelines (app & infra).
• Build platform blueprints, reusable modules, and golden paths.
• Lead Infrastructure as Code using Terraform and/or Bicep.
• Implement advanced deployment strategies:
o Blue/Green
o Canary releases
o Feature flags
• Standardize release and environment provisioning across teams.
Azure Cloud Engineering (PaaS & Infra)
Azure Infrastructure
• Architect and manage:
o Management Groups, Subscriptions, Resource Groups
o VNets, NSGs, ASGs, Route Tables
o VNet Peering, Private Link, Private Endpoints
o VPN Gateway, ExpressRoute
o Application Gateway (WAF), Front Door, Traffic Manager
o VM Scale Sets, Azure Bastion
o Backup and Site Recovery
• Design secure identity using:
o Microsoft Entra ID (Azure AD)
o RBAC, Managed Identities, PIM
Azure PaaS & Application Platform
• Own AKS production clusters:
o Cluster lifecycle, upgrades, scaling, networking, security
• Implement container platforms using:
o ACR, Helm, Kustomize
• Design solutions using:
o App Service, Functions, Logic Apps
o API Management
o Service Bus, Event Hubs, Event Grid
o Azure SQL, Cosmos DB, Managed PostgreSQL/MySQL
o Azure Cache for Redis
• Implement secure secret and certificate management with Key Vault.
Observability & Operational Excellence
• Build and operate advanced observability stacks:
o Grafana, Mimir, Loki, Tempo, Thanos, Vector
o Azure Monitor and Log Analytics
• Create actionable dashboards, alerts, and runbooks.
• Enable distributed tracing and alert hygiene.
• Continuously reduce MTTR through automation and telemetry.
Security, Governance & Compliance (DevSecOps)
• Enforce secure-by-design cloud architectures.
• Implement governance using:
o Azure Policy
o Tagging standards
o Budgets and cost controls
• Integrate security into CI/CD:
o IaC scanning
o Container and dependency scanning
• Support audits, compliance, and regulatory requirements.
• Use Defender for Cloud and Sentinel (preferred).
Cost & Efficiency (FinOps)
• Drive cloud cost optimization through:
o Rightsizing
o Autoscaling
o Consumption-aware PaaS design
• Implement budgets, alerts, and cost governance standards.
Required Skills & Experience
Mandatory
• 6+ years of experience in SRE / DevOps / Cloud Engineering
• Deep hands-on expertise with Azure PaaS & Infrastructure
• Strong AKS production operations experience
• Advanced Terraform and/or Bicep (modular, reusable)
• CI/CD expertise with Azure DevOps or GitHub Actions
• Deep understanding of Docker and Kubernetes internals
• Strong scripting skills (PowerShell, Bash, Python)
• Proven ownership of 24x7 production incidents
Preferred / Nice-to-Have
• Azure Landing Zone implementation experience
• Service Mesh (Istio / OSM)
• OpenTelemetry & advanced observability
• Microsoft Defender for Cloud, Sentinel
• Azure certifications: AZ-400, AZ-305