Principal Responsibilities:
-Design, implement, and maintain highly available, scalable infrastructure solutions, leveraging automation to streamline operations.
-Monitor system performance, proactively identify potential issues, and drive incident response and root cause analysis.
-Collaborate with cross-functional teams (development, product, security) to integrate reliability best practices across the entire software lifecycle.
-Develop and manage automation scripts, CI/CD pipelines, and infrastructure-as-code (IaC) frameworks to enhance efficiency and reduce manual intervention.
-Optimize cloud resources, cost management, and disaster recovery strategies to ensure business continuity.
Qualifications:
-Experience: Minimum 5 years in IT operations or Site Reliability Engineering, with a focus on infrastructure management and system optimization.
-Technical Skills: Proficiency in operations tools such as Ansible, Puppet, Chef, Terraform, Prometheus, Grafana, and the ELK Stack.
-Strong scripting skills in Python, Shell, or similar languages.
-Cloud Competency: Solid experience with major cloud platforms (AWS, Azure, GCP), including services and technologies such as EC2, Lambda, Kubernetes, and containerization.
-Problem-Solving: Proven ability to troubleshoot complex issues across distributed systems, networks, and applications.
-Communication: Excellent written and verbal communication skills, with the ability to collaborate effectively in a fast-paced, dynamic environment.
Preferred Qualifications:
-3+ years of dedicated experience in cloud service operations, with expertise in cloud-native architectures and microservices.
-Certifications such as AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, or equivalent.
-Experience with service mesh technologies (e.g., Istio) and observability tools (e.g., Jaeger).
-Familiarity with DevOps culture and practices, including agile methodologies and continuous improvement frameworks.
-Bonus: Proven experience developing IT operations and maintenance tools in Python, demonstrating the ability to automate complex workflows and solve real-world problems.
Updated on 2025-12-16
Duties and Responsibilities:
- Deploy and manage Kubernetes clusters and cloud-native applications.
- Maintain infrastructure as code, ensuring scalability, reliability, and security.
- Collaborate closely with development and operations teams to ensure seamless integration between infrastructure and application code.
- Maintain observability and monitoring solutions for log aggregation, metrics, and tracing.
- Contribute to the continuous improvement of automation and infrastructure practices.
- On-call duty: you will be responsible for maintaining the infrastructure. This includes being available for on-call rotations to troubleshoot and resolve production issues as needed.
- Stay up to date with the latest cloud technologies and industry trends.
Required Skills & Experience (E2 - DevOps Engineer):
- Bachelor's degree or above in a relevant field.
- Excellent communication and collaboration abilities in English.
- 3+ years of related professional experience.
- Skilled in Linux/Unix system management and developing operational scripts (Python/Shell/etc.).
- Familiarity with containerization technologies and cloud platforms (Azure/Alibaba Cloud/etc.).
- Good experience with Kubernetes, including deployment, scaling, and troubleshooting.
- Expertise in Infrastructure as Code (IaC) and managing cloud resources with Terraform.
- Initial hands-on experience with observability tools such as Grafana and Loki.
- Familiarity with SAP BTP (Business Technology Platform) is a plus.
- Good troubleshooting and problem-solving skills.
- Willingness to engage in on-call duty.
Updated on 2026-03-28