Principal responsibilities:
-Design, implement, and maintain highly available, scalable infrastructure solutions, leveraging automation to streamline operations.
-Monitor system performance, proactively identify potential issues, and drive incident response and root-cause analysis.
-Collaborate with cross-functional teams (development, product, security) to integrate reliability best practices across the entire software lifecycle.
-Develop and manage automation scripts, CI/CD pipelines, and infrastructure-as-code (IaC) frameworks to enhance efficiency and reduce manual intervention.
-Optimize cloud resources, cost management, and disaster recovery strategies to ensure business continuity.
Qualifications:
-Experience: Minimum 5 years in IT operations or Site Reliability Engineering, with a focus on infrastructure management and system optimization.
-Technical Skills: Proficiency in automation and monitoring tools such as Ansible, Puppet, Chef, Terraform, Prometheus, Grafana, and the ELK Stack.
-Strong scripting skills in Python, Shell, or similar languages.
-Cloud Competency: Solid experience with major cloud platforms (AWS, Azure, GCP), including services like EC2, Lambda, and Kubernetes, as well as containerization.
-Problem-Solving: Proven ability to troubleshoot complex issues across distributed systems, networks, and applications.
-Communication: Excellent written and verbal communication skills, with the ability to collaborate effectively in a fast-paced, dynamic environment.
Preferred Qualifications:
-3+ years of dedicated experience in cloud service operations, with expertise in cloud-native architectures and microservices.
-Certifications such as AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, or equivalent.
-Experience with service mesh technologies (e.g., Istio) and observability tools (e.g., Jaeger).
-Familiarity with DevOps culture and practices, including agile methodologies and continuous improvement frameworks.
-Bonus: Proven experience in developing IT operations and maintenance tools using Python, demonstrating the ability to automate complex workflows and solve real-world problems.
Updated 2025-12-16
1、Design, build, and maintain batch and real-time data systems and pipelines
2、Develop ETL (extract, transform, load) processes to help extract and manipulate data from multiple sources
3、Maintain and optimize the data infrastructure required for accurate extraction, transformation, and loading of data from a wide variety of data sources
Automate data workflows such as data ingestion, aggregation, and ETL processing
4、Prepare raw data in OLAP databases as consumable datasets for both technical and non-technical stakeholders
5、Partner with data scientists and functional leaders in different business units to deploy machine learning models
6、Ensure data accuracy, integrity, privacy, security, and compliance through quality control procedures
7、Monitor data systems' performance and implement optimization strategies
Leverage data controls to maintain data privacy, security, compliance, and quality for allocated areas of ownership
Required Knowledge and Skills
1、Advanced SQL skills and experience in relational databases and database design
2、Strong proficiency in data pipeline and workflow management
3、Experience building and deploying machine learning models
4、Experience working with the Kubernetes container platform
5、Great numerical, analytical, and problem-solving skills
6、Excellent communication and organizational skills
7、Proven ability to work both independently and with a team
IT Knowledge and Applications
1、Experience working with data streaming platforms and related tools (e.g., Apache Kafka, ksqlDB, Apache Pinot, etc.)
2、Experience working with large data sets and distributed computing (e.g., Hadoop/Spark, Presto/Trino, Superset, etc.)
3、Proficiency in programming languages, e.g., Python, Java, Go, etc.
Updated 2026-01-13