[HCM] Senior Machine Learning Systems Engineer

Why Join Us?

We are building AI platforms that empower teams to develop, monitor, and scale intelligent systems efficiently. As a Senior ML Systems Engineer, you will design and maintain the internal backbone that supports the end-to-end ML lifecycle — from data annotation to model deployment.

This role bridges infrastructure, observability, and developer experience, helping Data Scientists, ML Engineers, and Production Engineers work faster, more safely, and with deeper insight into their systems and data. You’ll work across distributed systems, automation, and platform tooling that make large-scale AI operations seamless and reliable.

What You’ll Do

Design and maintain distributed systems supporting large-scale data and model workflows.

Build automation and internal tools for cluster management, job scheduling, and data synchronization.

Develop microservices, APIs, or SDKs to connect internal ML tools and services.

Contribute to documentation and establish best practices for operations and reliability.

Implement monitoring and alerting for compute, storage, and network resources.

Collaborate with cross-functional teams to ensure platform stability and alignment between data, infra, and ML pipelines.

Participate in capacity planning, scaling, and incident response to improve reliability and developer productivity.

Work closely with global teams to align tooling, standards, and system governance.

What We’re Looking For

Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.

3+ years of experience in systems, platform, or full-stack engineering roles.

Strong programming skills in Python, JavaScript/TypeScript, or Go.

Hands-on experience with Linux systems, Docker/Kubernetes, and cloud services (AWS/GCP/Azure).

Familiarity with web frameworks (FastAPI, Flask, React, or similar) for internal dashboards and tools.

Experience integrating observability stacks (Prometheus, Grafana, ELK, or OpenTelemetry).

Understanding of data management workflows and metadata tracking.

Fluent English communication to collaborate with international teams.

Preferred Qualifications

Experience supporting GPU clusters or ML workloads at scale.

Experience in observability systems (Prometheus, Grafana, OpenTelemetry).

Experience with ML experimentation or monitoring platforms (MLflow, Kubeflow).

Familiarity with data orchestration tools and message queues (Kafka, Pub/Sub).

Familiarity with CI/CD and automation frameworks for internal tools.

Proven ability to design tools that improve developer and researcher productivity.

Apply via email: send your CV to contact@kwise.io
