Cloud Infrastructure Engineer
About Us
We are an innovative startup at the forefront of applied artificial intelligence. Our mission is to build "synthetic engineers"—highly specialized AI agents designed to tackle critical challenges within the manufacturing sector. By creating a new generation of digital engineering expertise, we are empowering companies to enhance productivity, solve complex problems, and drive future innovation. We are looking for passionate individuals to join us in building the future of manufacturing.
Role Description
As a Cloud Infrastructure Engineer, you are the architect of our platform's foundation. You will be responsible for designing, building, and maintaining the scalable, secure, and resilient cloud infrastructure that underpins our entire operation—from our collaborative user-facing platform to our intensive AI model training and inference workloads. You will empower our engineering teams with automated, robust systems, ensuring our platform is ready to support our first enterprise clients and scale to a global, multi-tenant public offering.
Key Responsibilities
- Build and Manage Cloud Infrastructure: Design, deploy, and manage our core cloud infrastructure using Infrastructure as Code (IaC) principles.
- Automate Everything: Develop and maintain CI/CD pipelines to automate application deployment, system configuration, and testing processes.
- Ensure Reliability and Performance: Implement comprehensive monitoring, logging, and alerting to ensure high availability, performance, and system health.
- Container Orchestration: Manage and scale our containerized applications and services using Docker and Kubernetes.
- Support AI/ML Workloads: Collaborate with AI engineers to provision and manage the infrastructure required for large-scale model training and real-time inference.
- Uphold Security Best Practices: Implement and enforce security policies, manage access controls, and keep the infrastructure compliant with applicable requirements and hardened against threats.
- Optimize for Cost and Efficiency: Continuously monitor and optimize our cloud resource utilization to ensure cost-effectiveness as we scale.
Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 3+ years of hands-on experience in a Cloud Engineering, DevOps, or Site Reliability Engineering (SRE) role.
- Deep expertise with a major cloud provider (AWS, GCP, or Azure).
- Strong, hands-on experience with Infrastructure as Code tools like Terraform or AWS CDK.
- Proven experience with containerization (Docker) and orchestration systems (Kubernetes).
- Solid understanding of CI/CD principles and experience with tools like GitLab CI, Jenkins, or CircleCI.
- Proficiency in a scripting language such as Python or Bash.
Preferred Qualifications
- Professional cloud certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator).
- Experience building and managing infrastructure for computationally intensive AI/ML workloads.
- Deep knowledge of cloud networking, security groups, and identity and access management (IAM).
- Experience with observability tools like Prometheus, Grafana, or Datadog.
- Previous experience at a B2B SaaS startup, particularly in designing multi-tenant architectures.
What We Offer
- A competitive salary and equity package.
- A pivotal role in a fast-growing startup with a clear and impactful mission.
- The opportunity to work on cutting-edge AI technology and solve real-world problems in a critical industry.
- A collaborative, innovative, and supportive team environment.
- Flexible work arrangements.
If you are an infrastructure expert who is passionate about building the highly available, scalable foundation for a world-changing AI platform, we encourage you to apply.
