Senior Software Engineer, Platform
ROLE SUMMARY
Firmus Technologies is seeking a Senior Software Engineer focussing on Platform Engineering to join our Engineering and Technology team. You will drive the enhancement of our observability capabilities to achieve ClusterMAX Platinum tier recognition from SemiAnalysis. You will also enhance internal tooling to improve developer and operations productivity. This role is ideal for a self-starter with a passion for building things from first principles. You naturally break down complex problems into their fundamental truths to uncover novel and elegant solutions rather than relying on conventional patterns.
KEY RESPONSIBILITIES
- Collaborate with AI/ML engineers to drive the development and integration of AI/ML application-level monitoring from the ground up, including model accuracy tracking and performance observability.
- Develop purpose-built Prometheus exporters that provide the granularity needed for robust monitoring of low-level components and the interconnect fabric.
- Build and enhance internal tooling to automate workflows, improve developer and operations productivity, and streamline platform operations (e.g., dashboards, CLI tools, automation scripts, self-service portals).
- Continuously improve automated test coverage and effectiveness by adopting new testing frameworks, tools, and best practices.
- Own net-new product experiments (e.g., VR with Meta Quest), driving innovation from concept to production deployment and mass adoption.
- Contribute to the adoption and integration of AI-augmented development tools and workflows.
SKILLS AND EXPERIENCE
- Bachelor's degree in computer science or a related technical field.
- 7+ years of experience as a Software Engineer, with a minimum of 3 years in a dedicated Platform/Observability engineering role.
- Demonstrated strong proficiency in the following areas:
  - Modern application development frameworks and languages (e.g., Go, Python, Node.js).
  - Advanced querying and optimization using SQL, PromQL, LogQL, GraphQL.
  - Observability stack (e.g., Loki, Grafana, Tempo, Prometheus, Thanos, ClickHouse).
  - Data streaming (e.g., Kafka, Pulsar).
  - Automated unit, integration, security, load and end-to-end testing frameworks (e.g., Pytest, JUnit, K6, Go test, Cypress) and integrating tests into CI/CD pipelines.
  - Cloud platforms (e.g., AWS, Azure, or GCP).
  - Containerization technologies (e.g., Docker).
- Experience with AI-augmented development tools and workflows.
- Working knowledge of configuration management and CI/CD (e.g., Ansible, GitHub Actions, Jenkins, ArgoCD).
- Clear and effective English communication, written and spoken.
- Bonus Points:
  - Familiarity with Linux internals, networking stacks, distributed storage and high-performance computing.
  - Experience in high-growth startups or regulated industries with robust security and data privacy requirements, including SOC 2 Type 2 and ISO 27001.
ABOUT SUSTAINABLE METAL CLOUD
Our vision is to move cloud computing towards net zero, with solutions forged through advanced technology. We partner with NVIDIA to provide large-scale GPU AI infrastructure.
WHY YOU'LL LOVE WORKING HERE
Our team shares a passion for possibility, knowing that our technology enables ideas across the world. Ideas that can reshape the course of progress and break down traditional boundaries.