Backend Platform Engineer

Arunav Malhotra

Lead Software Development Engineer focused on high-scale distributed systems, operational reliability, and engineering team execution.

6+ years architecting backend platforms, scaling service infrastructure, and leading engineering execution across product growth, onboarding, and reliability systems. I own the systems that matter: designing for scale without fragility, simplifying operational complexity, and building teams that ship with confidence.

6+ Years Leading
99.9% Uptime Track Record
6+ Engineers Mentored
Billions of Entities Handled
01

About

I architect and own backend systems that power product growth and reliability. In 6+ years at SaaS Labs, I've progressed from SDE to Lead, designing platforms that handle billions of operations daily while maintaining operational simplicity and engineering velocity.

My focus spans distributed system design, service modernization, and operational reliability. I led the migration to stateless architecture across 50+ services, architected the contact management system (billions of entities, sub-millisecond queries), and built the infrastructure that enables confident rollouts at scale. Beyond systems, I mentor engineers on architecture decisions and drive execution across product, infrastructure, and reliability initiatives.

I operate at the intersection of technical depth and product leverage. Whether designing fraud detection systems, owning onboarding and billing infrastructure, or improving release confidence through CI/CD standards, my goal is always the same: reduce operational complexity, improve system behavior under scale, and enable teams to ship faster with fewer incidents.

Core Technologies

TypeScript
JavaScript
Node.js
NestJS
React
PostgreSQL
Docker
AWS
Kubernetes
Redis
Microservices
GraphQL

Leadership Focus

  • Distributed System Design
  • Architectural Modernization
  • Reliability & Observability
  • Platform Scaling
  • Cross-functional Ownership
  • Operational Excellence

Expertise Areas

  • Microservices & Stateless Design
  • Database Optimization
  • Real-time Systems
  • Production Hardening
  • Growth Systems (Onboarding, Billing)
  • Team Scaling & Mentoring
02

Experience

Lead Software Development Engineer

SaaS Labs

Oct 2023 - Present

2+ years

Key Achievements

  • Architected and owned migration to stateless architecture across 50+ services, enabling horizontal scaling and sustaining 99.9% uptime while reducing operational complexity
  • Designed and delivered Contact Management System handling billions of entities with sub-millisecond query latency; powers core platform for thousands of customers
  • Led backend architecture decisions, mentoring 6+ engineers on distributed systems design, reliability patterns, and production hardening practices
  • Built centralized IAM platform using LDAP, improving security posture and enabling 200+ organizations with enterprise security requirements to adopt the platform
  • Engineered self-healing infrastructure reducing manual incident response by 80%, improving engineer velocity and reliability
  • Modernized legacy platform: migrated 50+ services from monolithic PHP to distributed NestJS services with improved deployment velocity
  • Drove engineering standards across CI/CD, code review, and observability; improved rollout confidence enabling faster, safer releases
TypeScript
NestJS
Node.js
PostgreSQL
Docker
AWS
GitHub Actions
Redis

Senior Software Development Engineer

SaaS Labs

Mar 2022 - Oct 2023

1.5 years

Key Achievements

  • Owned onboarding and signup systems; designed security infrastructure that reduced fraudulent signups by 45% while maintaining 99.8% legitimate user acceptance
  • Built and shipped billing infrastructure supporting usage-based pricing and subscriptions for thousands of SaaS customers, with 99.99% transaction accuracy
  • Delivered core growth systems: referral program, OTP verification (10K+ monthly signups), and feature flagging platform enabling safe experimentation at scale
  • Architected real-time systems: click-to-call infrastructure handling 500K+ monthly calls with intelligent routing and call analytics integration
  • Collaborated cross-functionally with product and marketing on PLG initiatives, translating business requirements into scalable backend systems
Node.js
React
PostgreSQL
Microservices
REST API
Event-driven

Software Development Engineer

SaaS Labs

Jan 2020 - Mar 2022

2.2 years

Key Achievements

  • Owned core platform modules (Marketplace, Signup, Onboarding, Demo Page), establishing patterns adopted across engineering org
  • Simplified billing infrastructure architecture; reduced onboarding time for new engineers by 40% through improved API design and documentation
  • Built scheduled calling and click-to-call systems processing 500K+ monthly calls, supporting go-to-market initiatives
  • Executed product launches cross-functionally with RevOps, Marketing, and Product; contributed to 3+ major GTM campaigns
PHP
JavaScript
Node.js
Express
MySQL
REST API
03

Case Studies

Distributed Contact Management at Scale

Challenge

Design and operate a distributed system managing billions of contact entities for thousands of customers. System must support sub-millisecond query latency, complex filtering, real-time updates, and horizontal scaling across shared infrastructure.

Solution

Architected a multi-layer system: PostgreSQL with sophisticated indexing strategies and query optimization, Redis caching layer for hot data, event-driven updates via Kafka for consistency, and Docker-based microservices enabling horizontal scaling. Implemented connection pooling, query plan analysis, and automated index management.

PostgreSQL
Microservices
Redis
Kafka
Docker
Query Optimization
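
As a minimal sketch of the hot-data caching layer described in the solution above (not the production code), a cache-aside read path could look like the following. The `TtlCache` class is an in-memory stand-in for Redis, `loadContact` is a hypothetical stand-in for a PostgreSQL query, and the `Contact` shape is assumed for illustration.

```typescript
// Cache-aside sketch. The Map stands in for Redis and `loadContact` stands in
// for a PostgreSQL lookup; both are assumptions, not the real system.

interface Contact {
  id: string;
  name: string;
}

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  // Called when an update event arrives so stale data is not served.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Read path: try the cache, fall back to the database, then populate the cache.
function getContact(
  id: string,
  cache: TtlCache<Contact>,
  loadContact: (id: string) => Contact | undefined,
  now: number = Date.now(),
): Contact | undefined {
  const cached = cache.get(id, now);
  if (cached !== undefined) return cached;

  const fresh = loadContact(id);
  if (fresh !== undefined) cache.set(id, fresh, now);
  return fresh;
}
```

Passing `now` explicitly keeps expiry deterministic in tests; in a real deployment the TTL would be enforced by the cache server itself, and `invalidate` would be driven by the update events.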

Impact Delivered

Billions

Contacts Managed

Sub-millisecond p99 latency across 10K+ concurrent users, enabling real-time product experiences

Stateless Architecture Migration

Challenge

Transform a monolithic, stateful system into a stateless distributed architecture. The existing system was unreliable at 95% uptime, with 50+ tightly coupled legacy services. The migration had to happen with zero downtime while improving reliability.

Solution

Led platform-wide migration strategy: rewrote services for stateless design, implemented distributed tracing and correlation IDs, deployed load balancers with session affinity for gradual rollout, built automated failover and health checking. Implemented canary deployments and feature flags to enable safe, incremental migration.

Docker
Kubernetes
AWS
Load Balancing
Distributed Tracing
CI/CD
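
To illustrate the correlation IDs mentioned in the solution above, here is a small sketch of how an ID can be reused or created at a service boundary and carried to downstream calls. The header name `x-correlation-id`, the `HeaderMap` type, and the ID format are assumptions for illustration, not details taken from the actual platform.

```typescript
// Correlation-ID propagation sketch. Header name and ID format are assumed.
// Header keys are expected to be lowercased already (as Node.js does for
// incoming HTTP requests).

type HeaderMap = Record<string, string | undefined>;

const CORRELATION_HEADER = "x-correlation-id";

let idCounter = 0;

// Hypothetical generator; a real service would more likely use a UUID.
function newCorrelationId(): string {
  idCounter += 1;
  return `req-${Date.now().toString(36)}-${idCounter.toString(36)}`;
}

// Reuse the caller's ID when present so one request keeps one ID across hops.
function resolveCorrelationId(
  incoming: HeaderMap,
  generate: () => string = newCorrelationId,
): string {
  const existing = incoming[CORRELATION_HEADER];
  return existing !== undefined && existing.trim() !== "" ? existing : generate();
}

// Build headers for a downstream call, carrying the same ID forward.
function outgoingHeaders(
  incoming: HeaderMap,
  extra: HeaderMap = {},
  generate: () => string = newCorrelationId,
): HeaderMap {
  return { ...extra, [CORRELATION_HEADER]: resolveCorrelationId(incoming, generate) };
}
```

Because the ID travels in the request rather than in server memory, any instance of a stateless service can handle any hop and logs can still be stitched together per request.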

Impact Delivered

99.9%

Uptime Sustained

50+ services migrated with zero customer-facing downtime; 80% reduction in manual incident response

Onboarding & Growth Systems

Challenge

Build scalable onboarding pipeline supporting rapid user growth (10K+ signups/month). System must handle multi-channel OTP delivery, fraud detection, referral tracking, and billing initialization—all without impacting signup conversion.

Solution

Designed end-to-end onboarding system: async processing for non-critical paths, SMS/Email OTP delivery with retry logic, fraud scoring combining reCAPTCHA and behavioral analysis, referral tracking with eventual consistency, and billing initialization with rollback capabilities. Optimized database queries and caching to maintain sub-500ms signup flow.

Node.js
PostgreSQL
Event-driven
SMS/Email APIs
ML Scoring
Analytics
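
The retry logic for OTP delivery mentioned above can be sketched as capped exponential backoff. This is an illustrative version only: `send` is a hypothetical provider call, and the option names and values are assumptions rather than the settings used in production.

```typescript
// Retry-with-backoff sketch for OTP delivery. `send` is a hypothetical
// SMS/email provider call; option values are illustrative.

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelayMs(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, retry - 1), opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts used, or rejects with the last error.
async function sendWithRetry(
  send: () => Promise<void>,
  opts: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<number> {
  if (opts.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      await send();
      return attempt;
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < opts.maxAttempts) {
        await wait(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

Running this off the critical path (for example from a background worker) keeps a slow provider from delaying the signup response; adding random jitter to the delay is a common refinement that is left out here for brevity.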

Impact Delivered

45%

Bot Signup Reduction

10K+ monthly signups sustained; 99.8% legitimate user acceptance rate maintained

Other Key Initiatives

Fraud Detection & Signup Security

Engineered comprehensive signup security system combining reCAPTCHA validation, device fingerprinting, and behavioral anomaly detection. Reduced fraudulent signups by 45% while maintaining 99.8% legitimate user acceptance—enabling confident signup growth.

reCAPTCHA
ML/Anomaly Detection
Device Fingerprinting
Analytics

Feature Flags & Safe Rollout Infrastructure

Built internal feature flag and experimentation platform enabling controlled rollouts across 50+ microservices. Supports canary deployments, A/B testing, and instant rollback without redeployment—improving release confidence and reducing rollout risk.

TypeScript
Node.js
Feature Management
Redis
Analytics
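
One common way to get controlled rollouts with instant rollback is deterministic percentage bucketing, sketched below. This is an assumption about the general technique, not a description of the internal platform: the `Flag` shape, `bucketFor`, and the choice of an FNV-1a style hash (applied to UTF-16 code units) are all illustrative.

```typescript
// Percentage-rollout sketch. The Flag shape and hashing choice are assumed.

interface Flag {
  key: string;
  enabled: boolean;       // kill switch: false disables the flag for everyone
  rolloutPercent: number; // 0 to 100
}

// 32-bit FNV-1a style hash over the string's UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a (flag, user) pair to a stable bucket in [0, 100).
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

function isEnabled(flag: Flag, userId: string): boolean {
  if (!flag.enabled) return false;
  return bucketFor(flag.key, userId) < flag.rolloutPercent;
}
```

Because the bucket depends only on the flag key and user ID, a user who sees the feature at 10% still sees it at 50%, and every service instance reaches the same answer without shared session state. Flipping `enabled` in a shared store turns the feature off without a redeploy.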

Centralized IAM with LDAP Integration

Architected an enterprise-grade identity and access management system supporting LDAP federation, role-based access control, and audit logging. Enabled 200+ organizations with enterprise security requirements to use the platform.

LDAP
OAuth2
Node.js
PostgreSQL
Audit Logging

Real-time Calling Infrastructure

Designed and shipped scalable calling system handling 500K+ monthly calls with intelligent routing, call recording, analytics integration, and CRM webhooks. Powers sales and support workflows for thousands of businesses.

Node.js
Telecom APIs
Real-time Routing
PostgreSQL
Webhooks

Billing & Usage-Based Pricing

Delivered complete billing system supporting usage-based pricing, subscriptions, recurring billing, and SaaS metrics. Supports thousands of customers with 99.99% transaction accuracy—enabling confidence in revenue operations.

Node.js
PostgreSQL
Stripe API
Event Sourcing
Reconciliation

Scalable Onboarding Pipeline

Built end-to-end onboarding system supporting 10K+ monthly signups with OTP verification, fraud scoring, referral tracking, and billing initialization. Optimized for speed (sub-500ms signup flow) and conversion without sacrificing security.

Node.js
PostgreSQL
Async Processing
SMS/Email APIs
Analytics
04

Technical Skills

Backend Engineering

TypeScript
Node.js
NestJS
Express
REST APIs
GraphQL

Architecture & Scale

Microservices
Distributed Systems
Stateless Design
Event-driven
Horizontal Scaling
Load Balancing

Data & Storage

PostgreSQL
Query Optimization
Redis
Kafka
MongoDB
Indexing Strategies

Infrastructure & Delivery

Docker
Kubernetes
AWS (EC2, RDS, S3)
GitHub Actions
CI/CD
Infrastructure as Code

Reliability & Observability

Distributed Tracing
Sentry
Prometheus
Grafana
Health Checks
Incident Response

Product Systems

Billing & Payments
Fraud Detection
Onboarding Flows
Analytics
Real-time Systems

Core Competencies

Distributed System Design

Architecting microservices, stateless services, and horizontally scalable systems handling billions of operations at high throughput and low latency.

Operational Reliability

Improving system behavior under scale through distributed tracing, self-healing infrastructure, automated failover, and incident response automation.

Service Modernization

Leading migrations of legacy monoliths to distributed architectures; designing rollout strategies that enable zero-downtime deployments.

Backend Platform Engineering

Owning critical growth systems (onboarding, billing, fraud detection) and reliability systems that enable product velocity without sacrificing quality.

Engineering Execution

Driving cross-functional collaboration, establishing technical standards, improving CI/CD velocity, and mentoring teams on architecture and production hardening.

Production Hardening

Building observability, health checks, automated rollback, circuit breakers, and incident response automation to operate complex systems confidently.

Build Scalable Systems Together

Let's Ship Reliable, Scalable Systems

Open to backend engineering, platform architecture, and engineering leadership roles. I'm focused on systems that scale without fragility, teams that ship with confidence, and technical leadership that moves products forward.

Reach out directly at arunavmalhotra1998@gmail.com