RELIABILITY AGENT

AMITH

Site Reliability Engineer

Infrastructure Commander • Chaos Warrior • DevOps Champion

Site Reliability Engineer with production experience at PhonePe, managing large-scale Linux infrastructure, NGINX edge systems, distributed databases, and observability platforms. Experienced in on-call operations, incident response, HA cluster design, and infrastructure automation across bare metal environments. Strong focus on reliability, performance, and reducing operational toil through internal tooling.

COMBAT HISTORY

Site Reliability Engineer

PhonePe Private Limited, Bangalore

July 2024 - Present

  • Participate in 24x7 on-call rotations for staging and production environments, responding to incidents and troubleshooting issues to maintain service availability
  • Manage the NGINX edge layer serving millions of requests per second, supporting configuration changes and zero-downtime deployments
  • Administer distributed data systems including MySQL Galera, Percona, and Aerospike, handling schema changes, truncates, checksums, and production maintenance tasks
  • Built and maintained observability infrastructure for 800+ hosts using Prometheus, Grafana, InfluxDB, Zabbix, Riemann, and PMM, improving infrastructure visibility and issue detection
  • Designed and operated high-availability clusters for Percona/MariaDB, RabbitMQ, Elasticsearch, and Aerospike on bare metal infrastructure with KVM/QEMU
  • Developed internal automation tools including a Slackbot that improved operational efficiency by 70%, a JIRA request portal, and a CMR generator that reduced documentation work from hours to seconds

Site Reliability Engineer Intern

PhonePe Private Limited, Bangalore

February 2024 - July 2024

  • Completed training in Linux systems, distributed databases, and configuration management tools
  • Developed a monitoring project that tracks system state (services, CPU usage, and memory usage), using SaltStack for configuration management and MongoDB for data storage

POWERS & ABILITIES

📊

OBSERVABILITY ARCHITECT

Build and maintain observability infrastructure for 800+ hosts, with expertise in Prometheus, Grafana, and InfluxDB.

Prometheus • Grafana • InfluxDB • Zabbix • Riemann • PMM
🗄️

DISTRIBUTED DATABASE MASTER

Manage and operate high-availability clusters for MySQL Galera, Percona, Aerospike, and RabbitMQ.

MySQL Galera • Percona • Aerospike • RabbitMQ • Elasticsearch • MariaDB
⚙️

CONFIGURATION AUTOMATION

Infrastructure automation and configuration management using SaltStack, Ansible, and Bash scripting.

SaltStack • Ansible • Bash Scripting • Python • Automation • Infrastructure Code
🌐

EDGE LAYER OPERATOR

Manage NGINX edge layer serving millions of requests per second with zero-downtime deployments.

NGINX • TCP/IP • DNS • Load Balancing • Keepalived
🔧

BARE METAL VIRTUALIZATION

Design and operate HA clusters on bare metal infrastructure with KVM/QEMU hypervisor-based virtualization.

KVM • QEMU • Bare Metal • HA Clusters • Hypervisor
🚀

INTERNAL TOOLING ENGINEER

Developed a Slackbot (70% efficiency gain), a JIRA request portal, and a CMR generator (documentation work cut from hours to seconds).

Python • Slackbot • GitLab CI/CD • Git • Automation Tools

COMPLETED MISSIONS AT PHONEPE

⚡ ACTIVE

24x7 Production On-Call Operations

Participate in round-the-clock on-call rotations for staging and production environments, responding to incidents and troubleshooting issues to maintain service availability for millions of users.

Uptime: 99.9%+ • Response Time: < 5 mins
✓ OPERATIONAL

NGINX Edge Layer Management

Manage and operate the NGINX edge layer serving millions of requests per second. Execute configuration changes and zero-downtime deployments.

Requests/sec: Millions • Downtime: Zero
✓ OPERATIONAL

Distributed Database Administration

Administer and maintain distributed data systems including MySQL Galera, Percona, and Aerospike. Execute schema changes, checksums, truncates, and production maintenance tasks.

Systems: 3 Major DBs • Availability: HA Clusters
✓ COMPLETED

Observability Infrastructure (800+ Hosts)

Built and maintained observability infrastructure for 800+ hosts using Prometheus, Grafana, InfluxDB, Zabbix, Riemann, and PMM. Improved infrastructure visibility and incident detection.

Hosts Monitored: 800+ • Detection: Real-time
✓ COMPLETED

High-Availability Cluster Design

Designed and operated high-availability clusters for Percona/MariaDB, RabbitMQ, Elasticsearch, and Aerospike on bare metal infrastructure using KVM/QEMU virtualization.

Clusters: 4 Systems • Reliability: Enterprise Grade
✓ COMPLETED

CMR Generator & Automation Tools

Sole developer of a production automation platform that has generated 1500+ CMRs and database scripts. Implemented a zero-error, step-by-step architecture for DB alters and deployment scripts, reducing production errors to nearly 0%. Also developed a Slackbot (70% efficiency boost) and a JIRA portal.

CMRs Generated: 1500+ • Production Errors: ~0% • Sole Developer

ARSENAL INVENTORY

PROGRAMMING & SCRIPTING

Python
Bash Scripting
Shell
Linux Proficient

CONFIGURATION MANAGEMENT

SaltStack
Ansible
In-house Tools
Automation

MONITORING & OBSERVABILITY

Prometheus
Grafana
InfluxDB
Zabbix
Riemann
PMM

DATABASES & DATA SYSTEMS

MySQL Galera
Percona
MariaDB
Aerospike
RabbitMQ
Elasticsearch

VIRTUALIZATION & INFRASTRUCTURE

KVM
QEMU
Bare Metal
HA Systems
Hypervisor
Keepalived

VERSION CONTROL & CI/CD

Git
GitLab
GitLab CI/CD
Pipelines
Slackbot
Automation

NETWORKING & SECURITY

TCP/IP
DNS
iptables
Load Balancing
NGINX
HA Architecture

AGENT TRAINING HISTORY

🎓

Bachelor of Engineering with Honors (BE Hons)

Computer Science and Engineering

BMS Institute of Technology, Bangalore

GPA: 9.13 / 10 • 2020 – 2024

INITIATE CONTACT

Ready to join the mission? Let's collaborate on building robust, scalable infrastructure.