
Accomplished Technical Operations Manager with strong expertise in production stability, SRE practices, automation, and large-scale CloudOps. Proven leader with experience managing 21+ team members across Production Support, Project Onboarding, and Production Integration teams, working closely with Engineering, QA, Product, Cloud, and third-party partners to resolve issues and improve operational maturity. Highly skilled in monitoring, RCA, incident lifecycle management, and proactive alerting, ensuring reliable and resilient systems. Experienced with ITSM tools (JIRA, ServiceNow, Freshdesk, GLPI) and known for improving workflows through custom dashboards, CI/CD optimization, and Jenkins automation to enhance SLA compliance. Strong background in Grafana, Zabbix, Elasticsearch, AWS CloudWatch, synthetic monitoring, and cloud-native workloads, with a track record of leading high-impact incidents, mentoring teams, and building repeatable processes that reduce recurring issues and improve KPIs.
• Centralized monitoring (Zabbix)
• Desktop Application development
• Python script automation
• Jenkins job scheduling
• Automated Reporting
• GLPI ticketing tool configuration
• EDC machine integration
• Manage production monitoring, alerts, and incident response to ensure high system availability and SLA compliance.
• Oversee cloud and infrastructure operations including servers, applications, networking, and security.
• Implement Python-based automation to reduce manual effort and improve operational efficiency.
• Perform root cause analysis for recurring incidents and drive long-term corrective actions.
• Coordinate with development, QA, and vendors while leading the TechOps team to ensure smooth deployments and stable operations.
• Reduced recurring production issues by implementing Python-based automation and monitoring enhancements.
• Improved system uptime and SLA compliance through proactive alert tuning and incident prevention.
• Built and enhanced operations dashboards providing real-time visibility to management.
• Improved operational SLA adherence to 99%, ensuring consistent on-time issue resolution over a three-month period.
• Implemented auto-escalation alerts, reducing communication delays and significantly improving response times.
• Reduced manual effort by 60% through API-based reporting, automated monitoring, and desktop automation tools.
• SRE
• Grafana dashboard creation and monitoring
• Synthetic monitoring Canaries
• ServiceNow (SNOW) ticketing tools
• Downtime reduced by 95%
• Lead the end-to-end L3 technical operations team for the BOSCH L.OS platform, overseeing incident management, RCA, performance optimization, and overall system stability.
• Drive seamless project onboarding while managing complex technical issues through advanced monitoring, proactive alerting, and detailed incident reporting categorized by severity, root cause, and resolution time.
• Conduct in-depth Root Cause Analysis (RCA) and configure AWS Synthetic Monitoring Canaries using CloudWatch and log-based analysis to eliminate recurring issues and ensure early failure detection.
• Design and deploy advanced Grafana dashboards to enhance observability, real-time tracking, and microservice-level analytics while promoting operational excellence through automation and performance tuning.
• Collaborate closely with Dev, QA, Cloud, and business teams to optimize workflows, strengthen system resilience, and implement proactive monitoring mechanisms that reduced downtime and prevented customer-impacting incidents.
• Implemented proactive measures that reduced downtime by 95%.
• Enhanced Operations Dashboard visibility, improving team efficiency.
• Reduced system downtime by 95% through proactive performance tuning and automation.
• Team Leadership and Collaboration
• Incident and Problem Management
• Integration onboarding for a new project
• Reporting and Documentation
• Python Script Automation
• Problem Solving and Critical Thinking
• Service-Level Agreements (SLA)
• Jira Service Desk and Freshdesk ticketing tool configuration
• 98% of recurring issues fixed
• Managed end-to-end application server production issues, performing detailed root cause analysis to ensure stability and prevent recurring incidents.
• Serve as the primary point of contact for clients and partners during the onboarding process, addressing challenges that arise during integration promptly and effectively.
• Perform regression testing in pre-prod to ensure that changes do not negatively impact existing systems.
• Understand the technical aspects of the integration process, including data flows, APIs, and system configurations, and collaborate with technical teams to troubleshoot and resolve technical problems.
• Handled end-to-end integration for Myntra, Flipkart, and Amazon within the IMS ecosystem, including inventory data synchronization via OMUI, order processing through OMS, and timely invoice generation and delivery to clients.
• Collaborated with cross-functional teams to resolve critical issues, support product enhancements, and maintain seamless business operations.
• Supervised and guided application support teams, reducing recurring issues through strategic automation and process improvements.
• Implemented a cross-functional collaboration framework, improving communication between technical and product teams and achieving a 95% reduction in incident resolution time.
• Strengthened operational efficiency by leveraging Jenkins CI/CD automation and Kubernetes log monitoring, and by building a knowledge-sharing platform that enhanced collaboration and reduced troubleshooting time.
• Developed Python automation scripts and proactive monitoring dashboards that reduced daily ticket volume from 60+ to 20+, while cutting recurring issues by 98% and enabling 100% faster onboarding across teams.
• Implemented automation solutions that increased operational efficiency by 100%, significantly reducing manual effort for the 8-member support team and accelerating issue resolution.
• Enhanced CI/CD practices and cross-team collaboration, resulting in improved operational performance and a substantial reduction in Mean Time To Resolution (MTTR).
• CI/CD Pipeline Automation
• Kubernetes Setup
• AWS (S3 Browser, EC2 Instance)
• Elasticsearch, Kibana
• RESTful APIs
• SFTP & FTP Configurations
• Linux Commands
• Postman Setup
• Resolved complex application and integration issues through detailed RCA using Postman APIs, MySQL, Elasticsearch, Kibana, and Linux logs, ensuring optimal performance and high availability.
• Developed and integrated Python + REST API automation scripts with Jenkins cron scheduling to streamline operational workflows and reduce manual intervention.
• Managed AWS EC2 instances, including deployment, configuration, optimization, and scalability improvements to support seamless cloud operations.
• Documented technical solutions on Confluence and provided L2/L3 support for escalated issues, ensuring smooth operations and rapid issue resolution.
• AWS (EC2, ECS, Lambda, S3, IAM, CloudWatch)
• Cloud Operations & Cost Optimization
• Server & Application Migrations
• Kubernetes (Pods, Scaling, Monitoring)
• Production Support (24×7)
• Site Reliability Engineering (SRE)
• Incident & Problem Management
• Root Cause Analysis (RCA)
• SLA / KPI Management
• Change & Release Management
• Zabbix
• Grafana
• AWS CloudWatch
• Synthetic Monitoring (Canaries)
• Proactive Alerting & Health Checks
• Performance Monitoring & Optimization
• Python Automation
• Jenkins (Pipelines, Jobs, Automation)
• CI/CD Optimization
• Cron Jobs & Scheduled Tasks
• Elasticsearch
• Logstash
• Kibana (ELK Stack)
• Log Analysis & Troubleshooting
• JIRA
• ServiceNow (SNOW)
• Freshdesk
• GLPI
• Phabricator (Code Review & Change Management)
• Team Leadership (21+ members)
• Incident Bridge Leadership
• Shift & Roster Management
• Process Improvement & Documentation
• Cross-team & Vendor Coordination
• Mentoring & Knowledge Sharing
Nandus Retail is a multi-store retail organization operating POS, billing, inventory, and backend systems across multiple locations.
Project: Cloud Operations & Production Support
Supported and scaled cloud-based retail applications and POS systems, ensuring high availability, monitoring, logging, and reliable production operations.
• Managed 24×7 production operations for retail POS and backend services.
• Implemented monitoring and alerting (AWS CloudWatch, Grafana, Zabbix).
• Led the EDC machine integration across all Nandus outlets, improving transaction success rates and ensuring smooth Paytm/Bank gateway connectivity.
• Set up centralized logging (ELK) for faster troubleshooting.
• Automated operational tasks using Python and Jenkins.
• Improved incident management, RCA, MTTR, and SLA compliance.
AWS, Zabbix, ELK, Jenkins, Python, GLPI
Motherson Technology Services provides IT and digital services to BOSCH, supporting enterprise applications, integrations, and production systems.
Project: Production Operations & Integrations
Supported and stabilized enterprise production applications by improving monitoring, automation, and incident response processes to ensure high availability and reliability.
• Managed 24×7 production monitoring, alerts, and incident response.
• Enhanced Grafana and Zabbix dashboards for better system visibility.
• Automated recurring operational tasks using Python and scripts.
• Coordinated releases and integrations with Engineering, QA, and vendor teams.
• Performed RCA and implemented permanent fixes to reduce recurring incidents.
AWS, Grafana, Python, JIRA, ServiceNow
Shiprocket Omni is an omnichannel retail and order management platform that helps businesses manage inventory, orders, and fulfillment across online and offline channels.
Project: Omnichannel Operations & Cloud Production Support & Integration
Supported and scaled Shiprocket Omni’s production platforms by improving monitoring, automation, logging, and incident response to ensure high availability and smooth order processing across channels.
• Managed 24×7 production operations for omnichannel order and inventory systems.
• Implemented proactive monitoring and alerting using Jarvis logs and Grafana.
• Set up centralized logging (ELK) for faster troubleshooting and root cause analysis.
• Automated operational workflows using Python and Jenkins to reduce manual effort and MTTR.
• Led incident bridges, RCA, and change management to improve SLA compliance.
• Coordinated with Engineering, QA, and third-party partners for releases and integrations.
AWS, Elasticsearch, Logstash, Kibana, Python, JIRA, Jira Service Desk, Kubernetes