Site Reliability Engineer (Azure Cloud)
Location: Hybrid, Commitment: Full-time
PlektonLabs was established with the vision of improving the world one interaction at a time. We take a pragmatic approach to helping enterprises find the right integration strategy and mobilize their organizations with trailblazing technologies like MuleSoft and Salesforce. Our personalized care and industry experience in providing cover-to-cover IT consultancy services guide our clients into the world of digital transformation with confidence and comfort.
Be part of the team that drives technological transformations in companies ranging from the Fortune 50 to the middle market. Our culture promotes leadership in the work we do and heavy collaboration. In this role, you will work with MuleSoft experts and IT operations to manage an advanced Azure cloud platform for MuleSoft Runtime Fabric (RTF) deployments.
Responsibilities
Azure Kubernetes Service (AKS):
- Design, implement, and manage AKS clusters, ensuring high availability and performance.
- Deploy, scale, and manage containerized applications and services.
- Monitor and troubleshoot cluster and application performance issues.
Service Availability:
- Ensure all Azure-based applications and services meet the company's service-level objectives (SLOs) and service-level agreements (SLAs).
- Implement disaster recovery solutions within the Azure environment.
Performance Monitoring:
- Implement and maintain Azure monitoring, alerting, and logging systems using tools like Azure Monitor and Log Analytics.
- Analyze system performance, pinpointing bottlenecks and potential failures.
Incident Management:
- Respond rapidly to system outages and restore services promptly.
- Conduct postmortem analysis to determine root causes and devise strategies to prevent recurrence.
Azure Network and Infrastructure Services:
- Design and manage Azure Virtual Networks, VPN Gateways, and other network-related services.
- Ensure infrastructure services are scalable, performant, and cost-effective.
Security:
- Implement and manage Azure security services like Microsoft Defender for Cloud (formerly Azure Security Center).
- Work closely with the security team to safeguard systems from external and internal threats.
- Maintain compliance with industry security standards.
Automation and Infrastructure as Code:
- Automate routine tasks with scripts and tools.
- Use Azure Resource Manager (ARM) templates or other Infrastructure as Code (IaC) tools to provision and manage Azure resources.
MuleSoft RTF Environment (Desirable):
- Support the MuleSoft RTF environment, ensuring high availability and seamless deployments.
- Collaborate with MuleSoft development teams for optimal service configuration and deployment.
Collaboration and Communication:
- Work in tandem with development and operations teams to design and support scalable, secure, and reliable systems.
- Provide insights, feedback, and architectural recommendations based on SRE best practices.
Continuous Learning and Improvement:
- Stay up to date on Azure services, best practices, and trends.
- Regularly review and optimize systems and processes for performance, cost, and security improvements.
Qualifications
Mandatory:
- Proven experience with Azure Kubernetes Service (AKS).
- Deep understanding of Azure Network, Infrastructure, and Security services.
- Proficiency in scripting languages like Python, PowerShell, or Bash.
Desirable:
- Familiarity with the MuleSoft RTF environment.
- Azure Certifications (e.g., Azure Solutions Architect, Azure Security Engineer).
- Experience with Infrastructure as Code (IaC) tools.
Other Information:
- The candidate may occasionally be required to handle on-call duties for critical incidents.
How to Apply
Prepare your resume and/or portfolio links and email your profile to career@plektonlabs.com with the subject line: Join with PlektonLabs | Site Reliability Engineer (Azure Cloud)