
We are currently looking to hire an SRE Engineer, and we believe your skills and expertise are a strong match for this role. We have an exciting career opportunity for you with one of our esteemed clients in Tampa, FL.
“NJTECH is a global managed IT services, IT consulting, and business solutions partner. Our ‘High Performance Business’ strategy builds our expertise in technology and consulting. We play a major role in helping our clients achieve their objectives at the highest level, ultimately creating sustainable value for customers.”
Role: SRE Engineer
Location: Onsite – Tampa, FL
Duration: Long-term
Key Responsibilities and Skills:
🔹 Reliability & Performance
- Ensure high availability, scalability, and reliability of production systems through proactive monitoring and automation.
- Define, measure, and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
- Conduct capacity planning, performance tuning, and root cause analysis for critical incidents.
- Participate in on-call rotations to maintain uptime and respond to production alerts.
🔹 Automation & Infrastructure as Code (IaC)
- Build and manage automated infrastructure using tools such as Terraform, Ansible, or CloudFormation.
- Design and implement CI/CD pipelines using Jenkins, GitLab CI, or GitHub Actions for deployment automation.
- Automate repetitive operational tasks to improve efficiency and reduce manual intervention.
🔹 Monitoring, Logging & Incident Management
- Set up and manage observability tools such as Prometheus, Grafana, the ELK/EFK stack, Splunk, Datadog, or New Relic.
- Build real-time monitoring dashboards for system health, latency, and error rates.
- Implement alerting strategies to detect anomalies early and minimize downtime.
- Manage incident response and post-mortem reviews, and drive continuous improvement initiatives.
🔹 Cloud & Infrastructure Management
- Deploy, configure, and manage systems on AWS, Azure, or GCP cloud environments.
- Manage Kubernetes clusters (EKS/AKS/GKE) and Docker containers for microservices-based architectures.
- Implement auto-scaling, load balancing, and disaster recovery strategies.
- Work closely with development teams to ensure applications are designed for reliability and scalability.
🔹 Security & Compliance
- Implement security best practices, including IAM policies, secrets management, and TLS/SSL configurations.
- Conduct vulnerability scans and coordinate remediation with security teams.
- Ensure compliance with organizational standards and regulatory frameworks.
🔹 Collaboration & Continuous Improvement
- Collaborate with DevOps, development, and QA teams to design resilient architectures.
- Champion a “blameless post-mortem” culture and reliability-first mindset.
- Participate in architecture reviews and promote SRE best practices across teams.