The System Engineer will work within a team dedicated to delivering, maintaining, and improving the company’s network, infrastructure, and application monitoring solutions. With a focus on customer success, the role involves deploying, supporting, and enhancing the observability platform. This role requires proficiency in Linux and Windows operating systems, virtual environments, and a suite of monitoring and automation tools.
Roles and Responsibilities:
System Delivery and Deployment:
– Deploy and configure on-premises and cloud Linux and Windows servers, and related services.
– Set up and manage virtualized environments, including Proxmox and Hyper-V hypervisors.
– Install and configure monitoring, observability, and automation tools such as Grafana, Prometheus, the ELK Stack, Telegraf, and SaltStack.
– Integrate data stores such as PostgreSQL, Grafana Mimir, and Elasticsearch to support the data infrastructure.

System Maintenance and Monitoring:
– Monitor infrastructure performance and availability, using observability tools to ensure continuous operation and health of all systems.
– Manage updates, patches, and lifecycle maintenance for Linux and Windows systems.
– Troubleshoot system and application issues, providing quick, accurate resolutions to maintain uptime.

System Improvement and Optimization:
– Continuously optimize systems and infrastructure for enhanced performance, reliability, and scalability.
– Develop and maintain automation scripts and configurations (using SaltStack, Terraform, and Ansible) to streamline system processes and reduce manual intervention.
– Analyze logs, metrics, and data trends to identify potential system enhancements.

Customer and Product Support:
– Serve as a technical point of contact for customers, providing Tier 2 and Tier 3 support as needed.
– Work with cross-functional teams to troubleshoot, escalate, and resolve issues impacting customer experience.
– Communicate proactively with customers about system improvements, updates, and troubleshooting processes.

Documentation and Knowledge Sharing:
– Create and update comprehensive documentation for system configurations.
– Participate in knowledge-sharing sessions to keep the team updated on best practices, new features, and changes.

Deliverables:
Operational Excellence:
– Uptime metrics for core services meet or exceed 99.9%.
– Timely completion of scheduled system maintenance with minimal disruptions.
– Rapid and accurate resolution of issues, tracked via support and incident metrics.
Deployment and Configuration:
– Successful deployment of new systems or enhancements within project deadlines.
– Configuration management files/scripts stored securely and maintained in version control.
Documentation:
– Up-to-date system documentation, including configurations, deployment steps, and troubleshooting guides.
– Clear and comprehensive incident reports for major support cases or outages.
Customer Satisfaction:
– Positive customer feedback on system reliability and responsiveness to support requests.
– Completion of customer-facing updates, notifications, and support queries in a timely manner.

Qualifications:
– Bachelor’s degree in Computer Science, Information Systems, or related field, or equivalent practical experience.
– 3+ years of experience in system engineering, infrastructure, or observability solutions.
– Proficiency in Linux and Windows operating systems and virtualized environments.
– Experience with monitoring and observability tools (Grafana, Prometheus, ELK Stack) and automation tools (SaltStack).
– Strong knowledge of SQL and NoSQL databases (PostgreSQL, Elasticsearch).

Desired Skills:

  • system engineering
  • infrastructure
  • Linux and Windows operating systems
  • automation tools
  • SQL and NoSQL databases
