Overview:
The System Engineer will work within a team dedicated to delivering, maintaining, and improving the company’s network, infrastructure, and application monitoring solutions. With a focus on customer success, the role involves deploying, supporting, and enhancing the observability platform. This role requires proficiency in Linux and Windows operating systems, virtual environments, and a suite of monitoring and automation tools.

Roles and Responsibilities:
System Delivery and Deployment:

  • Deploy and configure on-premises and cloud Linux and Windows servers, and related services.
  • Set up and manage virtualized environments, including Proxmox and Hyper-V hypervisors.
  • Install and configure monitoring and observability tools such as Grafana, Prometheus, ELK Stack, SaltStack, and Telegraf.
  • Integrate data stores such as PostgreSQL, Mimir, and Elasticsearch to support the data infrastructure.

System Maintenance and Monitoring:

  • Monitor infrastructure performance and availability, using observability tools to ensure continuous operation and health of all systems.
  • Manage updates, patches, and lifecycle maintenance for Linux and Windows systems.
  • Troubleshoot system and application issues, providing quick, accurate resolutions to maintain uptime.

System Improvement and Optimization:

  • Continuously optimize systems and infrastructure for enhanced performance, reliability, and scalability.
  • Develop and maintain automation scripts and configurations (using SaltStack, Terraform, and Ansible) to streamline system processes and reduce manual intervention.
  • Analyze logs, metrics, and data trends to identify potential system enhancements.

Customer and Product Support:

  • Serve as a technical point of contact for customers, providing Tier 2 and Tier 3 support as needed.
  • Work with cross-functional teams to troubleshoot, escalate, and resolve issues impacting customer experience.
  • Communicate proactively with customers about system improvements, updates, and troubleshooting processes.

Documentation and Knowledge Sharing:

  • Create and update comprehensive documentation for system configurations.
  • Participate in knowledge-sharing sessions to keep the team updated on best practices, new features, and changes.

Deliverables:
Operational Excellence:

  • Uptime metrics for core services meet or exceed 99.9%.
  • Timely completion of scheduled system maintenance with minimal disruptions.
  • Rapid and accurate resolution of issues, tracked via support and incident metrics.

Deployment and Configuration:

  • Successful deployment of new systems or enhancements within project deadlines.
  • Configuration management files/scripts stored securely and maintained in version control.

Documentation:

  • Up-to-date system documentation, including configurations, deployment steps, and troubleshooting guides.
  • Clear and comprehensive incident reports for major support cases or outages.

Customer Satisfaction:

  • Positive customer feedback on system reliability and responsiveness to support requests.
  • Completion of customer-facing updates, notifications, and support queries in a timely manner.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.
  • 3+ years of experience in system engineering, infrastructure, or observability solutions.
  • Proficiency in Linux and Windows operating systems and virtualized environments.
  • Experience with monitoring and observability tools (Grafana, Prometheus, ELK Stack) and automation tools (SaltStack).
  • Strong knowledge of SQL and NoSQL databases (PostgreSQL, Elasticsearch).

Desired Skills:

  • Infrastructure
  • Observability solutions
  • Linux
  • Windows operating systems
  • NoSQL databases
  • SQL
