System Engineer - IT-Online

Overview:
The System Engineer will work within a team dedicated to delivering, maintaining, and improving the company’s network, infrastructure, and application monitoring solutions. With a focus on customer success, the role involves deploying, supporting, and enhancing the observability platform. It requires proficiency in Linux and Windows operating systems, virtualized environments, and a suite of monitoring and automation tools.

Roles and Responsibilities:
System Delivery and Deployment:

Deploy and configure on-premises and cloud-hosted Linux and Windows servers and related services.
Set up and manage virtualized environments, including Proxmox and Hyper-V hypervisors.
Install and configure monitoring, observability, and automation tools such as Grafana, Prometheus, the ELK Stack, Telegraf, and SaltStack.
Integrate data stores such as PostgreSQL, Mimir, and Elasticsearch to support the data infrastructure.

System Maintenance and Monitoring:

Monitor infrastructure performance and availability, using observability tools to ensure continuous operation and health of all systems.
Manage updates, patches, and lifecycle maintenance for Linux and Windows systems.
Troubleshoot system and application issues, providing quick, accurate resolutions to maintain uptime.

System Improvement and Optimization:

Continuously optimize systems and infrastructure for enhanced performance, reliability, and scalability.
Develop and maintain automation scripts and configurations (using SaltStack, Terraform, and Ansible) to streamline system processes and reduce manual intervention.
Analyze logs, metrics, and data trends to identify potential system enhancements.

Customer and Product Support:

Serve as a technical point of contact for customers, providing Tier 2 and Tier 3 support as needed.
Work with cross-functional teams to troubleshoot, escalate, and resolve issues impacting customer experience.
Communicate proactively with customers about system improvements, updates, and troubleshooting processes.

Documentation and Knowledge Sharing:

Create and update comprehensive documentation for system configurations.
Participate in knowledge-sharing sessions to keep the team updated on best practices, new features, and changes.

Deliverables:
Operational Excellence:

Uptime metrics for core services meet or exceed 99.9%.
Timely completion of scheduled system maintenance with minimal disruptions.
Rapid and accurate resolution of issues, tracked via support and incident metrics.

Deployment and Configuration:

Successful deployment of new systems or enhancements within project deadlines.
Configuration management files/scripts stored securely and maintained in version control.

Documentation:

Up-to-date system documentation, including configurations, deployment steps, and troubleshooting guides.
Clear and comprehensive incident reports for major support cases or outages.

Customer Satisfaction:

Positive customer feedback on system reliability and responsiveness to support requests.
Timely handling of customer-facing updates, notifications, and support queries.

Qualifications:

Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent practical experience.
3+ years of experience in system engineering, infrastructure, or observability solutions.
Proficiency in Linux and Windows operating systems and virtualized environments.
Experience with monitoring and observability tools (Grafana, Prometheus, ELK Stack) and automation tools (SaltStack).
Strong knowledge of SQL and NoSQL databases (PostgreSQL, Elasticsearch).