via Lensa
$120K - $200K a year
Maintain and optimize the reliability, performance, and cost-efficiency of data platforms including Databricks and AWS, while supporting BI tools and vendor coordination.
3+ years of experience with data platform operations, hands-on with Databricks and AWS, familiarity with ingestion tools and BI platforms, strong troubleshooting skills, and knowledge of governance and security principles.
Job Overview: The Data Platform Reliability Engineer is responsible for ensuring the stability, performance, and operational reliability of UNFI's cloud-based and legacy data platforms. This role focuses on monitoring, troubleshooting, and automating operational workflows for Databricks, AWS services, and enterprise ingestion tools such as Fivetran/HVR, AWS DMS, DataStage, and Informatica, as well as supporting BI tools (Power BI, Tableau, Alteryx) and governance solutions. The engineer will work closely with external consulting partners and internal teams to maintain uptime, enforce governance standards, and optimize platform performance and cost efficiency.

Job Responsibilities:

Platform Reliability & Monitoring
• Monitor health and performance of Databricks clusters, jobs, and workflows.
• Maintain observability dashboards, alerts, and logs for AWS services and ingestion pipelines.
• Respond to incidents, perform root cause analysis, and implement corrective actions.

Cost and Performance Management
• Monitor and optimize platform costs across cloud and data services.
• Implement cost-control measures and provide regular reporting.
• Implement and maintain cost controls: cluster policies, auto-termination, right-sizing, job scheduling, storage lifecycle policies.
• Monitor spend and utilization for Databricks, AWS, ingestion, and BI services.
• Promote performance best practices.

Monitoring and Observability
• Build and maintain dashboards, alerts, and logs for Databricks, AWS services, ingestion pipelines, and BI refreshes.
• Continuously tune alert thresholds to reduce noise and improve signal-to-action ratio.
• Ensure end-to-end lineage/traceability for faster fault isolation across stages.

External Support Team & Vendor Management
• Coordinate with external support teams for day-to-day operations and issue resolution.
• Coordinate with vendors for troubleshooting, service improvements, and escalations.
• Track and report on SLA adherence and vendor performance.
• Maintain operational runbooks, knowledge base, and handoff procedures between internal teams and external partners.

Continuous Improvement
• Drive automation and efficiency in operational workflows.
• Optimize resource utilization and reduce manual intervention.

BI Platform Operations
• Support Power BI, Tableau, and Alteryx operations (gateway health, dataset refresh schedules, workspace/app permissions, data-source connectivity).
• Monitor and improve dataset refresh reliability, query performance, and user access hygiene.

Performs other duties as assigned.

Job Requirements:

Education/Certifications:
• Bachelor's degree in computer science, data analytics, systems analysis, or a related field

Experience:
• 3+ years in data platform operations or reliability engineering.
• Hands-on experience with Databricks and AWS services in production environments.
• Demonstrated success in maintaining high-impact data platforms, with a strong track record of managing complex environments.
• Familiarity with ingestion tools (Fivetran, AWS DMS, DataStage, Informatica) and BI platforms (Power BI, Tableau, Alteryx).
• Experience with SAP, master data management, and cross-functional processes across supply chain, finance, and operations

Knowledge/Skills/Abilities
• Strong troubleshooting and incident management skills.
• Knowledge of governance, security, and RBAC principles.
• Ability to work independently and collaborate with external partners.
• Familiarity with Agile practices and DevOps principles.
• Understanding of governance, security, and privacy.
• Good judgment is required for this position as there may be times when direct supervision may not be immediately available.

Work Environment:
Remote Role:
• This position is classified as remote where the associate will perform remote work from their primary residence. Remote associates are welcome to work from the office but are not required to do so.
While remote associates are not required to work from an office on a regular basis, they may be required to come to the office or other UNFI locations for necessary business reasons or if directed to do so by their manager.

Physical Environment/Demands:
Office Roles:
• Most work is performed in a temperature-controlled office environment.
• Incumbent may sit for long periods of time at a desk or computer terminal.
• While performing the duties of this job, the employee is regularly required to sit; use hands to finger, handle, or feel; reach with hands and arms; and talk or hear.
• Incumbent may use calculators, keyboards, telephones, and other office equipment in the course of a normal workday.
• Stooping, bending, twisting, and reaching may be required in the completion of job duties.

The above statements are intended to describe the general nature of the work performed by the employees assigned to this job. All employees must comply with Company policy and applicable laws. The responsibilities, duties and skills required of personnel so classified may vary within each department and/or location.
This job posting was last updated on 2/16/2026