$120K - $160K a year
Design, develop, and optimize large-scale data and application systems using Java, Python, and PySpark, with a focus on performance, reliability, and scalable ETL pipelines.
10+ years of experience with Java, Python, PySpark, the Hadoop ecosystem, Linux, CI/CD, and cloud platforms, plus strong system engineering skills.
We are seeking a highly skilled System Engineer with strong experience in Java, Python, and PySpark to design, develop, and optimize large-scale data and application systems. The ideal candidate will have a solid background in system architecture, software development, and data engineering, along with hands-on experience integrating distributed systems and ensuring performance and reliability.

Key Responsibilities:
• Design, develop, and maintain system components and data pipelines using Java, Python, and PySpark.
• Collaborate with cross-functional teams to implement scalable and resilient solutions in cloud or on-premises environments.
• Develop and maintain ETL processes for data ingestion, transformation, and loading across multiple data sources.
• Optimize and troubleshoot distributed applications for performance and reliability.
• Implement system monitoring, logging, and alerting to ensure high availability and system integrity.
• Automate deployment and configuration management using tools such as Ansible, Jenkins, or Airflow.
• Participate in code reviews, contribute to technical documentation, and follow DevOps and CI/CD best practices.
• Work with Big Data ecosystems (Hadoop, Spark, Hive, Kafka, etc.) to handle large-scale data processing.
• Analyze and resolve complex technical issues across software, infrastructure, and data layers.

Required Skills and Qualifications:
• Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
• 10+ years of experience in system engineering, software development, or data engineering.
• Strong programming experience in Java and Python.
• Expertise in PySpark for distributed data processing and transformation.
• Hands-on experience with Hadoop ecosystem components such as Spark, Hive, HDFS, and Kafka.
• Solid understanding of Linux/Unix systems, shell scripting, and system-level debugging.
• Experience with version control systems (Git, Bitbucket) and CI/CD pipelines (Jenkins, GitLab CI).
• Familiarity with cloud platforms (AWS, Azure, or Google Cloud Platform) and data orchestration tools (Airflow, Oozie).
• Strong analytical and problem-solving skills, with a focus on scalability and performance tuning.
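To give candidates a sense of the PySpark ETL work described above, the following is a minimal illustrative sketch of an ingest-transform-load job; the paths, column names, and partitioning scheme are hypothetical examples, not details of this role's actual systems.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-etl")
    .getOrCreate()
)

# Extract: ingest raw records from a hypothetical landing path.
raw = spark.read.option("header", True).csv("/data/landing/orders.csv")

# Transform: normalize types, drop malformed rows, add a load timestamp.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("load_ts", F.current_timestamp())
)

# Load: write partitioned Parquet to a hypothetical curated zone.
clean.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/orders")

spark.stop()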