Data Engineer

Job Description

Our client is a pioneering technology company building a sovereign AI platform designed to scale DevOps and agentic AI infrastructure. This platform represents the next generation of enterprise AI solutions, combining cutting-edge machine learning capabilities with robust, secure infrastructure. The company is seeking exceptional technical talent to lead and develop various domains of this transformative platform.

We are seeking a skilled Data Engineer to design, develop, and maintain robust data pipelines that power our knowledge base and agentic AI components. The role involves building efficient ETL workflows, optimizing data storage, and ensuring high data quality to support AI model training and inference. You will collaborate closely with data scientists to meet evolving data requirements while ensuring compliance with data security and privacy standards. The ideal candidate will be responsible for monitoring and enhancing pipeline performance, contributing to the reliability and scalability of our AI data infrastructure.

Job Benefits

  • Be at the forefront of building sovereign AI platforms that drive digital independence and transformation.
  • Work with some of the most forward-thinking clients, engineering minds, and thought leaders in AI infrastructure.
  • Grow your impact in a purpose-driven, innovation-led culture that values agility, inclusion, and continuous learning.
  • Professional development opportunities through continuous learning and mentorship in a supportive, cross-cultural work environment.
  • Work on cutting-edge technology with real-world impact.

Job Requirements

  • Design, build, and maintain scalable data pipelines to support the knowledge base and agentic AI components.
  • Develop robust ETL processes for ingesting structured and unstructured data from diverse internal and external sources.
  • Implement data transformation workflows optimized for AI model training, inference, and real-time processing.
  • Design and optimize data storage architectures to ensure efficient data retrieval, scalability, and performance.
  • Establish and maintain data quality checks, validation rules, and monitoring mechanisms to ensure data integrity.
  • Collaborate with data scientists and ML engineers to align data infrastructure with model development requirements.
  • Ensure data security, privacy, and regulatory compliance throughout the data lifecycle, including governance best practices.
  • Monitor, troubleshoot, and continuously improve the reliability and performance of data pipelines and related systems.
  • Minimum 5 years of experience in data engineering, data infrastructure, or a closely related field.
  • Strong proficiency in Python and SQL for large-scale data processing, transformation, and analysis.
  • Hands-on experience with ETL orchestration tools such as Apache Airflow or Azure Data Factory.
  • Solid understanding of data modeling, schema design, and relational and non-relational database systems.
  • Familiarity with big data technologies such as Apache Spark, Hadoop, or similar distributed computing frameworks.
  • Experience working with cloud-based data platforms (e.g., Azure Data Lake, AWS S3, Google BigQuery) and storage services.
  • Knowledge of data privacy, governance, and security best practices in enterprise environments.
  • Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related technical discipline.

Technical Skills:

  • Programming Languages: Python, SQL, Scala
  • ETL & Orchestration Tools: Azure Data Factory, Apache Airflow, dbt
  • Data Processing Frameworks: Apache Spark, PySpark, Pandas
  • Databases: PostgreSQL, Azure Synapse Analytics, Azure Cosmos DB
  • Data Lakes & Storage: Azure Data Lake Storage (ADLS), Delta Lake
  • Streaming Platforms: Apache Kafka, Azure Event Hubs
  • Version Control & Data Versioning: Git, DVC (Data Version Control)