Your challenges as a data scientist
- You will be designing, developing, testing and documenting the data collection framework.
- The data collection framework consists of complex data pipelines that move data from IoT sensors and low- and high-level control components to our Data Science platform.
- You will build a monitoring solution for these data pipelines that enables data quality improvements.
- You will develop scalable data pipelines to transform and aggregate data for business use, following software engineering best practices.
- For these data pipelines you will use the best available data processing frameworks, such as Spark and Splunk.
- You will develop our data services for customer sites into a product, using test and deployment automation, componentization, templates, and standardization to reduce project delivery times for our customers.
- The product provides insights into the performance of our material handling systems at customer sites around the globe.
- You will design and build a CI/CD pipeline for the data pipelines, including automated integration tests.
- In this process you strive for an ever-increasing degree of automation.
- You will work with infrastructure engineers to extend our storage capabilities and the types of data we collect (e.g. streaming).
- You will design and develop APIs.
- You will coach and train junior data engineers in state-of-the-art big data technologies.
What do we expect from you?
- Bachelor’s or master’s degree in computer science, IT, or equivalent, with at least 7 years of relevant work experience
- Programming in Python/Scala/Java
- CI/CD and data/code testing (e.g. Bamboo, Artifactory, Git)
- Data schemas (e.g. JSON/XML/Avro)
- Storage technologies (e.g. Azure Blob Storage, SQL, NoSQL)
- Scalable data processing frameworks (e.g. Spark)
- Event processing tools like Splunk or the ELK stack
- Deploying services as containers (e.g. Docker and Kubernetes)
- Streaming and/or batch storage (e.g. Kafka, Oracle)
- Working with cloud services (preferably Azure)