30 Credits - Design of cloud-based data analytics pipelines

Scania is now undergoing a transformation from being a supplier of trucks, buses and engines to a supplier of complete and sustainable transport solutions.

Background

Scania is one of the world’s leading manufacturers of trucks and buses for heavy transport. Today, we have more than 400,000 connected vehicles generating huge amounts of data in real time every day. At Scania Connected Services and Collaboration, we are always looking for new solutions for analyzing this kind of data. Data pre-processing is one of the most critical components for ensuring good data quality. Examples of data pre-processing include data enrichment, data normalization and data cleaning. A scalable data processing solution therefore plays an important role in keeping Scania competitive in the market.

Assignment

In this master thesis, you will work directly with our production-ready data pre-processing pipeline. The main focus of the thesis project will be to:

  • Investigate the current data pre-processing pipeline in the cloud.
  • Explore and implement alternatives to the existing solution.
  • Compare the alternatives with respect to throughput, response time, and implementation cost.
  • Evaluate data analysis methods for data cleaning (optional, if time permits).

Education

MSc within Computer Science, Statistics or similar. Knowledge about the following subjects would be a plus: statistics, machine learning, data stream processing and benchmarking. Programming experience in Python and/or R is preferred.

Number of Students: 1 – 2

Start Date: January – February 2019

Estimated time needed: 20 weeks

Contact Person

Cheng Xu, Connected Services and Collaboration, cheng.xu@scania.com, 08-55382885

Send your application with the subject line "Ny Teknik Jobb".
