Metadata
Title
Data Creation and Collection for Artificial Intelligence via Crowdsourcing
Category
general
UUID
db035c19e4674c789f1cfdafd947ac5d
Source URL
https://learningforlife.tudelft.nl/data-creation-and-collection-for-artificial-i...
Parent URL
https://learningforlife.tudelft.nl/our-courses/ai-data-computer-science/
Crawl Time
2026-03-23T11:23:24+00:00
Rendered Raw Markdown
# Data Creation and Collection for Artificial Intelligence via Crowdsourcing

**Source**: https://learningforlife.tudelft.nl/data-creation-and-collection-for-artificial-intelligence-via-crowdsourcing/
**Parent**: https://learningforlife.tudelft.nl/our-courses/ai-data-computer-science/

- Examine the use of crowdsourcing for gathering data.
- Explain how cognitive biases and other human factors influence data quality.
- Describe the use of active learning in the creation of crowdsourced training data.
- Demonstrate the design of crowdsourcing tasks with quality control mechanisms.
- Discuss the evaluation of ML models with humans in the loop.

Free

- Type
  Course
- Location
  Online
- Pacing
  Starts anytime / Self-paced
- Length
  6 Weeks

  For instructor-paced courses, this is the length of the course. For self-paced courses, this is the length of the course if you spend the specified amount of time per week; you are free to go faster or slower as you see fit.
- Effort
  4 - 5 Hours per week

[Enroll on edX](https://www.edx.org/course/ai-skills-for-engineers-data-creation-and-collection)

- [Ujwal Gadiraju](https://learningforlife.tudelft.nl/instructors/ujwal-gadiraju/)
- [Jie Yang](https://learningforlife.tudelft.nl/instructors/jie-yang/)

- Overview

  **A one-stop shop to get started on the key considerations about data for AI! Learn how crowdsourcing offers a viable means to leverage human intelligence at scale for data creation, enrichment and interpretation, with great potential to improve both the performance and the trustworthiness of AI systems and to increase the adoption of AI in general.**

  Advances in Artificial Intelligence and Machine Learning have led to technological revolutions. Yet the AI systems at the forefront of such innovations have been the center of growing concerns. These include reports of systems failing when conditions differ only slightly from those seen during training, as well as ethical and societal issues that arise from their use.

  Machine learning models have been criticized for lacking robustness, fairness and transparency. Such model-related problems can largely be attributed to issues with data. To learn comprehensive, fine-grained and unbiased patterns, models have to be trained on a large number of high-quality data instances whose distribution accurately represents real application scenarios. Creating such data is not only a long, laborious and expensive process, but sometimes even impossible when the data is extremely imbalanced or the distribution constantly evolves over time.

  This course will introduce an important method that can be used to gather data for training machine learning models and building AI systems. Crowdsourcing offers a viable means of leveraging human intelligence at scale for data creation, enrichment and interpretation with great potential to improve the performance of AI systems and increase the wider adoption of AI in general.

  By the end of this course you will be able to understand and apply crowdsourcing methods to elicit human input as a means of gathering high-quality data for machine learning. You will be able to identify biases in datasets as a result of how they are gathered or created and select from task design choices that can optimize data quality. These learnings will contribute to an important set of skills that are essential for career trajectories in the field of Data Science, Machine Learning, and the broader realms of Artificial Intelligence.
- Details

  ##### Course Syllabus

  **Week 1: Crowdsourcing for High-quality Data Collection and The ImageNet Story**

  Artificial Intelligence is at the center of many recent advancements across areas such as transportation and finance. One of the reasons for this is that in the past decade we have designed methods to harness human intelligence at scale.\
  We will introduce and discuss the crowdsourcing paradigm and the importance of high-quality data.

  Topics we will cover this week:

  - The intuition behind crowdsourcing
  - The role of crowdsourcing platforms
  - The need for high-quality data for AI models
  - What is ImageNet, the gap it filled, and how it was built

  **Week 2: Quality Control Mechanisms for Crowdsourcing**

  The quality of crowdsourced human input is one of the most crucial aspects affecting the overall value of the paradigm. In this week we will discuss the challenges that make quality difficult to guarantee.

  Topics we will cover this week:

  - Workers' motives and behaviors
  - Quality control mechanisms in crowdsourcing
  - Incentives in crowdsourcing (like gamification)
  - Cognitive aspects and psychometric methods
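
The course page does not say which quality control mechanisms are taught. As an illustration only, the sketch below combines two widely used ones: screening workers on gold questions (items with known answers) and aggregating the remaining labels by majority vote. The function names, the `min_accuracy` threshold and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def filter_workers_by_gold(answers, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold questions meets the threshold.

    Workers who answered no gold question are excluded (an assumption of
    this sketch, not a rule from the course).
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in answers:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def majority_vote(answers, trusted_workers):
    """Aggregate labels per item by majority vote over trusted workers only.

    Ties are broken by whichever label was counted first.
    """
    votes = defaultdict(Counter)
    for worker, item, label in answers:
        if worker in trusted_workers:
            votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical toy data: (worker_id, item_id, label)
answers = [
    ("w1", "g1", "cat"), ("w1", "img1", "cat"), ("w1", "img2", "dog"),
    ("w2", "g1", "cat"), ("w2", "img1", "cat"), ("w2", "img2", "dog"),
    ("w3", "g1", "dog"), ("w3", "img1", "dog"), ("w3", "img2", "cat"),
]
gold = {"g1": "cat"}  # item with a known correct answer

trusted = filter_workers_by_gold(answers, gold)
labels = majority_vote(answers, trusted)
```

Here worker `w3` fails the gold question and is dropped before aggregation, so only the answers from `w1` and `w2` contribute to the final labels.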

  **Week 3: Factors Affecting Quality in Crowdsourcing**

  Researchers and practitioners in human computation and crowdsourcing have identified several factors that affect the quality of crowdsourced data. In this week we will discuss some of the recent works in this regard.

  Topics we will cover this week:

  - Tradeoff between task pricing and quality of output
  - The role of workers' demographics, qualifications and skills
  - The importance of task clarity and work environments
  - The concepts of task packaging, task framing and task priming

  **Week 4: Human Input for Data Creation and Model Evaluation in AI**

  In this week, we will cover the importance of data collection, annotation and engineering.

  Topics we will cover this week:

  - The importance of data collection
  - Data generation
  - The role of crowdsourcing in advanced machine learning
  - Taxonomy of microtasks

  **Week 5: Reducing Worker Effort: Active Learning**

  In this week we explore the challenges of collecting large-scale data and how to overcome them.

  Topics we will cover this week:

  - Approaches to reducing worker effort
  - The implications of reducing labeling effort
  - The key idea of active learning
  - Query strategies for selecting informative instances
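
To make the idea of a query strategy concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy: the model's predicted class probabilities are scored by entropy, and the most uncertain items are sent to crowd workers first. The course page does not name a specific strategy, so this choice, the function names and the probabilities are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the k item ids whose predicted distribution has the highest entropy.

    predictions maps item_id -> list of class probabilities from the current model.
    """
    ranked = sorted(predictions, key=lambda item: entropy(predictions[item]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for four unlabeled items (binary classification)
predictions = {
    "a": [0.98, 0.02],
    "b": [0.55, 0.45],
    "c": [0.80, 0.20],
    "d": [0.50, 0.50],
}

# Items to send to crowd workers in the next labeling round
to_label = select_most_uncertain(predictions, k=2)
```

In a full loop, the newly labeled items would be added to the training set, the model retrained, and the selection repeated, so that labeling effort goes where the model is least sure.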

  **Week 6: Interpreting, Evaluating, and Debugging ML models**

  In this week, we discuss strategies for evaluating, debugging, and interpreting machine learning models.

  Topics we will cover this week:

  - The notion of model interpretability
  - The role of humans in the interpretability process
  - Debugging ML pipelines and related challenges
- Qualifications

  **Chartered Engineering Competences**\
  All our online courses and programs have been matched to the competences determined by [KIVI’s Competence Structure](https://charteredengineer.nl/competence-structure/), a common frame of reference for everyone, across all disciplines, levels and roles.

  These competences apply to this course:

  - A1: Extend your theoretical knowledge of new and advancing technologies.
- Admission

  This is a Massive Open Online Course (MOOC) that runs on edX.

  ##### Prerequisites

  Some prior experience with a programming language (e.g. Python, Java) is recommended but not required.

- Course format: MOOC

  **This course is a Massive Open Online Course (MOOC)**. Our MOOCs are delivered on edX.org and are open to all. They include video lectures, readings, assignments, and community discussions. Content is free, with optional certificates and additional exercises available for a fee.

## Related Products

[AI Skills for Engineers: Supervised Machine Learning](https://learningforlife.tudelft.nl/ai-skills-for-engineers-supervised-machine-learning/) 

6 - 8 Hours per week

Free

[AI Skills for Engineers: Data Engineering and Data Pipelines](https://learningforlife.tudelft.nl/ai-skills-for-engineers-data-engineering-and-data-pipelines/) 

5 - 7 Hours per week

Free