Data Collection and Organization

8 ECTS / 1st Semester / Portuguese

Intended learning outcomes (knowledge, skills and competences to be developed by the students):

Develop the ability to independently select and apply different methods of data collection, data organization, visualization and exploratory analysis.

Together with the other mandatory CUs, this Curricular Unit (CU) aims to make students independent in scientific research. Specifically, on completing this CU, students should be able to:

  i) understand the importance of sound data management in scientific research;
  ii) select appropriate data collection methods, and design and implement them so as to obtain complete, high-quality data;
  iii) classify, transform and clean data to support exploratory analysis and reproducibility;
  iv) manage data files professionally and securely;
  v) conduct exploratory data analysis using multivariate analysis and visualization techniques, to support the formulation of hypotheses.

Syllabus:

Module 1: Introduction to data

  • History of data and the new era of big data;
  • Types of data (structured, semi-structured, unstructured; discrete, continuous; quantitative, qualitative);
  • Research Data Lifecycle framework;
  • Databases, data models and information management systems;
  • Tools for data collection, data management and data analysis.

Module 2: Data collection methods

  • Quantitative methods;
  • Qualitative methods;
  • Population and sampling;
  • Sources of error and bias.

Module 3: Data organization and management

  • Classification and categorization;
  • Data transformations and data cleaning;
  • File naming, versioning, file formats and documentation;
  • Storage, security, privacy and sharing.

Module 4: Exploratory analysis and data visualization

  • Brief primer on multivariate analysis (random variables and distribution functions; moments and the covariance matrix);
  • Data visualization;
  • Brief introduction to principal component analysis (PCA) and cluster analysis.