Framework for systematization of data science methods
DOI:
https://doi.org/10.15276/aait.01.2021.7
Keywords:
Data science, framework, data preprocessing, data modeling, data visualization, case study
Abstract
The rapid development of data science has led to the accumulation of many models, methods, and techniques that have been
successfully applied. As the analysis of publications has shown, the systematization of data science methods and techniques is an
urgent task. However, in most cases, existing results apply only to a particular problem domain. The paper develops a
framework for the systematization of data science methods that is neither domain-oriented nor task-oriented. The
metamodel-method-technique hierarchy organizes the relationships between existing methods and techniques and makes them easier to
understand. The first level of the hierarchy consists of metamodels of data preprocessing, data modeling, and data visualization. The
second level comprises the methods corresponding to the metamodels. The third level collects the main techniques, grouped by
method. The authors describe the guiding principles for using the framework. The framework makes it possible to define a typical process of
problem-solving with data science methods. A case study is used to verify the framework’s appropriateness. Four cases of applying
data science methods to solve practical problems described in publications are examined. It is shown that the described solutions
fully agree with the proposed framework. The recommended directions for applying the framework are defined. The framework is
limited to the analysis of structured or semi-structured data. Finally, directions for further research are given.