Educational data pre-processing from a domain-specific approach.

Pre-procesamiento de datos educativos desde un enfoque de dominio específico.

Emilcy Juliana Hernández-Leal
Juary Costa-Rocha
Néstor Darío Duque-Méndez

Abstract

Knowledge discovery requires that data be pre-processed before analysis techniques or algorithms are applied, both to improve data quality and to convert the data into the formats best suited for processing, especially when the data come from different sources. This article reports the experience of designing and building a domain-specific strategy for preparing educational data. The study methodology comprised three stages: (1) design and construction of the strategy, (2) data recognition and selection, and (3) application of the strategy and review of results. The case study used data from the primary and secondary education system of the Norte de Santander department (Colombia), drawn from two sources: enrollment records, which include socioeconomic and family variables, and evaluations of students' academic performance from three public educational institutions. For both sources, data from 2014 to 2018 were processed, amounting to more than eight hundred thousand records. This work adds value in three main respects: its scope with regard to the educational level from which the case study data come, the inclusion of a domain-specific approach in the solution, and the centralization of data from multiple sources, which leaves the data available for subsequent analysis processes. In conclusion, the work contributes both to the research field and to the application of knowledge in a real case. It also opens the possibility of further tests with other types of data from the educational context.
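
As an illustration of what centralizing data from two sources might involve, the following is a minimal Python sketch and not the authors' strategy. The field names (`student_id`, `year`, `stratum`, `subject`, `grade`), the helper functions, and the sample rows are all hypothetical; only the 2014 to 2018 study period comes from the abstract. The sketch normalizes identifiers, discards rows outside the study period, drops exact duplicates, and attaches performance records to the matching enrollment record.

```python
from collections import defaultdict

# Hypothetical schema and sample rows; the article's actual schema is not given here.
YEARS = range(2014, 2019)  # study period stated in the abstract: 2014 to 2018


def clean_id(raw):
    """Normalize an identifier so the same student matches across sources."""
    return str(raw).strip().upper()


def prepare(rows):
    """Normalize ids and years, keep rows in the study period, drop exact duplicates."""
    seen, out = set(), []
    for row in rows:
        row = {**row, "student_id": clean_id(row["student_id"]), "year": int(row["year"])}
        key = tuple(sorted(row.items()))  # field names are unique, so sorting is safe
        if row["year"] in YEARS and key not in seen:
            seen.add(key)
            out.append(row)
    return out


def centralize(enrollment, grades):
    """Attach each student's grades for a year to the matching enrollment record."""
    by_key = defaultdict(list)
    for g in prepare(grades):
        by_key[(g["student_id"], g["year"])].append((g["subject"], float(g["grade"])))
    merged = []
    for e in prepare(enrollment):
        merged.append({**e, "grades": by_key.get((e["student_id"], e["year"]), [])})
    return merged


enrollment = [
    {"student_id": " a01 ", "year": "2015", "stratum": 2},
    {"student_id": "A01", "year": 2015, "stratum": 2},  # duplicate after cleaning
    {"student_id": "b02", "year": 2013, "stratum": 1},  # outside the study period
]
grades = [
    {"student_id": "a01", "year": 2015, "subject": "math", "grade": "4.2"},
    {"student_id": "A01 ", "year": "2015", "subject": "science", "grade": 3.8},
]
result = centralize(enrollment, grades)
```

In this sketch, domain knowledge enters only through the study-period filter and the identifier rule; a domain-specific strategy such as the one the article describes would presumably encode many more rules of that kind.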


