INTRODUCTION
In Computer Science (CS), programming courses have a higher attrition rate than other courses (Figueiredo & García-Peñalvo, 2018; Munson & Zitovsky, 2018; Zingaro et al., 2018). The use of collaborative approaches in programming courses has shown satisfactory results by providing skills, aptitudes, and good practices, among other benefits (Suarez et al., 2021). The Computer Supported Collaborative Learning (CSCL) approach integrates artificial intelligence (AI) to improve the learning process through collaboration and information and communication technology (ICT) (Kozma, 2000; Loizzo & Ertmer, 2016; Docq & Daele, 2001; Solarte-Pabón & Machuca-Villegas, 2019; Thomas et al., 2002). According to Magnisalis et al. (2011), the CSCL approach is based on the following processes: a) Group Formation (GF support): forming student groups to obtain the best results from collaboration; b) Learning Domain (LD support): evaluating student learning activities with the help of computers; and c) Path of Interaction (PI support): giving students comments on their learning outcomes. The literature shows some projects that integrate CSCL processes with AI techniques, such as machine learning to automatically identify groups of students according to their profiles or natural language processing for the automatic assessment of code (Casamayor et al., 2009; Costaguta & de los Angeles Menini, 2014; Black & Wiliam, 1998; Varier et al., 2017).
However, the literature offers no state-of-the-art review that shows how the integration of CSCL and AI impacts and improves the learning process in programming courses. This paper presents a systematic mapping study based on tech mining (Mohammadi, 2012), which was conducted by reviewing scientific documents such as journal articles and conference proceedings from Scopus, WoS, and ScienceDirect, in addition to extracting and analyzing data from repositories on the GitHub platform. The final corpus contains 316 records: 90 papers and 226 repositories. The aim is to understand the evolution of programming languages, technologies, tools, and strategies based on CSCL and AI to support teaching in programming courses.
This paper has three sections. The first section presents the methodology in three parts related to the elaboration of the mapping study: research questions and expected results, data protocol, and the construction of technological map visualizations. The second section shows the expected results, classified into four aspects: the commonly used programming languages in the development of technologies and learning tools based on CSCL and AI; the timeline of the evolution of projects, tools, and technologies; the strategies for CSCL processes based on AI; and the classification of the reference corpus according to the ACM Computing Classification System (CCS) (ACM, n.d.). Finally, based on the results, the third section discusses and concludes what is needed to improve the learning process in programming courses.
METHODOLOGY
This section details the process carried out to elaborate the mapping study of the learning programming process as supported by CSCL and AI. It includes the sources of information, the search functions, and the selection criteria used to construct the final reference corpus and the procedures for analysis and technological mapping.
This mapping study is based on tech mining, which aims to generate practical intelligence using analytical and visualization applications for data analysis (Choi et al., 2012; Mohammadi, 2012; Barab et al., 1997; Antonenko et al., 2012). Thus, future technologies and innovations can be anticipated, ensuring long-term competitiveness and supporting decision-making processes (Jonassen, 2012; Capelo & Dias, 2009; Fadde, 2009).
This systematic review aims to find the technological wave and cutting-edge research of CSCL and AI in the programming learning process by solving four questions of interest, as shown in Table 1.
Table 1 Research questions
ID | Research questions |
---|---|
1 | What programming languages are the most used for software development based on CSCL and AI? |
2 | What is the timeline of projects, tools, and technologies based on CSCL and AI? |
3 | How are strategies classified according to CSCL processes and software types? |
4 | How does the integration of CSCL and AI improve the learning process in programming courses? |
Source: Authors
Protocol data
Selection of data sources. Two sources of information were used to cover research and technological information: i) documents from journals and conferences published on the Internet, books, and websites from reliable sources; and ii) data from GitHub software repositories, GitHub being the most popular open-source code platform among developers. GitHub contains information on more than 3.5 million software developers and more than 23 million repositories since 2008 (GitHub, 2018).
Search and data filters. Table 2 shows ten search functions specific to the Scopus, ScienceDirect, Web of Science (WoS), and GitHub data sources. The search functions included keywords taken from experts in this field, as well as from researchers and programming professors. In addition, the Table shows how many documents were obtained for each data source and the total records per query.
Table 2 Search functions, data source, and total records per query
ID | Queries | Source | Total records |
---|---|---|---|
1 | (”cscl AND (”programming course OR introduction to programming courses”)) | Scopus | 23 |
2 | (“cscl” AND “programming course”) AND (LIMIT-TO (SUBJAREA, ”COMP”) OR LIMIT-TO (SUBJAREA, ENGI”)) | Scopus | 16 |
3 | (”learning to programming”)) AND (cscl) AND (programming course) AND (LIMIT-TO (SUBJAREA,” COMP”)) | Scopus | 20 |
4 | (”cscl” AND ”programming course”)) AND (groups) AND (LIMIT-TO (SUBJAREA, ”COMP”) OR LIMIT-TO (SUBJAREA, SSOCI”)) | ScienceDirect | 13 |
5 | ((”cscl” AND ”programming course”) AND assessment)) | ScienceDirect | 13 |
6 | TS=(learning programming AND cscl) | Web of Science | 11 |
7 | TI=(programming course) AND TS = computer supported collaborative learning | Web of Science | 26 |
8 | (platforms to learn to program+cscl) | GitHub | 93 |
9 | (programming course+artificial intelligence+cscl) | GitHub | 145 |
10 | (platforms to programming course+cscl) | GitHub | 39 |
Source: Authors
The above queries resulted in 122 papers and 277 repositories, out of which 83 records were excluded based on four criteria: research or project type, incomplete information, missing name of the publication venue (conference or journal), and a low number of citations, stars, copies, and forks. A PRISMA diagram of the search and filters is shown in Figure 1. After filtering the information, the final corpus contained 316 records: 90 papers and 226 repositories. Table 3 shows the records filtered by each data source.
Table 3 Final corpus of references to carry out the mapping study
Query ID | Source | Excluded | Final records | Total records |
---|---|---|---|---|
1 | Scopus | 4 | 19 | 48 |
2 | Scopus | 3 | 13 | |
3 | Scopus | 4 | 16 | |
4 | ScienceDirect | 6 | 7 | 17 |
5 | ScienceDirect | 1 | 10 | |
6 | Web of Science | 10 | 16 | 25 |
7 | Web of Science | 2 | 9 | |
8 | GitHub | 14 | 79 | 224 |
9 | GitHub | 20 | 125 | |
10 | GitHub | 19 | 20 | |
Source: Authors
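The totals quoted before Table 3 can be rechecked from the per-query counts in Table 2. The snippet below is only an arithmetic sketch: the counts are copied from Table 2 and the exclusion figure from the text, while the names `QUERY_RECORDS`, `EXCLUDED`, and `totals_by_kind` are introduced here for illustration and are not part of the study's tooling.

```python
# Per-query record counts copied from Table 2 (source, total records).
QUERY_RECORDS = [
    ("Scopus", 23), ("Scopus", 16), ("Scopus", 20),
    ("ScienceDirect", 13), ("ScienceDirect", 13),
    ("Web of Science", 11), ("Web of Science", 26),
    ("GitHub", 93), ("GitHub", 145), ("GitHub", 39),
]
EXCLUDED = 83  # number of excluded records reported in the text


def totals_by_kind(records):
    """Return (papers, repositories): GitHub rows are repositories, the rest papers."""
    papers = sum(n for source, n in records if source != "GitHub")
    repositories = sum(n for source, n in records if source == "GitHub")
    return papers, repositories


papers, repositories = totals_by_kind(QUERY_RECORDS)
final_corpus = papers + repositories - EXCLUDED
print(papers, repositories, final_corpus)  # 122 277 316
```

The identification totals (122 papers, 277 repositories) and the final corpus size (316) match the figures reported in the text.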
As seen in Figure 1, the implementation of PRISMA is divided into four stages, as follows:
Identification: the number of records obtained from the queries to the different data sources is determined.
Screening: all the information is collected, forming a data corpus.
Eligibility: the data are filtered according to the exclusion criteria mentioned above to obtain a final corpus. In this stage, the records that are not qualitative are also identified.
Included: a dataset with structured information is obtained, which can be processed for analysis.

Source: Authors
Figure 1 Flow diagram for a review of CSCL and AI in programming learning through repositories and scientific documents
Technological maps and visualizations. The systematic review of the mapping was supported by technical maps generated semi-automatically through tools and computational techniques. Technological maps were constructed based on information from the reference corpus. The systematic mapping study is represented by four technological maps that are displayed using the following visualizations: treemap (Google Inc., n.d.), Radial Tidy Tree (vega, n.d.), and fishbone (D3, n.d.).
Figure 2 shows the workflow of this study, which involved three processes. The first one, the corpus of references, contains information from repositories and scientific documents. On the one hand, the features of the repositories were extracted: citations, year, author, copies, forks, stars, and updates. On the other hand, the features of the scientific documents were extracted: year, keywords, abstract, DOI, authors, and organizations. In the second process, the extracted features were processed by named entity recognition (NER), a natural language processing (NLP) technique that extracts specific entities from text (e.g., cities, names, and others). The NER component of spaCy (Vasiliev, 2020) was used to extract the following features from the text fields of the corpus of references: methods, type of technology, type of software, strategies, and programming languages. The index construction task aimed to divide and organize the reference corpus information by years, types of software, and software categories. The third process involved constructing technological maps using three types of visualizations: treemap, radial tidy tree, and fishbone.
The treemap is used to display large amounts of hierarchically structured (tree-structured) data. The space in the visualization is divided into rectangles that are sized and sorted by a quantitative variable. The levels in the hierarchy are displayed as rectangles that contain other rectangles. Each set of rectangles on the same level in the hierarchy represents a column or an expression in a data table. Each individual rectangle at a level in the hierarchy represents a category in a column.
The radial tidy tree is a node-link tree diagram of classes in a package hierarchy, positioned in polar coordinates using Vega's tree transform. Adjusting the parameters yields layouts suitable for general trees or cluster dendrograms.
The fishbone is a representation tool for categorizing the potential causes of a problem, or the evolution of a problem, in order to identify its root causes. Typically used to analyze progress, a fishbone diagram combines the practice of organization charts with a mind map template.
Construction of technological maps
The technological map of programming languages represents the most used programming languages and the development trends in technologies for learning programming. This technological map was constructed by means of the cooccurrence process, in which the repositories that belong to the same programming language are grouped. In the process, the tags (labels) produced by the NER are included in a vector, one vector per repository. The set of vectors constitutes a matrix of programming language cooccurrences. Thus, if the label contains information about the programming language, a value of 1 is assigned to the corresponding position of the vector; otherwise, it is 0. The cooccurrence matrix is transformed into a square matrix, and it becomes the input of the treemap. Each rectangle that represents a programming language consists of rectangles that represent repositories. The largest rectangle at the top left corner represents the most important programming language, and the smallest rectangle at the bottom right corner represents the least important one. The orange tone represents a repository in the upper hierarchy, while a green tone represents a repository in the lower hierarchy, based on the most cited, copied, and cloned studies, as well as on forks.
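The cooccurrence step can be illustrated with a small sketch. It assumes that each repository is reduced to a set of language tags and that the "square matrix" is the product of the transposed binary matrix with itself, which the text does not state explicitly. The repository tags and function names below are hypothetical.

```python
def incidence_matrix(repo_tags, languages):
    """One binary row per repository: 1 if the language tag is present, else 0."""
    return [[1 if lang in tags else 0 for lang in languages] for tags in repo_tags]


def cooccurrence(matrix):
    """Square language-by-language matrix (assumed here to be X^T X).

    Entry [i][j] counts the repositories tagged with both language i and
    language j; the diagonal counts the repositories per language.
    """
    n = len(matrix[0]) if matrix else 0
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]


# Hypothetical NER output: the language tags found for four repositories.
repo_tags = [{"Python", "JavaScript"}, {"Python"}, {"Java", "Python"}, {"JavaScript"}]
languages = ["Python", "JavaScript", "Java"]

square = cooccurrence(incidence_matrix(repo_tags, languages))
print(square)  # [[3, 1, 1], [1, 2, 0], [1, 0, 1]]
```

Under this assumption, the diagonal values (3, 2, 1) would give the relative size of each language rectangle in the treemap.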
The evolution timeline depicts the technological evolution of the subject under study. It presents the dynamics of the most relevant projects, of the type of technologies, and of the programming languages used in each period. To summarize, the map shows the evolution of the technologies for learning programming in one decade. The technological map is elaborated by performing the following tasks:
The reference corpus is organized by the number of stars, citations, and forks in the repositories, as well as by the number of citations in the papers. To this effect, the fishbone presents the most relevant projects over time.
The reference corpus is grouped using a data ontology that contains categories and subcategories of the Computing Classification System (CCS) (ACM, n.d.). The fishbone shows the type of technologies extracted from the CCS.
The cooccurrence of the programming languages is determined. For each year, the languages are grouped into three categories: more relevant, relevant, and less relevant. The relevance is associated with the number of repositories that use a programming language for a given year.
Finally, the extracted information is manually organized into the fishbone.
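The yearly relevance grouping described in the tasks above could be sketched as follows. The text only says that relevance follows the number of repositories per language and year; splitting the ranked languages into thirds is an assumption made here for illustration, as are the function name and the sample data.

```python
from collections import Counter


def relevance_by_year(repos):
    """repos: iterable of (year, language) pairs, one per repository.

    Returns {year: {language: label}}. Languages are ranked by repository
    count and split into thirds (an assumed cut-off, not taken from the study).
    """
    counts = {}
    for year, lang in repos:
        counts.setdefault(year, Counter())[lang] += 1

    labels = {}
    for year, counter in counts.items():
        # Sort by descending count, then by name so that ties are deterministic.
        ranked = [lang for lang, _ in
                  sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))]
        third = max(1, -(-len(ranked) // 3))  # ceil(len(ranked) / 3)
        labels[year] = {}
        for pos, lang in enumerate(ranked):
            if pos < third:
                labels[year][lang] = "more relevant"
            elif pos < 2 * third:
                labels[year][lang] = "relevant"
            else:
                labels[year][lang] = "less relevant"
    return labels


# Hypothetical repositories for a single year.
sample = [(2014, "Python")] * 3 + [(2014, "JavaScript")] * 2 + [(2014, "Ruby")]
print(relevance_by_year(sample))
```

Any other cut-off (fixed thresholds, quantiles of the counts) would fit the same structure; only the three `if` branches change.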
As for the technological map according to the Computing Classification System (CCS), the reference corpus was grouped using a data ontology of the Association for Computing Machinery (ACM, n.d.). The cooccurrence was determined in order to identify which repository or paper belonged to a category of the CCS. A vector represents one reference, in which each category or subcategory either appears or does not. Thus, the set of vectors constitutes a matrix of CCS cooccurrences, where, if the reference contains information about the category or subcategory, a value of 1 is assigned to the vector; otherwise, it is 0. The cooccurrence matrix is transformed into a square matrix, and it becomes the input of the radial tidy tree graph. Thus, the references are shown by CCS categories and subcategories.
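To make the grouping by categories and subcategories concrete, the sketch below turns classified references into a flat table of nodes with `id` and `parent` fields, which is the kind of input a stratify-style tree layout typically expects. The exact input format used in the study is not given, so the field names, the function, and the category labels are illustrative assumptions.

```python
def ccs_tree_nodes(references):
    """references: list of (title, category, subcategory) tuples.

    Returns a flat list of nodes {"id", "parent", "name"} forming the tree
    root -> category -> subcategory -> reference.
    """
    nodes = [{"id": 0, "parent": None, "name": "CCS"}]
    ids = {(): 0}  # maps a path of names to its node id

    def node_id(path):
        if path not in ids:
            parent = node_id(path[:-1])  # make sure the parent exists first
            ids[path] = len(nodes)
            nodes.append({"id": ids[path], "parent": parent, "name": path[-1]})
        return ids[path]

    for title, category, subcategory in references:
        node_id((category, subcategory, title))
    return nodes


# Hypothetical classification of two references from the corpus.
refs = [
    ("UNCode", "Applied computing", "E-learning"),
    ("ClassroomWiki", "Applied computing", "Collaborative learning"),
]
for node in ccs_tree_nodes(refs):
    print(node)
```

Both references share the single "Applied computing" node, so the example yields six nodes in total, including the root.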
For the technological map of AI-based CSCL strategies, the reference corpus was grouped by CSCL strategies, methods, or algorithms. In addition, it was determined whether the reference was related to artificial intelligence. While elaborating this technological map, the references of the corpus were classified according to the following criteria:
Is the reference a strategy, a method, or an algorithm?
Is the strategy, method, or algorithm referred to more than four times in the same paper or repository?
Is the strategy, method, or algorithm based on CSCL and AI?
If a reference meets the three criteria, it is stored in a segment of a vector. Then, its cooccurrence is determined. Thus, the vector is represented as a coincidence matrix, which is then transformed into a square matrix, which will in turn be the input for the radial tidy tree graph.
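The three criteria can be read as a simple filter over the corpus. The sketch below assumes that the second criterion counts how often a strategy is mentioned within one paper or repository. The `Candidate` fields and the sample entries are hypothetical and only show the shape of the check.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    kind: str         # e.g. "strategy", "method", "algorithm", or anything else
    mentions: int     # assumed: mentions within a single paper or repository
    uses_cscl: bool
    uses_ai: bool


def meets_criteria(c, min_mentions=4):
    """True only if all three criteria from the text hold."""
    is_strategy = c.kind in {"strategy", "method", "algorithm"}
    mentioned_enough = c.mentions > min_mentions  # "more than four times"
    return is_strategy and mentioned_enough and c.uses_cscl and c.uses_ai


# Hypothetical candidates extracted from the corpus.
candidates = [
    Candidate("JigSaw learning", "strategy", 7, True, True),
    Candidate("Code formatter", "tool", 9, False, False),
    Candidate("Rule-based system", "method", 3, True, True),
]
selected = [c.name for c in candidates if meets_criteria(c)]
print(selected)  # ['JigSaw learning']
```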
RESULTS
This study found the most used programming languages in the development of technologies and learning tools based on CSCL and AI (Figure 3), and it aided in the elaboration of a timeline of the evolution of the projects, the types of technologies used, and the most relevant programming languages from 2009 to 2019 (Figure 4). It also classified the information according to the categories and subcategories of the ACM Computing Classification System (CCS) (Figure 5), as well as the strategies based on CSCL processes and AI techniques (Figure 6).

Source: Authors
Figure 3 Programming languages used in the development of technologies based on CSCL and AI for improving the learning process in programming courses

Source: Authors
Figure 4 Evolution of the best projects, types of technologies, and most relevant programming languages for improving the learning process in programming courses
Technological map of programming languages
The review and analysis of the programming languages used in the development of tools based on CSCL and AI in programming courses helped to identify the technological trends and specifications of the new technologies. From 2007 to 2021, 31 programming languages were used to develop 226 software projects to support the programming learning process. The most used languages were Python with 15.6%, JavaScript with 15.0%, and Java with 12.1%. Other programming languages, such as Jupyter Notebook, R, C++, Matlab, Objective-C, and Rust, among others, make up the remaining 57.3%.
In the development and research carried out to improve the learning of programming, there are issues related to the correct identification of the programming paradigm. Through a review of historical projects, these paradigms could be identified and grouped in order to provide a correct approach for new projects in this regard. In this review, the programming languages were classified by paradigm: 56% belong to the object-oriented paradigm, 27% to the functional paradigm, 9% to the logical paradigm, and 8% to the imperative paradigm. It is important to mention that, out of the 31 languages, 19.35% are multi-paradigm, such as Python, JavaScript, C++, Jupyter Notebook, Objective-C, and Rust (Figure 3).
In the repositories, it was found that there is affinity between programming languages for a specific development. For Python, JavaScript, and Java, most of the projects found are focused on online learning platforms (nsoojin, n.d.; Leocardoso94, n.d.; Haghighatlari et al., 2020) and standalone platforms such as EVEA (vieiraeduardos, n.d.), Muse (sainuguri, n.d.), and Classroom (Karanval, n.d.). R and C++ complement each other in data processing and low-level programming tasks, where the software exchanges information directly with the hardware. Other languages such as Objective-C, Matlab, and Julia are integrated with Jupyter Notebook and design languages such as CSS in order to process large amounts of data and build interactive visualizations. Languages such as Python are combined with R and Java and used for technical implementations such as machine learning, natural language processing, and deep learning. JavaScript, R, and C++ are integrated to build projects that implement techniques based on neural networks, knowledge discovery in databases, and data mining.
The languages were grouped according to the computational tasks in which they are most used. That is to say, in data processing: R, Python, Node, and Java; for scaling applications: R, C++, Ruby, and Python; for data visualization: JavaScript and Python with HTML and CSS; for multiple processes (threads): Python, Ruby, and Node; and for adaptive and responsive tasks: JavaScript and Python.
Evolution timeline
The technological evolution of programming learning from 2009 to 2021 was studied in the corpus of repositories. For each year, the most relevant repositories, programming languages, and technologies were identified. Figure 4 describes the projects over one decade. In the first years, the projects were related to online platforms, online courses, content management systems, web apps for learning, and massive open online courses (MOOCs). In the last years, the projects focused on cloud platforms and intelligent learning management systems.
From 2009 to 2011, there were web application developments based on the client server architecture, where processes are executed on the machine and displayed in the web browser. The programming languages used to process data in the backend were C ++ and Ruby; to display in the frontend, Python, JavaScript, Java, Go, and Clojure were used. The most relevant projects involve interactive guides that use multi-user interaction and basic 3D environments to learn a programming language, such as ruby-kickstart (Cheek, 2019), go-book (Yuuta, 2019), html5-gamebook (J. Williams, 2019), and the Stanford Machine Learning Course (Pathrabe, 2019). Interactive guides are resources designed to reinforce student learning in a virtual environment through audio and video. Some of the strategies used were the intelligent tutoring system (Triantafillou et al., 2002a) and the rule-based system (Magnisalis et al., 2011).
In 2012, improvements were made to interactive guides via games with 3D graphics, wikis, and learning assistants, where the interaction between teacher and student is supported by the computer. Programming languages such as PHP, Scala, C, and Haskell pioneered the development of platforms that support group learning in UML (unified modeling language) design problems such as jgltut (integeruser, n.d.), ElixirSchool (philss, n.d.), and ZeroBraneEduPack (pkulchenko, n.d.). These types of platforms are considered to be the basis of current projects such as Repl.it, Pastebin, and CodePen (Lafleur, 2017). On the other hand, the Learning Content Management System (LCMS) appears in Moodle (Moodle, n.d.), Chamilo (chamilo, n.d.), and Evolcampus (S.L., n.d.). An LCMS supports user management and is designed to provide educators, administrators, and learners with a single robust, secure, and integrated system to create personalized learning environments. The term e-learning became popular with virtual teaching methodologies. Some of the CSCL strategies created are adaptive hypermedia systems (AHS) (Yang & Luo, 2007) and e-learning systems (Debdi et al., 2015).
The high-performance computing processes used in online learning platforms force companies and developers to build new architectures that optimize resources. In 2013, exploration began on frameworks such as CakePHP (PHP5.3, n.d.), Kotlin (Kotlin, n.d.), and Django (django, n.d.) using the MVC (model-view-controller) pattern or, in the case of Django, its model-view-template variant. This pattern helps improve the speed of platforms. The first platforms to integrate it were Minerva (dmlc, 2019), LetsCode (Drupal, 2014), and Modern-JOGL (jvm, n.d.). These platforms support programming courses through the teaching of block programming. Moreover, new libraries were created to visualize large amounts of data as the first steps in the use of artificial intelligence techniques such as machine learning were taken (Pea et al., 1999).
In 2014, the common use of the Python and JavaScript frameworks significantly improved the development of analysis tools, source code tests, tools with visualizations that allow interpreting data, and integration with visualization libraries such as bootstrap, jQuery, and CSS styles. 2014 was characterized by the development of platforms for video conferences and augmented reality projects such as tparisi (2012) and Edgarjcfn (2014). Similar projects have been developed in academies and are currently integrated into Google, Coursera, and Platzi (Coursera, 2014). In this vein, some CSCL strategies include JigSaw learning (Gutwin et al., 2013; Magnisalis et al., 2011) and Predict student behavior (Desmarais & Baker, 2012).
Between 2015 and 2016, educational platforms aided by audio and video were developed under the concept of gamification (i.e., platforms that support learning through games which foster soft skills and knowledge management). The methods used in the teaching tools allow achieving goals by means of collaboration. The main developments were based on languages such as Python, Swift, JavaScript, and Java. The platforms that stand out are Free Code Camp (Porras et al., 2007), ClassroomWiki (Khandaker & Soh, 2010), and JavaStud (yrojha4ever, 2015). Other platforms, such as tpot (EpistasisLab, n.d.) and Dex (johnlee175, n.d.), use machine learning to improve students’ writing style in a programming language. For these years, different uses of the strategies were reported: the Coalition Formation Algorithm (Yang & Luo, 2007) and the Team Syntegrity Model (TSM) (hnshhslsh, 2016).
Between 2017 and 2018, it became necessary for the platforms developed in different programming languages to run in any software and hardware environment. JavaScript is a pioneer in the implementation of microservices (MS) architectures, using lightweight and portable containers for applications that work in any environment. The MS-based repositories reported for these years are Judge (Aglio, 2016), an online open-source judge API to compile and execute code with given test data; and judge (hnshhslsh, 2016), GreedExCol (Debdi et al., 2015), and uoj (vfleaking, 2016), which rely on containers with Moby technology (formerly Docker) to optimize data I/O processes. The CSCL strategies for this period were based on two processes: evaluation with adaptive hypermedia systems (AHS) (Triantafillou et al., 2002) and feedback with Predict student behavior (Desmarais & Baker, 2012).
Finally, from 2019 to 2021, languages such as Python, Objective-C, and Java were updated to execute their processes under MS-based architectures through frameworks such as Angular 6, Swift 8, and Java 9. This change caused the development of learning platforms for programming to increase. However, in this context, each participant learns individually, and the professor does not have direct interaction with the students, which is why the learning approach is not appropriate in online platforms. The CSCL approach appears with platforms that integrate practice and theory for students and professors without having to be present in a classroom. This includes Sabo (tokers, 2016), Judger (University, 2015), and LeetCode (wty21cn, 2016) by Apple. Since 2018, platforms have focused on the specific processes of CSCL. For GF support, there are platforms such as SunnyJudge (yudazilian, 2017) and timus (Soh et al., 2005); for LD support, INGInious (luvoain, n.d.), UNcode (Restrepo-Calle et al., 2018), SunnyJudge (0xyd, n.d.), and Trakla (trakla, n.d.); and, for PI support, PutongOJ and collaborative-online-judger (cqlzx, 2017). Currently, languages such as Python support CSCL processes through computational AI techniques based on prediction (Camisole 046), deep learning (RankFace) (Entropyxcy, 2017), and classification (Sun-Judge) (yudazilian, 2017). These projects, rather than only judging code, are able to evaluate the code of an application to determine whether it is optimal.
By reviewing the most relevant projects within the timeline, the projects based on CSCL and AI to improve the learning process in programming courses were identified. According to GitHub, these projects are in six software categories: software methodologies, compilation errors, software design, help with the style of the source code, artificial intelligence platforms, and virtual judges. Table 4 shows the most important projects of this review, grouped by the categories of the GitHub software, computational techniques, and CSCL processes. The relevance of a project in GitHub depends on the number of stars, copies, contributors, and forks.
Table 4 Most important projects from 2009 to 2021 grouped by the categories of the GitHub software, computational techniques, and CSCL processes
Project | Categories | ML | DL | NLP | DV | DM | CSCL process |
---|---|---|---|---|---|---|---|
ElixirSchool | LMS | X | GF | ||||
Minerva | Software methodologies | X | X | X | GF | ||
CodeBuddies | Compilation errors | X | X | LD | |||
GreedExCol | Software design | X | X | X | LD | ||
ClassroomWiki | AI platform | X | X | GF | |||
I-MINDS | Virtual judge | X | X | X | PI | ||
UNCode INGInious | Virtual judge | X | X | X | PI |
Source: Authors
Below are some works found in the review of the GitHub repositories. These projects apply some strategies based on CSCL (Table 4).
ElixirSchool (philss, n.d.) is the premier destination for people looking to learn and master the Elixir programming language. This platform aims for people with little knowledge to join people with a lot of experience. They have an algorithm that identifies which people should work in teams according to their abilities.
Minerva (dmlc, 2019) forecasts code and, based on this, it plans codes to keep students at an optimal level. To maintain this manually can take a long time and be surprisingly complex, so Minerva makes people with an optimal level periodically review their codes.
CodeBuddies (codebuddies, n.d.): community-organized hangouts for learning programming together.
GreedExCol (Debdi et al., 2015): a CSCL system designed to support collaboration in experimental optimality results for greedy algorithms. GreedExCol supports the discussion between the members of a small group of up to four students. The methodology for working with GreedExCol is as follows: first, each student performs their individual work (experimental research); then, the results obtained by each one are shared and discussed, so that they can propose the functions that are considered to be optimal.
ClassroomWiki (Khandaker & Soh, 2010): a collaborative web-based Wiki writing platform. For students, ClassroomWiki provides a web interface to write and review their group’s Wiki, as well as a thematic forum to discuss their ideas during collaboration. When students collaborate, ClassroomWiki tracks all student activities and builds detailed models of the students who present their contributions to their groups. ClassroomWiki is based on a multiagent framework that uses student models to form groups of students and improve collaborative learning. Through CSCL processes, ClassroomWiki supports the formation of active learning groups and rubrics.
I-MINDS (Soh et al., 2008a) is a software solution for the intelligent management of classrooms or virtual groups in order to support activities both in real-time and offline, facilitating group work, adapting to the needs and background of individual users, and assisting moderation and group management. Its technology is based on intelligent software agents that autonomously interact on behalf of users. A useful tool to provide better support in CSCL processes, I-MINDS supports group formation and feedback. It does not have a code evaluator, but it does evaluate content.
UNCode (Restrepo-Calle et al., 2018): this educational environment offers summative and formative comments by evaluating the functionality of computer programs. When faced with a programming problem, students must write a solution and validate it through the automatic grading tool of the educational environment. Summative feedback is offered through verdicts of the programs, which indicate whether the syntax, semantics, and efficiency of the program are correct or have some type of error. Based on these verdicts, a rating is assigned to the solutions proposed by the students. In addition, the environment also offers formative comments through different mechanisms that support the student in the improvement of the proposed programming solutions.
Technological map according to the ACM Computing Classification System (CCS)
In the classification of the corpus of references, three results were found. First, according to the categories and subcategories of the CCS (Figure 5), 20% of the repositories and papers focus on collaborative learning with multiple agents based on cooperative methods, virtual education, and virtual classes. Another 16% focus on data analysis through tools, social software, and network analysis, and 15.3% are e-learning and b-learning systems. The remaining projects are divided into the formation of groups, roles, and wikis. Second, in the analysis of projects and documents based on CSCL processes with artificial intelligence support, 15% involve data mining, 12% involve natural language processing, 20% use neural networks, and 7% use predictive and statistical methods. Third, the references were classified as follows: 23% are academic courses, 12% are Application Programming Interfaces (APIs), 9% are software platforms, 13% are learning games, and 36% are platforms and learning tools.
Classification of strategies according to CSCL processes and types of software
Ten strategies based on CSCL and AI were found. Each of the strategies has a specific purpose within GF, LD, and PI support. Figure 6 shows the grouping between the corpus of references and the strategies found. Each strategy is detailed below:
VALCAM algorithm. This algorithm makes use of a virtual currency (V) in the following manner. The system agent works as the provider and accountant of the virtual currency. Every time the user agents form a coalition and perform the required task, their individual and group performances are evaluated by the system and group agents, respectively. After the evaluation, the system agent rewards each user agent’s individual performance, while each group agent rewards each user agent’s performance as a group member (Soh et al., 2006a, 2008).
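The two reward channels described for VALCAM can be pictured with a minimal ledger. This is not the published algorithm: it assumes, purely for illustration, that each evaluation is already a numeric score and that the payout equals that score. The function and agent names are hypothetical.

```python
def distribute_rewards(individual_scores, group_scores, balances=None):
    """Credit virtual currency to user agents after a coalition task.

    individual_scores: payouts decided by the system agent (individual performance).
    group_scores: payouts decided by the group agent (performance as a group member).
    Assumption: payout == score; the actual reward rule is not reproduced here.
    """
    updated = dict(balances or {})
    for user, score in individual_scores.items():
        updated[user] = updated.get(user, 0) + score  # system agent reward
    for user, score in group_scores.items():
        updated[user] = updated.get(user, 0) + score  # group agent reward
    return updated


# Hypothetical evaluation of a two-member coalition.
wallets = distribute_rewards(
    individual_scores={"ana": 3, "luis": 1},
    group_scores={"ana": 2, "luis": 2},
    balances={"ana": 10},
)
print(wallets)  # {'ana': 15, 'luis': 3}
```

The accumulated balance could then inform later coalition formation, which is the role the text assigns to the virtual currency.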
JigSaw learning. The JigSaw technique is a method for organizing classroom activities that makes students dependent on each other to succeed. It breaks classes into groups and breaks assignments into pieces that the group assembles to complete (a jigsaw puzzle) (Gutwin et al., 2013; Magnisalis et al., 2011).
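A minimal sketch of this partitioning is shown below. The function name `jigsaw_assign`, the student identifiers, and the topic names are hypothetical, and the sketch assumes that the class size is a multiple of the number of pieces.

```python
from typing import Dict, List


def jigsaw_assign(students: List[str], pieces: List[str]) -> List[Dict[str, str]]:
    """Split the class into groups and give each member one piece of the assignment.

    Assumes len(students) is a multiple of len(pieces), so every group
    holds exactly one student per piece.
    """
    size = len(pieces)
    groups = [students[i:i + size] for i in range(0, len(students), size)]
    # Within each group, pair every student with a distinct piece.
    return [dict(zip(group, pieces)) for group in groups]


students = ["s1", "s2", "s3", "s4", "s5", "s6"]
pieces = ["loops", "functions", "recursion"]
assignments = jigsaw_assign(students, pieces)
```

Because each group holds every piece exactly once, no member can complete the assignment without the others, which is the dependency the technique relies on.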
Adaptive educational system (AES). An AES is mainly a system that aims to adapt some of its key functional characteristics (for example, content presentation and/or navigation support) to the learner’s needs and preferences. Thus, an adaptive system operates differently for different learners, considering information accumulated in individual or group learner models (Debdi et al., 2015; Triantafillou et al., 2002).
Intelligent tutoring system (ITS). An ITS aims to provide learner-tailored support during the problem-solving process, as a human tutor would do. To achieve this, ITS designers apply techniques from the broader field of Artificial Intelligence (AI) and implement extensive modeling of the problem-solving process in the specific domain of application (Triantafillou et al., 2002).
Adaptive hypermedia systems (AHS). An AHS builds a user model of the goals, preferences, and knowledge of the individual user, and it uses this model to adapt the content of pages and the links between them to the user’s needs. The variables that user models include can be classified as user-dependent, which are directly related to the user and define him/her as an individual, or user-independent, which affect the user indirectly and are mainly related to the context of a user’s work with a hypermedia application (Yang & Luo, 2007).
Rule-based system. This type of system uses the Constraint-Based Modeling (CBM) approach (i.e., it represents the domain knowledge as a set of constraints and a rule-based system) and offers adaptive/intelligent support by providing learners with hints during individual and group problem-solving processes, as well as feedback on peer interaction based on individual student contributions. Results show that CBM is an effective technique for both modeling and supporting students in developing collaboration skills: the participants acquired declarative knowledge about good collaboration and also collaborated more effectively (Magnisalis et al., 2011).
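To make the idea of constraints that trigger hints concrete, the following sketch encodes each constraint as a relevance condition, a satisfaction condition, and a hint, which is the usual form of a CBM constraint. The `Constraint` class, the `hints_for` function, the state keys, and the two example constraints are hypothetical and are not taken from the systems reviewed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Constraint:
    relevance: Callable[[State], bool]     # when does the constraint apply?
    satisfaction: Callable[[State], bool]  # what must then hold?
    hint: str                              # feedback shown when it is violated


def hints_for(state: State, constraints: List[Constraint]) -> List[str]:
    """Return the hints of all constraints that are relevant but not satisfied."""
    return [c.hint for c in constraints
            if c.relevance(state) and not c.satisfaction(state)]


CONSTRAINTS = [
    # Problem-solving constraint: a loop needs an exit condition.
    Constraint(lambda s: bool(s["uses_loop"]),
               lambda s: bool(s["loop_has_exit"]),
               "Check that your loop has a condition that eventually stops it."),
    # Collaboration constraint: group members should contribute to the discussion.
    Constraint(lambda s: bool(s["in_group"]),
               lambda s: int(s["messages_posted"]) > 0,
               "Share your idea with your group before submitting."),
]

state = {"uses_loop": True, "loop_has_exit": False,
         "in_group": True, "messages_posted": 2}
hints = hints_for(state, CONSTRAINTS)
```

A constraint that is not relevant to the current state produces no feedback, so the same constraint set can serve both individual and group problem solving.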
Coalition Formation Algorithm. In the initial state, all agents are mutually independent and not cooperative. Thereafter, as the agents progressively acquire more knowledge from the system and the environment, every agent may form a coalition on the basis of certain principles by consulting and comparing. Each coalition is considered an independent entity. All members of the coalition cooperate fully, so each member can draw on the abilities and resources of the other members to complete tasks more efficiently than a single agent (Yang & Luo, 2007).
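One possible reading of this process is a greedy merge, sketched below. The merging principle used here (add the agent that contributes the most missing skills for a task) is an assumption chosen for illustration, not the principle defined in the cited work; the `form_coalition` function, the agent names, and the skill sets are hypothetical.

```python
from typing import Dict, Set, Tuple


def form_coalition(agents: Dict[str, Set[str]],
                   required: Set[str]) -> Tuple[Set[str], bool]:
    """Greedily merge independent agents until the task's skills are covered."""
    # Initial state: every agent is its own independent coalition.
    coalitions = [({name}, set(skills)) for name, skills in agents.items()]

    # Seed with the single agent that already covers most of the task.
    seed = max(coalitions, key=lambda c: len(c[1] & required))
    members, skills = set(seed[0]), set(seed[1])
    remaining = [c for c in coalitions if c[0] != members]

    while not required <= skills and remaining:
        # Compare candidates by how many missing skills they would add.
        best = max(remaining, key=lambda c: len((c[1] - skills) & required))
        if not (best[1] - skills) & required:
            break  # nobody left can contribute anything new
        members |= best[0]
        skills |= best[1]
        remaining.remove(best)

    return members, required <= skills


agents = {"a1": {"design"}, "a2": {"coding", "testing"}, "a3": {"testing"}}
members, complete = form_coalition(agents, {"design", "coding", "testing"})
```

In this example the coalition of `a1` and `a2` covers the task, while `a3` is left out because it adds no missing skill.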
E-learning systems. E-learning can be thought of as the learning process created by interaction with digitally delivered content, services, and support. It involves the intensive use of information and communication technologies (ICTs) to serve, facilitate, and revolutionize the learning process (Debdi et al., 2015).
Predict student behavior. A prediction model whose main objectives are automatically predicting students’ performance and helping to measure and improve their goals (Abdulwahhab & Abdulwahab, 2017; Qiu et al., 2016).
Team Syntegrity Model (TSM). TSM is a new process developed by Stafford Beer to allow groups to work together in a democratic, nonhierarchical fashion in order to capture their best thinking. It is a particularly appropriate process to use when groups are characterized by high levels of diversity (Asproth et al., 2011; Leonard, 2011).
Table 5 groups each strategy according to the type of implementation (manual or technological), the specific CSCL process it supports, and the number of papers that explain its documentation or implementation.
Table 5 Final corpus of references to carry out the mapping study
Strategy | CSCL process | Tools found |
---|---|---|
VALCAM algorithm | GF | 8 |
Predict student behavior | GF | 12 |
Team syntegrity model | GF | 9 |
JigSaw learning | LD | 14 |
Adaptive educational system | LD | 18 |
Intelligent tutoring system | LD | 19 |
Adaptive hypermedia system | LD | 5 |
Rule-based system | PI | 15 |
Coalition formation algorithm | PI | 18 |
E-learning systems | PI | 15 |
Source: Authors
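As a quick aid for reading Table 5, the counts can be totaled per CSCL process. The sketch below simply re-enters the table values; the variable names are hypothetical, and the totals count strategy entries, so a tool reported under more than one strategy would be counted more than once.

```python
from collections import Counter

# Values transcribed from Table 5: (strategy, CSCL process, tools found).
TABLE_5 = [
    ("VALCAM algorithm", "GF", 8),
    ("Predict student behavior", "GF", 12),
    ("Team syntegrity model", "GF", 9),
    ("JigSaw learning", "LD", 14),
    ("Adaptive educational system", "LD", 18),
    ("Intelligent tutoring system", "LD", 19),
    ("Adaptive hypermedia system", "LD", 5),
    ("Rule-based system", "PI", 15),
    ("Coalition formation algorithm", "PI", 18),
    ("E-learning systems", "PI", 15),
]

# Sum the "Tools found" column for each CSCL process.
tools_per_process = Counter()
for _strategy, process, tools in TABLE_5:
    tools_per_process[process] += tools

total_entries = sum(tools_per_process.values())
```

Under these assumptions, LD support accounts for the largest share of entries (56), followed by PI support (48) and GF support (29), for 133 entries in total.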
CONCLUSIONS
Based on the tech mining methodology, a reference corpus was constructed with 316 documents and repositories from 2009 to 2019. In the corpus analysis, this study obtained relevant information on how the integration of CSCL and AI improves the process of learning programming. This information is presented through four technological maps that show the programming languages and paradigms for software development, the evolution of the tools that support programming learning, and the classification of the corpus according to CCS categories, subcategories, and strategies based on CSCL and AI. In addition, the proposed methodology can support other research works in the construction of the state of the art. However, it is necessary to improve the process of semiautomatic construction of technological maps, as well as to implement new techniques for data extraction, storage, and analysis.
The results of this research help to understand the benefits and challenges of using the CSCL and AI approach in the programming learning process. Some strategies and tools have been identified, indicating that there are multiple and varied ways to implement this approach, such as the implementation of CSCL strategies used to form groups, to evaluate, and to provide feedback, as well as the use of AI computational techniques to control the process and measure student results using virtual judges for automatic code assessment, profile identification, code analysis, teacher simulation, active learning activities, and interactive environments, among others. However, there are still open research questions.
Regarding the GF support process, strategies are currently being implemented to group students by numerical categories, abilities and skills, student profile, and weighted maxima and minima. Moreover, these strategies have been supported by AI to automatically identify and group students using Machine Learning methods (Thomas et al., 2002; Bennedsen et al., 2008; Wiggins et al., 2015; Soh et al., 2006a, 2006b; Costa et al., 2017) and probability and statistical methods (Salcedo & Idrobo, 2011; Hazzan & Dubinsky, 2003). However, this study did not find a CSCL strategy or AI-supported tool that groups computer programming students. Thus, the following question arises: How should CSCL and AI strategies be implemented to form groups in programming courses?
As for the LD support process, automatic code assessment appears in the form of virtual judges that are used in programming competitions, encouraging their integration into academic programming courses. However, virtual judges implemented as a method of teaching programming are not enough to foster logic and abstraction skills. Therefore, different learning platforms have appeared, such as I-MINDS (Khandaker et al., 2006), UNCode (Restrepo-Calle et al., 2018), and INGInious (luvoain, n.d.), among others, which allow automatically assessing code, controlling the learning process, and obtaining an analysis of the results. However, these tools fall short in the syntactic and semantic evaluation of code, code style, multiparadigm compilers, and plagiarism identification. In this paper, some strategies supported by AI were identified which could improve the code assessment process, but there are still questions to be answered: What would be the best method for automatic code assessment? How can students achieve programming proficiency through automatic code assessment?
In the case of PI support, providing feedback on programming content is difficult, since each student has a different programming style. Nevertheless, there are currently tools such as GreedExCol (Debdi et al., 2015), ClassroomWiki (Khandaker & Soh, 2010), and I-MINDS (Soh et al., 2008a), which identify the syntactic structure tree of a programming language in order to compare it with the student’s code. This form of feedback has proven to work well. However, when evaluating by competencies or with a numerical system, this feedback is not so effective, as it needs to be more precise. AI and CSCL could improve this process, helping to answer questions such as: How does AI improve the process of providing feedback from the source code? How could CSCL strategies be implemented in programming learning tools?
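The idea of comparing a syntactic structure tree with the student's code can be sketched with Python's standard `ast` module. This is not how GreedExCol, ClassroomWiki, or I-MINDS work internally; the `shape` and `missing_constructs` helpers and the sample programs are hypothetical, and the comparison deliberately ignores identifier names and constant values.

```python
import ast
from typing import List, Set


def shape(source: str) -> List[str]:
    """Return the node types of the syntax tree, ignoring names and values."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def missing_constructs(reference: str, student: str) -> Set[str]:
    """Node types that appear in the reference solution but not in the student's."""
    return set(shape(reference)) - set(shape(student))


REFERENCE = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same structure, different identifiers: should be treated as equivalent.
RENAMED = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

# Different structure: relies on a built-in instead of an explicit loop.
SHORTCUT = """
def total(xs):
    return sum(xs)
"""

same_structure = shape(REFERENCE) == shape(RENAMED)
missing = missing_constructs(REFERENCE, SHORTCUT)
```

A comparison of this kind tolerates differences in naming style but, as noted above, it yields only coarse feedback; the set of missing constructs says that an explicit loop is absent, not whether the student's alternative is acceptable.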
In the development and documentation of tools that support the learning of programming, there is still work to be done. On the one hand, there are no tools that implement all CSCL processes. With respect to the use of AI techniques, out of the tools found, 7% implement AI techniques, but only 1% are documented. On the other hand, there is no model that integrates the CSCL approach with AI techniques; such a model would make it possible to implement learning activities and to observe and analyze the evolution of the system and how its users (students) improve their skills. In addition, the different tools found in this paper could be explored by professors and institutions, or new technologies could be developed from them.
In further studies, new alternatives will be explored to improve the process of learning programming, including aspects related to the implementation of tools, statistical methods, and learning analyses based on CSCL and AI. These alternatives should enrich and allow monitoring of students’ training process, improve good programming practices and soft skills, and foster greater collaboration and individual and group abstraction.