SciELO - Scientific Electronic Library Online

 

Avances en Psicología Latinoamericana

Print version ISSN 1794-4724; Online version ISSN 2145-4515

Av. Psicol. Latinoam. vol.28 no.1 Bogotá Jan./June 2010

 

Bibliography profiling of undergraduate theses in a professional psychology program*


Jaime R. Robles**
Eugenia Csoban-Mirka***
Cristina Vargas-Irwin****

* Corresponding author: Cristina Vargas-Irwin, Fundación Universitaria Konrad Lorenz, Carrera 9 Bis # 60 - 43, Bogotá, Colombia. Telephone: 3472311, Ext. 111. E-mail: cvargas@fukl.edu

** Universidad Católica Andrés Bello, Caracas, Venezuela.

*** Universidad Católica Andrés Bello, Caracas, Venezuela.

**** Fundación Universitaria Konrad Lorenz, Bogotá, Colombia.

Received: November 11, 2009
Accepted: March 5, 2010


Abstract

The bibliographic profile of 125 undergraduate (licentiate) theses was analyzed by describing the absolute quantities of several bibliometric variables, as well as within-document indexes and the average lag of the references. The results show a consistent pattern across the 6 cohorts included in the sample (2001-2007): year-to-year variations fall within the robust confidence intervals for the overall central tendency. The median number of references per document was 52 (99% CI 47-55); the median percentage of journal articles cited was 55%, with a median age of 9 years for journal references. Other highlights of the bibliographic profile were the heavy use of foreign-language references (median 61%) and the low reliance on open web documents (median 2%). A cluster analysis of the bibliometric indexes yielded a typology of 2 main profiles, almost evenly distributed: one with the makeup of a natural-science bibliographic profile and the other closer to the style of the humanities. In general, the number of references, the proportion of journal articles, and the age of the references are close to those of PhD dissertations and Master's theses, setting a rather high standard for undergraduate theses.

Keywords: licentiate thesis, undergraduate thesis, bibliographic profile, bibliometric indexes, robust confidence intervals, bootstrap, cluster analysis.



Introduction

Undergraduate theses or dissertations are often considered the first complete exercise in scientific writing for psychology students. The thesis-writing process is often studied from the perspective of task-time dynamics or of the personal traits associated with task completion (Klassen, Krawchuk, & Rajani, 2008; Klassen & Kuzucu, 2009; Rosario, Costa, Núñez, González-Pienda, Solano, & Valle, 2009; Seo, 2008; van der Hulst & Jansen, 2002). There is, however, another aspect of the undergraduate thesis process: the characterization of the resulting documents. Irrespective of the variety of guidelines and standards for their preparation, the thesis as a document has a set of empirical quantitative properties that may shed light on the types of references most frequently used in the research process, the most consulted journals in a given field, and the obsolescence rate of journals (Buchanan & Herubel, 1994; Vallmitjana & Sabate, 2008). Additionally, the quantitative analysis of bibliographies, commonly referred to as bibliography profiling (Buchanan & Herubel, 1994), may provide feedback on existing guidelines for thesis preparation or aid in the creation of new ones.

The bibliographic information contained in the citations constitutes one of the main aspects of the quantitative properties of the thesis as a document. The main aim of this paper is to describe the distribution of several bibliometric indicators for undergraduate theses, providing reference statistics useful for establishing standards for this type of document.

Studying undergraduate theses poses a series of challenges. One of them is the lack of uniformity of undergraduate programs, which vary widely across countries and institutions. Key differences are the length, the course load, and the degree awarded at the completion of the program. In this case, the theses studied were carried out within a five-year, full-course-load professional psychology program leading to a professional degree (licentiate), which entitles the holder to start lawful professional practice or professional activities in general. This type of program is typical of most Latin American countries (Ardila, 1986). Even within this context, the undergraduate thesis is not a universal requirement. Most 3- or 4-year undergraduate programs do not lead to a professional degree, and in most cases in which a thesis is written, it is not a requirement but part of an optional honors program.

All these issues make it difficult to find comparable previous studies or to generalize from any of the existing ones, given the lack of uniformity of programs and documents and the scarcity of data.

Most bibliometric analyses of theses have specific features and purposes, usually driven by intra-institutional aims. In many cases, the main purpose is to provide feedback to the originating institution regarding the use of library, laboratory, and other resources (Leiding, 2005; Walters, 2008).

An additional factor that hinders a standard approach to the bibliometric study of undergraduate theses is that they are seldom published in major journals, or indexed or cataloged in widely available standard databases that would allow citation analysis.

Other studies include undergraduate theses as part of wider bibliometric analyses that also encompass Master's theses and doctoral dissertations. Clearly, the academic process within a professional (undergraduate) program differs from that of a graduate program, especially in contexts in which a postgraduate degree is not required for professional practice. In this sense, the study of undergraduate theses has to be specific, and the characterization of these documents is useful for shaping guidelines only if such studies focus on a particular kind of academic program.

Given these factors and contextual conditions, the bibliometric study of undergraduate theses has to focus on within-document citation properties instead of cross-citation analysis. Establishing the statistical properties of the distribution of bibliographic entries in theses through bibliometric indexes can play a major role in shaping educational policy on thesis standards.

Bibliometric indexes may also provide valuable insights into the diffusion and construction of knowledge within a specific field. Hargens (2000) proposed a classification of scientific disciplines based on the relative importance given to foundational and current scholarship. At one end of the continuum lie disciplines that emphasize current research and rarely acknowledge original sources, since these are usually fully assimilated into standard scientific practice. References in this type of field usually become obsolete more rapidly, and new findings are promptly incorporated into subsequent work. At the other end of the continuum lie disciplines that emphasize the interpretation of canonical work, where the main task of the scholar is not the generation of new empirical findings but rather the production of current insights into classical problems. This type of discipline makes extensive use of “orienting reference lists”, that is, sets of citations to well-known classics aimed at supplying the reader with a conceptual framework that justifies the authors’ particular approach to a problem, rather than at providing empirical evidence on the subject matter. While the former type of scholarship typically characterizes research in the natural sciences, the latter is more typical of the humanities.

Other characteristics usually accompany both ends of the typology: scholarly work in the humanities usually exhibits a more regional orientation, while research in the natural sciences tends to follow international standards and interests (Adams, 1998). By the same token, the main type of reference used in the natural sciences is the research article, while more frequent citation of books is characteristic of the humanities (Nederhof, Zwaan, Debruin, & Dekker, 1989). Finally, single-author scholarship is far more frequent in the humanities, where more emphasis is given to the production of literature aimed at the general public (Nederhof, 2005).

In view of the above, the main objective of the present study was to characterize the prevailing scholarship style of undergraduate psychology theses in a particular Latin American undergraduate program. The documents studied in this paper are the result of a thesis supervisory system implemented in the School of Psychology at Universidad Católica Andrés Bello (UCAB), in Caracas, Venezuela. This system required students in the last year of the undergraduate program to submit systematic assessments and progress reports on the completion of their theses during that academic year. One of the required reports included bibliometric information, specifically on the bibliographic references cited in the document. The indexes that can be derived from such information are thus within-document citation indexes.

Most standard cross-citation indexes are applied to the evaluation of scientific output, and some even have predictive value for scientific production (Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009; Hirsch, 2007; Mathur & Sharma, 2009; Thompson, Callen, & Nahata, 2009). Within-document indexes, on the other hand, focus on describing the properties of the document, not on the output or impact of its authors.

These indexes fall along two main axes: citation year and source differentiation. Most time-oriented cross-citation indexes are related to shelf-life and half-life indicators. In the present analysis, within-document time indicators had to be used instead, taking the average publication year of the cited references as the basis for building several time-related indexes.

Regarding source differentiation, several ratios were included for the undergraduate theses, such as the book/paper citation ratio and topic diversity, which can indicate the orientation and depth of the reference search.

Method

Cohort description

Bibliographic reports archived by the thesis advising commission at UCAB were used to build the database for this study. The reports span 2001 to 2007 and cover a total of 125 theses; no reports were available for the 2005 cohort. The database was therefore built using data from 6 cohorts.

Data

In bimonthly reports signed by the Advising Professor, the thesis authors reported the following indicators: Total Number of References, Average Year of the References, and the Number of Journal, Book, Psychology, and Inter-Disciplinary References. The data from the final report were taken as an accurate estimate of the bibliometric quantities in the final document, even though these reports were completed several weeks before the final document was submitted. A previous study (Hargens, 2000; Robles, Csoban-Mirka & Vargas-Irwin, 2009) indicated that by the time of this report at least 80% of the references have been compiled, and 75% of the theses have 95% or more of their references in place. This means the absolute quantities might be slightly underestimated; however, the ratios and indexes can be considered representative, as they are less sensitive to such small changes in the absolute quantities.

Several quality assurance operations were performed on the dataset, including range checking, random verification against the final document, and consistency checks against previous reports. A total of 49 of the 125 theses studied (39.2%) were verified against the final document, yielding a 92.5% consistency rate between the reports and the documents. At least one document from each cohort was verified.
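The range and consistency checks mentioned above can be sketched as follows. This is a minimal illustration only: the `ThesisReport` record layout is hypothetical (its field names are not taken from the original reports), and the rule that Psychology and Inter-Disciplinary References together account for all references is an assumption, since the commission's exact checking rules are not described.

```python
from dataclasses import dataclass


@dataclass
class ThesisReport:
    """Hypothetical layout of one bibliographic report (field names are illustrative)."""
    thesis_id: int
    total: int
    journals: int
    books: int
    theses: int
    web: int
    psychology: int
    interdisciplinary: int


def range_check(r):
    """Return the problems found in a single report (an empty list means it passed)."""
    problems = []
    sources = {"journals": r.journals, "books": r.books, "theses": r.theses, "web": r.web}
    for name, value in sources.items():
        if value < 0:
            problems.append(f"{name} is negative")
    # Personal communications are not stored here, so sources may sum to less than total.
    if sum(sources.values()) > r.total:
        problems.append("source counts exceed total references")
    # Assumption: every reference is either a Psychology or an Inter-Disciplinary one.
    if r.psychology + r.interdisciplinary != r.total:
        problems.append("field counts do not add up to total")
    return problems


def consistency_check(previous, current):
    """Flag a shrinking reference list between consecutive reports for manual review."""
    if current.total < previous.total:
        return ["total references decreased between reports"]
    return []


report = ThesisReport(1, total=52, journals=29, books=15, theses=2, web=1,
                      psychology=40, interdisciplinary=12)
print(range_check(report))  # []
```

A shrinking list is only flagged, not rejected, because references can legitimately be dropped during revision.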

Index definition and estimation

Several bibliometric indexes were computed from the data gathered by the Thesis Supervisory Board. These indexes may be divided into two main categories: source-oriented and time-oriented indexes.

Source-based indexes

a. Source ratios: the percentage of total references comprised by each type of source. These include the percentages of journal articles, books, citations to theses, and citations to open web documents.

b. Index of Cross-Disciplinary References: the percentage of references to sources outside the field of psychology, that is, the number of Inter-Disciplinary References divided by the total number of references (equivalently, 100% minus the percentage of Psychology References). It provides an index of thematic diversity.

c. Method to Introduction ratio: ratio of references cited in the method section of the document to the ones cited in the theoretical introduction.

d. Foreign Language Percentage: the complement of the proportion of references in Spanish, i.e., 100% minus the percentage of Spanish-language references over the total number of references.

e. Index of Qualitative Variation: applied to the 5 document types (sources) which can be cited in the theses: books, journals, other theses, personal communications and web documents. This index is a measure of variability in the use of sources. See appendix for details.

f. Percentage of Locally Available References: the proportion of cited documents that were obtained through the local Universidad Católica Andrés Bello library.
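The source-based indexes reduce to simple proportions of the raw counts. The sketch below computes them for a single thesis; the function and parameter names are hypothetical, and it assumes that the Foreign Language Percentage is 100% minus the percentage of Spanish-language references and that the Cross-Disciplinary index is the share of Inter-Disciplinary References in the total. The Index of Qualitative Variation is treated separately in the Appendix.

```python
def source_indexes(total, journals, books, theses, web, interdisciplinary,
                   spanish, local, method_refs, intro_refs):
    """Source-based indexes for one thesis, computed from raw reference counts.

    All parameter names are illustrative; counts come from a single report.
    """
    if total <= 0 or intro_refs <= 0:
        raise ValueError("total and intro_refs must be positive")

    def pct(part):
        # Share of the total reference list, in percent.
        return 100.0 * part / total

    return {
        "pct_journals": pct(journals),                 # a. source ratios
        "pct_books": pct(books),
        "pct_theses": pct(theses),
        "pct_web": pct(web),
        "cross_disciplinary": pct(interdisciplinary),  # b.
        "method_to_intro": method_refs / intro_refs,   # c.
        "foreign_language": 100.0 - pct(spanish),      # d.
        "locally_available": pct(local),               # f.
    }


# Hypothetical thesis with 50 references (not data from the study).
example = source_indexes(total=50, journals=30, books=12, theses=2, web=1,
                         interdisciplinary=10, spanish=20, local=5,
                         method_refs=10, intro_refs=40)
print(example["pct_journals"], example["foreign_language"])  # 60.0 60.0
```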

Time-based indicators

Lag: the difference between the year of completion of the thesis and the average publication year of the references, computed for the total reference list, for books, and for journal articles.
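As a worked illustration of the lag definition, the sketch below uses hypothetical publication years (not data from the study). Note that the total lag comes from the mean year of the whole reference list, not from averaging the book and journal lags. Since the reports only provided average years, in practice the function would simply subtract the reported average year from the completion year.

```python
from statistics import mean


def lag(completion_year, publication_years):
    """Completion year of the thesis minus the mean publication year of the cited items."""
    return completion_year - mean(publication_years)


# Hypothetical thesis completed in 2004.
journal_years = [1990, 1995, 2000, 2003]
book_years = [1980, 1992, 1998]

journal_lag = lag(2004, journal_years)             # 7 years
book_lag = lag(2004, book_years)                   # 14 years
total_lag = lag(2004, journal_years + book_years)  # 10 years
print(journal_lag, book_lag, total_lag)
```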

Analysis

In order to minimize the consequences of a possible underestimation (or sampling variation) of the absolute quantities in the reports, robust statistics and confidence intervals are reported for all variables. Robust confidence intervals were obtained using the bootstrap re-sampling technique (see Appendix).

While the description of the sample is the main focus of the analysis, an additional multivariate analysis was conducted to highlight the main features of the dataset. Both factor and cluster analysis techniques were explored, and the cluster analysis model was used to identify the different bibliometric profiles in the cohorts studied and to classify the theses according to their multivariate differences in the bibliometric quantities.

Ethical considerations

None of the names of authors or advising professors were used in building the database, preserving their privacy. Once the quality assurance operations were performed, each thesis was identified by an arbitrary numerical value, with no possibility of backward identification. The results of this analysis have no consequences whatsoever for the authors of the theses or their advising professors, as all of the theses analyzed had been completed and approved years before this analysis was performed. Data analysis was conducted through a statistical, group-oriented procedure, without focusing on any particular document. All the data used are part of the records kept by the thesis advisory board, and no additional information was required from any individual.

Results

Table 1 shows the raw quantities for all 125 documents. Reporting these absolute values was deemed important, since they offer a more accurate picture of the depth of the literature survey of each thesis than that provided by the relative indexes. One key feature of this table is the difference between the mean and trimmed-mean (t-m) estimates. The trimmed mean excludes the top and bottom 5% of the distribution, focusing on the central 90%. The consistent difference between the mean and the trimmed mean indicates that a more robust statistic should be used to describe these distributions; consequently, the analysis focuses on median values. Bootstrap 99% confidence intervals provide robust upper and lower bounds for both the mean and the median. Another feature of Table 1 is that low-frequency quantities (number of theses and web documents cited) show large dispersion, with coefficient of variation (CV) values above 100%.
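The sketch below illustrates, with made-up numbers, why a large gap between the mean and the 5% trimmed mean points toward the median: a single extreme value pulls the mean up while leaving the trimmed mean and the median unaffected, and it also drives the CV above 100%. It drops `floor(0.05 * n)` observations from each tail, which is one common convention; the exact trimming rule used for Table 1 is not stated.

```python
import math
from statistics import mean, median, stdev


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping floor(proportion * n) observations from each tail."""
    data = sorted(values)
    cut = math.floor(len(data) * proportion)
    return mean(data[cut:len(data) - cut])


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical skewed counts: 19 moderate values plus one extreme thesis.
counts = list(range(1, 20)) + [200]
print(mean(counts), trimmed_mean(counts), median(counts))  # 19.5 10.5 10.5
print(round(coefficient_of_variation(counts), 1))
```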

A notable exception to the spread of these raw variables is the total number of references. This is especially evident in the quartile data, which show that the middle 50% of the theses have reference lists between 41 and 61 items long.

These raw bibliometric variables showed no discernible trends across the six cohorts studied (Table 2). Most of the differences between years reflect the different number of theses in each cohort: 21, 19, 27, 17, 15, and 26 for 2001, 2002, 2003, 2004, 2006, and 2007, respectively. Taking first differences of the yearly series, the average absolute change from one cohort to the next is 4.7 for the number of papers and 2.7 for the number of books. An error bar based on inter-cohort variation yields a range that falls within the bootstrap confidence intervals presented in Table 1, indicating that the point estimates and their robust confidence intervals already account for inter-cohort variation. Estimating time-based trends is difficult, given the variability in conditions across academic years and the low number of time points, since theses are completed on a yearly basis. Taking the 3 main quantities as an example, the number of papers cited and the total number of references fluctuate, while the number of books declines steadily from 2001 to 2006 but rises again in 2007. All these factors, combined with the lack of data for 2005, lead to the conclusion that there are no clear yearly trends in the raw quantities observed.
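The average absolute yearly change can be read as the mean of the absolute first differences of a yearly series. The sketch below uses hypothetical cohort values (Table 2 is not reproduced here) and, as an assumption, treats 2004 and 2006 as consecutive cohorts because the 2005 cohort is missing.

```python
def mean_absolute_change(series):
    """Average absolute difference between consecutive cohorts in a yearly series."""
    if len(series) < 2:
        raise ValueError("need at least two cohorts")
    diffs = [abs(later - earlier) for earlier, later in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical yearly values for 2001, 2002, 2003, 2004, 2006, 2007 (no 2005 cohort).
papers_per_year = [28, 33, 30, 36, 31, 34]
print(mean_absolute_change(papers_per_year))  # 4.4
```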

Table 1. Descriptive statistics of the raw number of reference items by source, section, language, field, and local availability

Table 2. Descriptive statistics of the raw number of reference items by year

The bibliometric indexes (see Table 3) exhibit a more robust mean than the raw quantities, since their difference from the trimmed mean is less noticeable. Journal articles proved to be by far the most widely used source for the sample as a whole, followed by books, with other theses and open web references used only marginally. The scarce use of open web references is widespread across the sample, with half the theses analyzed citing two web references or fewer. Nonetheless, the sample as a whole exhibited moderate levels of source diversity, as reflected in the mean of the Qualitative Variation Index. Another widespread characteristic of the literature reviews in this sample is the heavy reliance on foreign-language sources and on material not available at the local library. Indeed, three fourths of the theses included at least 48% foreign-language sources (most commonly in English; data not shown), and no more than 20% of references were locally available. Cross-disciplinary referencing, on the other hand, was highly variable across the sample, with minimum and maximum values spanning the full range of the index (0 to 100%). A similar, but less extreme, pattern may be observed for the Methods to Intro ratio: while the minimum and maximum values of this index differ widely, three fourths of the theses used less than one reference in the Methods section for every three references cited in the introduction. This points to a few outlier theses with an emphasis on the method section rather than on the substantive content. As to the average age of the references, the books cited were in general older than the journal articles, while the average age of the references remained highly homogeneous across the sample, as shown by the close similarity of the mean, median, and quartiles.

As with the raw bibliometric values, the indexes failed to exhibit discernible trends along the cohorts (see Tables 4 and 5). The median values for all indexes fall roughly within the confidence intervals for the sample as a whole.

Table 3. Descriptive statistics of the Bibliometric Indexes

Table 4. Descriptive statistics of the Bibliometric Indexes by year

Cluster Analysis

The results of the partitioning cluster analysis are presented in this section (Table 6). Another way of approaching these data would be to explore the covariance structure (i.e., a factor model); however, the correlation matrix posed problems, as many indexes have low specific variances and are therefore redundant for the covariance structure, even though they are important for practical purposes. Q-oriented cluster analysis produced a more reliable computational solution that allowed the inclusion of the main indexes, and it is therefore presented as the multivariate model for this dataset. For the sake of completeness, a correlation matrix is included in the Appendix. The interpretation of the cluster analysis results is supported by the correlation structure.

The number of clusters was determined by the balance in the distribution of observations and the separation among clusters. Several algorithms were tried, and the best solution was obtained with the K-means algorithm, which uses Euclidean distances between points and means as cluster centroids (Leisch, 2006). Among solutions ranging from 2 to 5 clusters, the 2-cluster model yielded the most even distribution across clusters and the highest level of inter-cluster separation (average distance between clusters = 2.4). The 2-cluster solution was also the most parsimonious.
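To make the clustering step concrete, here is a minimal sketch of Lloyd's K-means with Euclidean distance and cluster means as centroids, run on toy two-dimensional points. It is not the software cited above (Leisch, 2006). It assumes the indexes would be standardized beforehand, uses a simple deterministic farthest-point (maximin) seeding chosen for reproducibility, and measures separation as the distance between the two centroids, which may differ from the "average distance between clusters" reported in the text. In practice one would run several random starts and keep the solution with the lowest within-cluster sum of squares.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_farthest(points, k):
    """Maximin seeding: start from the first point, then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute means."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy standardized index profiles (hypothetical, two obvious groups).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(euclidean(centroids[0], centroids[1]))  # separation between the two centroids
```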

The number of observations per cluster was 61 and 64 for clusters 1 and 2, respectively, representing 49% and 51% of the sample. This was by far the most even distribution among all the cluster models examined.

Cluster 1 features a higher proportion of papers (see Figure 1), a lower proportion of books, less variation in sources and content, a larger proportion of foreign-language references, and shorter lags in reference dates. Cluster 2 shows a higher proportion of books, a higher use of resources in the local library (Figure 3 and Table 6), and higher source variation.

The most noticeable differences are in the proportion of foreign-language references (22% more for cluster 1; see Figure 2), the proportion of papers (21% more for cluster 1), and the proportion of books (17% more for cluster 2).

Table 5. Descriptive statistics of the age of references by year

Table 6. Differences in Bibliometric Indexes for Clusters 1 and 2

Figure 1. Boxplot of the Ratio of Papers by Cluster. The bottom, middle, and top of each box represent Q1, Q2 (median), and Q3, respectively. Notches around the middle line are proportional to the standard error of the median. Whiskers extend to the 5th and 95th percentiles of the distribution; points beyond them are considered outliers.

Figure 2. Boxplot of the proportion of foreign-language references by cluster

Figure 3. Fourfold plot of Cluster by high use of the local library. Segment separation represents the departure of each cell value from chance expectation (null hypothesis). The central ring represents the log-odds ratio contribution of each cell, and the outer rings represent the 95% confidence interval for the log-odds ratio. Counts for each cell are indicated next to each segment.
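The rings of a fourfold plot summarize the odds ratio of a 2 × 2 table. As a hedged illustration, the sketch below computes the log odds ratio and a Woolf (Wald) 95% confidence interval for hypothetical counts; the cell counts of Figure 3 are not reproduced in the text, and the interval method used by the plotting software is not stated, so this is one standard choice rather than the authors' computation.

```python
import math


def log_odds_ratio_ci(a, b, c, d, z=1.959964):
    """Log odds ratio of the 2x2 table [[a, b], [c, d]] with a Woolf 95% CI.

    Rows could be clusters 1 and 2; columns high vs. low use of the local library.
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, (log_or - z * se, log_or + z * se)


# Hypothetical counts, not the study's data.
log_or, (low, high) = log_odds_ratio_ci(10, 20, 20, 10)
print(round(math.exp(log_or), 2), round(low, 2), round(high, 2))  # 0.25 -2.46 -0.31
```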

Discussion

The present data show the typical bibliographic profile of the licentiate theses studied. The bootstrap confidence intervals suggest that the estimates are stable enough to serve as reference values for policymaking on standards for undergraduate theses.

The extent of the literature review in each thesis (as portrayed by the median number of references used) is difficult to gauge, due to the scarcity of bibliometric research on undergraduate scientific production. A bibliometric analysis of term papers by American undergraduate students reports a median bibliography length of 10 reference items for the social sciences (Mill, 2008), while median reference list lengths for Master's theses at Iowa State University and Virginia Tech are reported to be 33.5 and 30.5 items, respectively (Kushkowski, 2005). The reference list length for doctoral dissertations reported in the literature exhibits a wide range: the median length was reported to be 69.5 items for Iowa State University (Kushkowski, 2005), 76 items for Virginia Tech (Kushkowski, 2005), 91 for the Institut Químic de Sarriá (IQS) of Barcelona, Spain (Vallmitjana & Sabate, 2008), and 105 for doctoral dissertations in Education at the University of Minnesota (Haycock, 2004). Comparable bibliometric data on Latin American undergraduate theses are even scarcer: Aguilar, López-López, Barreto, Rey, Rodríguez and Vargas (2007) report an average reference list of 32 items for undergraduate theses in Organizational Psychology at a Colombian university. Clearly, then, the depth of the literature reviews in the present dataset does not match that of doctoral dissertations, but it is considerably greater than that of undergraduate term papers, undergraduate psychology theses in other Latin American universities, and Master's theses in several American universities.

As to the predominant reference sources (journal articles, 67%, and books, 22%), the present results match more closely the patterns found in undergraduate papers in the sciences than those in the humanities or the social sciences. In his bibliometric analysis of papers from 64 courses at an American Mid-Atlantic college, Mill (2008) reported that for courses in the humanities, references to books comprised 60.7% of citations, with journal articles making up just 24.5% of all references. For the sciences, on the other hand, these proportions were practically reversed, with journal articles making up 66.2% of citations, while books accounted for only 17.3% of all references. Social Sciences showed a similar, but less extreme pattern, with citations to articles reaching 46.7% and references to journals 25.2%. Similar differences in the use of journals between doctoral dissertations in the biological and the social sciences have been reported by Kushkowski, Parsons & Weise (2003).

The scarce use of open web citations (median = 5%) and the heavy reliance on foreign-language literature (median = 59%) may be explained by the emphasis that most research methods courses and the thesis supervisory system at UCAB place on peer-reviewed publications and on foreign-language references, mainly in English. Nevertheless, similarly low use of open web sources has been reported for doctoral dissertations in American universities (Kushkowski, 2005), undergraduate papers (Mill, 2008), and papers written by psychology faculty (Schaffer, 2004). Rather than constituting a new source of content, the World Wide Web seems to be providing more efficient access to traditional references, at least as far as the scholarly literature is concerned.

Regarding the age of the citations, the average lag in the present sample was 10 years overall, 9 years for journal articles and 12 years for books. This can be considered a high standard for undergraduate theses, as these figures are typical of journal-quality publications in the social/behavioral sciences. The kind of source used, as well as the relatively low age of the references, may be considered an effect of the thesis supervisory system employed during most of the period studied (Dillon & Malott, 1981; Gant, Dillon, & Malott, 1980; Robles, Csoban-Mirka & Vargas-Irwin, 2009).

One final outstanding feature of the present data is the very low availability of references in the local library. While first-world bibliometric studies show local availability figures ranging between 62% and 89% (Mill, 2008; Schaffer, 2004), in the present sample only 14% of references, on average, were obtained locally. Although no similar data from other third-world countries were available for comparison, scarcer resources may result in less duplication of collections across academic institutions, forcing students to use the libraries of several universities in order to carry out thorough literature reviews.

Finally, in spite of the general orientation of the theses towards a natural-science scholarly style, the multivariate cluster analysis of the present data shows that the distribution of the theses along the continuum proposed by Hargens (2000) is not homogeneous, but rather allows for the characterization of two distinct groups. The first resembles the natural-science literature more closely, featuring a higher proportion of references to journal articles, a smaller proportion of references to books, less variation in sources and content, a larger proportion of foreign-language references, and shorter lags in reference dates. The second group is closer to the scholarly style of the humanities, with a higher proportion of citations to books, a higher use of resources in the local library, and higher source variation. This heterogeneity in the scholarly style of undergraduate psychology theses may very well reflect the diversity of content areas within the field. The stability of the raw quantities and the indexes, alongside the alignment of the results with two kinds of bibliographic profiles (those of the natural sciences and of the humanities), may provide valuable insights for policymaking and for setting standards for the bibliographic profiles of licentiate theses.

Appendix

Qualitative dispersion

One of the main approaches for the quantification of the dispersion of nominal variables is based on a concentration measure. Concentration is defined as:

where pi is the probability of the nominal variable Y to assume the value of the i-th discrete category (see Haberman , 1982, for a review). The Index of Qualitative Variation is defined as

where k is the total number of nominal categories (Reynolds, 1984). The IQV ranges from 0 to 1. Applied to the bibliographic source data, k is the number of source types (books, journals, web, etc.), and the index measures the variability of citations across the different sources. Lower values indicate concentration in a few sources, and higher values reflect dispersion across a variety of sources.
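As an illustration of how the index behaves, the following Python sketch computes the IQV for a single thesis from a list of source-type labels. It assumes the common Gini-type form IQV = k/(k-1) · (1 - Σ p_i²), with each p_i estimated by the relative frequency of that source type; the function name `iqv`, the labels and the choice of k = 5 source types are hypothetical and are not taken from the study.

```python
from collections import Counter


def iqv(labels, k):
    """Index of Qualitative Variation for one document's citations.

    labels: one source-type label per reference (e.g. "book", "journal").
    k: total number of possible source types, including types never cited.
    Assumes the Gini-type form IQV = k / (k - 1) * (1 - sum(p_i ** 2)).
    """
    if k < 2:
        raise ValueError("the IQV needs at least two categories")
    n = len(labels)
    if n == 0:
        raise ValueError("no references to classify")
    counts = Counter(labels)
    if len(counts) > k:
        raise ValueError("more distinct source types than k")
    # p_i is estimated by the relative frequency of each cited source type;
    # uncited types have p_i = 0 and add nothing to the sum.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return k / (k - 1) * (1 - sum_sq)


# Hypothetical thesis: 6 journal articles, 3 books and 1 web page, with k = 5 source types.
refs = ["journal"] * 6 + ["book"] * 3 + ["web"]
print(round(iqv(refs, k=5), 3))  # → 0.675
```

Passing k explicitly matters: source types that a thesis never cites still count toward k, so the index reaches 1 only when citations are spread evenly over all k types, and it is 0 when every reference comes from a single type.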

Bootstrap confidence intervals for means and medians

Because of the particular nature of the group of documents studied, compounded by its multi-cohort structure, strong distributional assumptions about the data would be hard to justify; consequently, standard errors and confidence intervals built with standard parametric methods may be inappropriate. Moreover, given that the raw quantities and indexes are of special interest to aid policymaking on thesis supervision and regulation, robust confidence intervals were considered an important requirement of the analysis.

One way to build robust, non-parametric confidence intervals is a resampling method called the bootstrap. It builds confidence intervals from the Empirical Distribution Function (EDF), which is the nonparametric maximum likelihood estimate of the underlying distribution (Efron, 1981). The bootstrap procedure used to build confidence intervals for the bibliometric indexes can be summarized in the following steps:

1. Define the EDF. In this case, each of the n observations has probability p = 1/n.

2. Draw Q samples of size n with replacement, according to the EDF.

3. For each simulated sample, compute and store the statistics of interest, building for each statistic j a vector S_j of length Q (in this case there are two statistics, the mean and the median).

4. Use the contents of vector S_j as the sampling distribution of statistic j, and take the percentiles corresponding to the lower and upper bounds of the confidence interval (for a 99% interval, the 0.5th and 99.5th percentiles).
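The four steps above can be sketched in Python as follows. This is a minimal single-threaded illustration of the percentile method, assuming linear interpolation between order statistics when reading off percentiles (one of several common conventions); the function names and the reference counts in the example are hypothetical and are neither the study's data nor its code.

```python
import math
import random
import statistics


def percentile(sorted_vals, p):
    """Percentile p (between 0 and 1) of a sorted list, interpolating linearly."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_percentile_ci(data, stats, q=10_000, level=0.99, seed=None):
    """Percentile bootstrap confidence intervals for several statistics at once.

    data: the n observed values; the EDF gives each one probability 1/n (step 1).
    stats: mapping from a name to a function that computes the statistic.
    Returns a mapping from each name to its (lower, upper) bounds.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = {name: [] for name in stats}  # one vector S_j of length q per statistic
    for _ in range(q):
        sample = rng.choices(data, k=n)  # step 2: n draws with replacement from the EDF
        for name, fn in stats.items():
            replicates[name].append(fn(sample))  # step 3: store each statistic
    alpha = 1 - level
    intervals = {}
    for name, values in replicates.items():
        values.sort()  # step 4: sorted replicates approximate the sampling distribution
        intervals[name] = (percentile(values, alpha / 2), percentile(values, 1 - alpha / 2))
    return intervals


# Hypothetical reference counts for twelve theses (illustrative, not the study's data).
counts = [52, 47, 61, 38, 55, 49, 70, 44, 58, 51, 66, 42]
print(bootstrap_percentile_ci(counts, {"mean": statistics.mean, "median": statistics.median}, seed=1))
```

Computing every statistic on the same resample, as in step 3, keeps a single loop over the Q samples regardless of how many statistics are tracked.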

There are other ways to build bootstrap confidence intervals, such as the normal method and the bias-corrected method. However, the percentile method was chosen because it is straightforward and relies on fewer distributional assumptions (Mooney & Duval, 1993).

Bootstrapping is a computer-intensive procedure, especially for location statistics such as the median, and the chosen percentile method requires a large number of simulated samples (here, Q = 10,000) and intensive use of random number generators. To make the procedure feasible, the authors designed and programmed a parallel algorithm that takes advantage of a multi-core CPU, using an object-oriented implementation of random number generators (J. R. Robles, 1996) adapted for parallel computing.
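The authors' parallel implementation is not described in detail. The sketch below shows one common way to parallelize the same computation in Python, under stated assumptions: the Q replicates are split into one chunk per worker process, and each chunk gets its own generator seeded from a master generator, so the results are reproducible regardless of scheduling. The seeding scheme, function names and default values are hypothetical; seeding separate Mersenne Twister instances this way is a practical convenience, not a formal guarantee of independent streams.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def bootstrap_chunk(job):
    """Compute `count` bootstrap medians using a private random generator."""
    data, count, seed = job
    rng = random.Random(seed)  # one generator per chunk, never shared between workers
    n = len(data)
    return [statistics.median(rng.choices(data, k=n)) for _ in range(count)]


def parallel_bootstrap_medians(data, q=10_000, workers=4, base_seed=12345, executor=None):
    """Split q bootstrap replicates of the median across worker processes."""
    # Chunk sizes differ by at most one and add up to q.
    sizes = [q // workers + (1 if i < q % workers else 0) for i in range(workers)]
    # A master generator hands out one reproducible seed per chunk (hypothetical scheme).
    master = random.Random(base_seed)
    seeds = [master.getrandbits(64) for _ in range(workers)]
    jobs = [(list(data), size, seed) for size, seed in zip(sizes, seeds)]
    if executor is None:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            chunks = list(pool.map(bootstrap_chunk, jobs))
    else:
        chunks = list(executor.map(bootstrap_chunk, jobs))
    # map() returns results in job order, so the output does not depend on scheduling.
    return [m for chunk in chunks for m in chunk]
```

Called from a script's `if __name__ == "__main__":` block, `parallel_bootstrap_medians(values, q=10_000, workers=4)` returns the 10,000 replicate medians, which can then be fed to a percentile routine. The optional `executor` argument (also hypothetical) accepts any object with a `map` method, such as a `ThreadPoolExecutor`, in place of the process pool, which is convenient for testing.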

Correlation matrix for indexes and lags

The correlation matrix R shows standard Pearson product-moment correlations above the diagonal and robust estimates below the diagonal, obtained via the Minimum Covariance Determinant (MCD) procedure. MCD yields a robust estimate of the covariance matrix (and of the associated correlation matrix) by searching among subsets of the observed data for the one whose sample covariance matrix has the smallest determinant. The resulting covariance matrix is a robust estimate because it is based on this selected subset rather than on all observations. The algorithm used in this case is an incremental random resampling procedure called FastMCD (Rousseeuw & Van Driessen, 1999).

The main diagonal shows the lower-bound estimate of the communality of each variable, the Squared Multiple Correlation (SMC). The SMC of each variable is computed with a linear algebra algorithm based on the Cholesky decomposition of the correlation matrix; the results are equivalent to obtaining the SMC by regressing each variable on all the remaining variables in the matrix.
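The Cholesky route to the SMCs can be sketched as follows, assuming a full, positive-definite correlation matrix R. The sketch uses the identity SMC_i = 1 - 1/(R⁻¹)_ii and obtains the diagonal of R⁻¹ from the inverse of the Cholesky factor L, since R = L Lᵀ implies R⁻¹ = (L⁻¹)ᵀ L⁻¹. The helper names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
def _cholesky(m):
    """Lower-triangular L with m = L L^T; raises ValueError if m is not positive definite."""
    p = len(m)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return L


def smc_from_correlation(R):
    """Squared multiple correlations: SMC_i = 1 - 1 / (R^-1)_ii, via Cholesky."""
    p = len(R)
    L = _cholesky(R)
    # Invert the lower-triangular factor column by column so that L @ M = I.
    M = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for i in range(j, p):
            s = (1.0 if i == j else 0.0) - sum(L[i][k] * M[k][j] for k in range(j, i))
            M[i][j] = s / L[i][i]
    # R^-1 = M^T M, so its i-th diagonal entry is the sum of squares of column i of M.
    return [1 - 1 / sum(M[k][i] ** 2 for k in range(p)) for i in range(p)]


print([round(v, 4) for v in smc_from_correlation([[1.0, 0.6], [0.6, 1.0]])])  # → [0.36, 0.36]
```

For two variables with r = 0.6, both SMCs equal r² = 0.36, which is exactly the R² obtained by regressing each variable on the other.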


References

1. Adams, J. Benchmarking international research. Nature, 396 (6712), (1998), 615-618.

2. Aguilar, M.C., López López, W., Barreto, I., Rey, Z.B., Rodríguez, C. & Vargas, E.C. Análisis bibliométrico de los trabajos de grado del área organizacional de la Facultad de Psicología de la Universidad Santo Tomás. Diversitas, 3 (2), (2007), 317-334.

3. Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E. & Herrera, F. h-Index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3 (4), (2009), 273-289.

4. Ardila, R. La psicología en América Latina: pasado, presente y futuro. México DF: Siglo XXI Editores, (1986).

5. Buchanan, A.L. & Herubel, J. Profiling PhD dissertation bibliographies: serials and collection development in political science. Behavioral & Social Sciences Librarian, 13 (1), (1994), 1-10.

6. Dillon, M.J. & Malott, R.W. Supervising masters theses and doctoral dissertations. Teaching of Psychology, 8 (4), (1981), 195-202.

7. Efron, B. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika, 68, (1981), 589-599.

8. Gant, G.D., Dillon, M.J. & Malott, R.W. A behavioral system for supervising undergraduate research. Teaching of Psychology, 7 (2), (1980), 89-92.

9. Haberman, S.J. Analysis of dispersion of multinomial responses. Journal of the American Statistical Association, 77, (1982), 568-580.

10. Hargens, L.L. Using the literature: reference networks, reference contexts and the social structure of scholarship. American Sociological Review, 65 (6), (2000), 846-865.

11. Haycock, L.A. Citation analysis of education dissertations for collection development. Library Resources & Technical Services, 48 (2), (2004), 102-106.

12. Hirsch, J.E. Does the h-index have predictive power? Proceedings of the National Academy of Sciences of the United States of America, 104 (49), (2007), 19193-19198.

13. Klassen, R.M. & Kuzucu, E. Academic procrastination and motivation of adolescents in Turkey. Educational Psychology, 29 (1), (2009), 69-81.

14. Klassen, R.M., Krawchuk, L.L. & Rajani, S. Academic procrastination of undergraduates: low self-efficacy to self-regulate predicts higher levels of procrastination. Contemporary Educational Psychology, 33 (4), (2008), 915-931.

15. Kushkowski, J.D. Web citation by graduate students: a comparison of print and electronic theses. Portal: Libraries and the Academy, 5 (2), (2005), 259-276.

16. Kushkowski, J.D., Parsons, K.A. & Wiese, W.H. Master’s and doctoral thesis citations: analysis and trends of a longitudinal study. Portal: Libraries and the Academy, 3 (3), (2003), 459-479.

17. Leiding, R. Using citation checking of undergraduate honors thesis bibliographies to evaluate library collections. College & Research Libraries, 66 (5), (2005), 417-429.

18. Leisch, F. A toolbox for K-centroids cluster analysis. Computational Statistics & Data Analysis, 51 (2), (2006), 526-544.

19. Mathur, V.P. & Sharma, A. Impact factor and other standardized measures of journal citation: a perspective. Indian Journal of Dental Research, 20 (1), (2009), 81-85.

20. Mill, D.H. Undergraduate information resource choices. College & Research Libraries, 69 (4), (2008), 342-355.

21. Mooney, C.Z. & Duval, R.D. Bootstrapping: a nonparametric approach to statistical inference. Newbury Park, CA: Sage, (1993).

22. Nederhof, A.J. Bibliometric monitoring of research performance in the social sciences and the humanities: a review. Scientometrics, 66 (1), (2006), 81-100.

23. Nederhof, A.J., Zwaan, R.A., Debruin, R.E. & Dekker, P.J. Assessing the usefulness of bibliometric indicators for the humanities and the social and behavioral sciences: a comparative study. Scientometrics, 15 (5-6), (1989), 423-435.

24. Reynolds, H.T. Analysis of nominal data (2nd ed.). Newbury Park, CA: Sage, (1984).

25. Robles, J.R. PRS polytomous item generation-simulation according to the common-factor model. Applied Psychological Measurement, 20, (1996), 140.

26. Robles, J.R., Csoban-Mirka, E. & Vargas-Irwin, C. Análisis cuantitativo de la dinámica individual de trabajos de grado de Psicología. Suma Psicológica, 16, (2009), 51-68.

27. Rosario, P., Costa, M., Núñez, J.C., González-Pienda, J., Solano, P. & Valle, A. Academic procrastination: associations with personal, school and family variables. Spanish Journal of Psychology, 12 (1), (2009), 118-127.

28. Rousseeuw, P.J. & Van Driessen, K. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, (1999), 212-223.

29. Schaffer, T. Psychology citations revisited: behavioral research in the age of electronic resources. Journal of Academic Librarianship, 30 (5), (2004), 354-360.

30. Seo, E.H. Self-efficacy as a mediator in the relationship between self-oriented perfectionism and academic procrastination. Social Behavior and Personality, 36 (6), (2008), 753-763.

31. Thompson, D.F., Callen, E.C. & Nahata, M.C. New indices in scholarship assessment. American Journal of Pharmaceutical Education, 73 (6), (2009), art. 111.

32. Vallmitjana, N. & Sabate, L.G. Citation analysis of Ph.D. dissertation references as a tool for collection management in an academic chemistry library. College & Research Libraries, 69, (2008), 72-81.

33. Van der Hulst, M. & Jansen, E. Effects of curriculum organization on study progress in engineering studies. Higher Education, 43 (4), (2002), 489-506.

34. Walters, W.H. Journal prices, book acquisitions and sustainable college library collections. College & Research Libraries, 69 (6), (2008), 576-586.

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.