Exploratory Factor Analysis: a critical evaluation
Originally written for a Psychology Research Methods Course in 1997.
introduction
The usual convention in starting an evaluation of a method is to define the terms. This, however, is a major problem in examining exploratory factor analysis, for our view of the “ontological status of theoretical entities” (Maxwell, 1962) governs how we define factors and determines the legitimate uses to which we can put exploratory factor analysis. Royce provides a summary of some of the positions taken on factors:
Factors have been defined as dimensions, determinants, functional unities, parameters, and taxonomic categories. In terms of their theoretical significance they have been referred to as convenient classificatory conceptualizations …, as real … , and as artifactors (1963: 522).
To understand these differences in definition we need to explore the realist-instrumentalist debate. After examining this debate and a third associated position, we will discuss the assumptions and problems associated with exploratory factor analysis. An attempt will be made to integrate the usual perspective of exploratory factor analysis as involving inductive inference and Haig’s (1996) assertion that it involves abductive inference. In conclusion, we will see that “it [exploratory factor analysis] allows for a number of relatively unexplored domains to be investigated so that they might form data bases out of which highly specific hypotheses may be generated” (Gorsuch, 1988: 235). Before this we have to make certain distinctions.
Factor analysis developed as an exploratory method. Spearman, the originator, developed it for this purpose (Kline, 1994: 7). Not until the 1970s did it develop as a confirmatory method (Kline, 1994: 10). A distinction can therefore be made between exploratory factor analysis and confirmatory factor analysis. In the case of exploratory factor analysis “the researcher would not venture any forecasts on the nature and structure of factors that will be extracted from his matrix. The latter [confirmatory factor analysis] obtains when he sets forth an explicit hypotheses on such nature and structure, and treats factor analysis as a test that will either confirm or disconfirm his expectations” (Marradi, 1981: 26-27). The key distinction is whether hypotheses are stated prior to the method being employed (Gorsuch, 1988: 235).
A distinction must also be made between common factor analysis and principal component analysis. Principal component analysis is one attempt to overcome factor indeterminacy, a problem associated with common factor analysis. Factor indeterminacy will be discussed later in this paper. For the purposes of this discussion, “exploratory factor analysis” refers only to exploratory common factor analysis.
the debate
The question of the “ontological status of theoretical entities” divides into two main perspectives: the realist and the instrumentalist. Realism and instrumentalism are not strictly separate categories but rather points along a continuum. Instrumentalism is equivalent to fictionalism as used by Block (1976) and others. “From a realist perspective, theoretical entities exist. Thus, in principle, there is no reason why there should be a gap between theory and observation. If there is a difficulty then the realist would say that the problem is a practical one” (Phelan and Reynolds, 1996: 153). Block describes a position closely related to instrumentalism, operationism, which equates a measurement with a construct, so “defining the word ‘intelligence’ as what IQ tests measure” (1976: 127). Instrumentalism for Block rests on the idea that theoretical entities “should be construed as a way of speaking designed to facilitate prediction and control” (Block, 1976: 132). One effect of instrumentalism is what Block calls the “double game”: intelligence, for instance, is studied by researchers in fictionalist terms, but policy that draws on their research rests on a realist interpretation of intelligence as capacity (1976: 137).
We can see the expression of the realist-instrumentalist debate in exploratory factor analysis. Gorsuch describes exploratory factor analysis in terms of paradigm conflicts: between the realists and the instrumentalists, and between the ‘mathematical’ approach, concerned with the use of the actual mathematical procedure, and the ‘scientific’ approach, concerned with the development of theories (Gorsuch, 1988: 232-3). He states that there is usually an alignment between the mathematical and instrumentalist approaches and between the scientific and realist approaches (234). If theoretical entities exist that are not observable, a problem arises: how can these entities be measured or studied? Realists claim that exploratory factor analysis is the appropriate method to get at these unobservable variables. Gorsuch states: “The former [realists] are those who consider factors as expressions of underlying realities with the factors waiting to be discovered. The latter [instrumentalists] are those who feel all scientific constructs - including factors - are human inventions to help us organise our thoughts about data” (1988: 233). Instrumentalists are therefore looking for the most useful expression of “surface manifestations” (Gorsuch, 1988: 233). As Haig puts it: “factors are taken to be summary expressions of the way manifest variables covary” (Haig, 1996: 197). This is contrasted with Haig’s realist take on exploratory factor analysis: “On this realist interpretation, the factors are regarded as latent variables which underlie and give rise to the correlated manifest variables” (Haig, 1996: 198).
There are expressions of each of these positions on exploratory factor analysis in the literature. Maraun, an example of the instrumentalist position on exploratory factor analysis, argues that any claims of factors as “underlying variates; unmeasurable variates; hypothetical variates; hidden variates; unobservable variates; etcetera” are incorrect (1996: 637-8). In Maraun’s case latent variables are defined as the results of a factor analysis (Rozeboom, 1996: 566 referring to Maraun, 1996: 524). Child also cites Burt’s view:
… we may compare the advantages of using independent factors in psychology with those of using latitude and longitude in geography. … Our factors, therefore, are to be thought of in the first instance as lines or terms of reference only, not as concrete psychological entities (Child, 1970:
In contrast to the instrumentalist position stands Mulaik, who stresses the movement from observed dependent variables to latent independent variables, stating:
“‘Factor analysis’ is a generic term associated with a number of multivariate statistical methods that model sets of manifest or observed variables in terms of linear functions of other sets of latent or unobserved variables” (1982: 637).
Marradi quotes Thurstone, one of the innovators in exploratory factor analysis, who stated: “These parameters represent scientific concepts. They are not merely numerical coefficients” (1981: 12). Royce, similarly, defines a factor as “a true variable, a process or determinant which accounts for covariation in a specified domain of observation” (Royce, 1963: 523).
Haig, alluding to Maxwell (1962), critiques the instrumentalist position on the grounds that the observation/theory dichotomy is false (Haig, 1996: 197). According to Maxwell (1962), the difference between unobservable and observable is a matter of degree, and the line between observation and theory is basically arbitrary (7). This line is dependent on our physiology, current knowledge, and our measuring instruments. Thus, the line “has no ontological significance whatever” (14-15). This critique is strong and therefore I accept the realist position over the instrumentalist. However, acceptance of realism does not necessarily imply that factors are real. Royce provides justification for the position that exploratory factor analysis can operationalise variables deep in the “nomological net”: although it does not identify the mathematical relationships between variables, factor analysis has the power to identify these variables, which can then be investigated further (1963: 526). The need for further study points to exploratory factor analysis as a method for generating hypotheses, which will be discussed later in this paper.
a third position: beyond the debate?
We will briefly examine a third view, which considers exploratory factor analysis an instrument but which rests on a limited view of the process of exploratory factor analysis. This third view, championed by Baird (1987), is claimed to be free from the realist-instrumentalist paradigm conflict (334). Baird argues that exploratory factor analysis should not be understood as part of a logic of discovery, but rather as a “tool of discovery” (336). He argues further that the interest should be in the discovery of evidence, not hypotheses. Hence, Baird views exploratory factor analysis as an “information-transforming” instrument (320), producing data, not “hypotheses or truths” (335). Therefore, “Factor analysis is an instrument of discovery about the ensemble; with factor analysis we discover linear common factor structure in a population of correlated measurements” (320).
Baird, in arguing that exploratory factor analysis is a summary of correlational data, ridicules the idea of exploratory factor analysis as a hypothesis-generating method. He offers an analogy: given an area of land and the total population inhabiting it, we should not hypothesise that population density is total population divided by land area. This is right, but to compare it to generating hypotheses from exploratory factor analysis is ridiculous. By definition an average is an average! A hypothesis is not a definition. Mulaik’s (1991) critique of Baird demonstrates the relevance of this. He criticises Baird’s characterisation of exploratory factor analysis as just the statistical method, arguing that exploratory factor analysis is not a strictly deterministic algorithm but a process that includes the “selection of test variables” and the interpretation of factors (1991: 88). Mulaik further critiques Baird’s foundationalism, stating: “We cannot separate our theories and hypotheses from the data which give rise to them or which we use to test them” (90). Another problem with Baird’s position is that, since science is normative, the use of instruments is not given (98). Far from being independent of the domain being investigated, theory relates the instrument to what is under investigation (99).
assumptions and technical problems
The technical assumptions and the problem of factor indeterminacy both have consequences for the debate over the ‘ontological status of theoretical entities’. The major assumption of exploratory factor analysis is that: “We presume that the data are to be united by linear functions between the observed variables and a set of other variables known as common factors” (Mulaik, 1987: 287). The assumptions of factor analysis also include constraints on this linear relationship: (1) common factors have no correlation with unique factors, (2) unique factors are uncorrelated with each other, and (3) “the common factors ‘account for’ all the correlation found between observed test scores” (Baird, 1987: 321). The argument can therefore be made that the factors can be considered real only if these assumptions accurately mirror nature. This is not a problem for the instrumentalist view, since factors are only seen as a convenient way to summarise the data. For the realist, it implies that factors are real only if the assumptions are valid or at least approximately valid. This lends credence to the perspective of exploratory factor analysis as hypothesis generator.
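The assumptions just listed can be written compactly. What follows is a sketch in standard textbook notation, which is an assumption of this illustration and is not taken from Mulaik or Baird: x is the vector of (centred) observed variables, f the common factors, u the unique factors, Λ the matrix of factor loadings, Φ the covariance matrix of the common factors, and Ψ the covariance matrix of the unique factors.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda \mathbf{f} + \mathbf{u}
  && \text{(linear relation to common factors)} \\
\operatorname{Cov}(\mathbf{f}, \mathbf{u}) &= \mathbf{0}
  && \text{(constraint 1)} \\
\operatorname{Cov}(\mathbf{u}) &= \Psi \ \text{diagonal}
  && \text{(constraint 2)} \\
\Sigma = \operatorname{Cov}(\mathbf{x}) &= \Lambda \Phi \Lambda^{\top} + \Psi,
  \qquad \Phi = \operatorname{Cov}(\mathbf{f})
  && \text{(implied by the above)}
\end{aligned}
```

Because Ψ is diagonal, every off-diagonal entry of Σ comes from the term ΛΦΛ′ alone, which is one way of reading constraint (3): the common factors account for all of the correlation between the observed variables.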
Factor indeterminacy is a major methodological problem for exploratory factor analysis. The problem is that factors are “not uniquely determined by the observed variables” (Mulaik, 1987: 296). There are many possible solutions, i.e. factors. The meaning given to factors is dependent on the factor loadings produced, and since these are not unique there is a problem in the interpretation of factors (Mulaik, 1987: 297). The existence of multiple solutions has consequences for whether factors can be considered ‘real’. The problem of which solution to take is usually tackled by applying the principle of parsimony: the most “simple” solution is chosen, making use of the “principle of simple structure” (Kline, 1994: 64-5). This solves a problem of ease of interpretation, but still leaves the problem of factor indeterminacy unanswered. Mulaik offers two qualifications to this problem. Firstly, as he puts it:
“Factor indeterminacy is just a special case of a more general form of indeterminacy commonly encountered in science, known to philosophers as the empirical underdetermination of theory - Data by themselves are never sufficient to determine uniquely theories for generalizing inductively beyond the data” (298).
Second, factor indeterminacy, according to Mulaik, is not “fatal” if exploratory factor analysis is viewed as a hypothesis generator, which is later followed by theory testing work (1987: 302).
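One aspect of the non-uniqueness discussed above, namely that the loadings are fixed only up to a rotation, can be illustrated numerically. The following is a minimal sketch, not a factor analysis routine: the loading values are hypothetical, the common factors are assumed to be uncorrelated, and the helper names (`matmul`, `implied_correlations`, `rotate`) are invented for this illustration. It shows that a loading matrix and a rotated version of it imply exactly the same correlations among the observed variables, so the data alone cannot choose between them.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*a)]


def implied_correlations(loadings, uniquenesses):
    """Model-implied matrix Lambda Lambda' + Psi (uncorrelated common factors)."""
    common = matmul(loadings, transpose(loadings))
    size = len(common)
    return [
        [common[i][j] + (uniquenesses[i] if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]


def rotate(loadings, angle):
    """Apply an orthogonal rotation to a two-factor loading matrix."""
    c, s = math.cos(angle), math.sin(angle)
    rotation = [[c, -s], [s, c]]
    return matmul(loadings, rotation)


# Hypothetical loadings of four observed variables on two common factors.
loadings = [[0.8, 0.0], [0.7, 0.1], [0.1, 0.6], [0.0, 0.9]]
# Uniquenesses chosen so that each implied variance equals 1.
uniquenesses = [1.0 - sum(value * value for value in row) for row in loadings]

rotated = rotate(loadings, math.radians(30))
original_fit = implied_correlations(loadings, uniquenesses)
rotated_fit = implied_correlations(rotated, uniquenesses)

# Largest discrepancy between the two implied correlation matrices.
max_difference = max(
    abs(original_fit[i][j] - rotated_fit[i][j]) for i in range(4) for j in range(4)
)
print("loadings differ:", rotated != loadings)
print("implied correlations agree:", max_difference < 1e-12)
```

The rotated loadings would invite a different verbal interpretation of the two factors, yet they fit the observed correlations equally well. Criteria such as simple structure select among such solutions on grounds of interpretability rather than fit.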
Further, if we view exploratory factor analysis as a process, perhaps the most significant source of error is the role of subjective judgment throughout, both in the choices made in the procedure and in the interpretation of factors (Nowakowska, 1973). For instance, the researcher will have to “find a conceptual dimension capable of plausibly unifying the indicators’ semantic contents” (Marradi, 1981: 42). It follows that the reality of a concept is partly dependent on the semantic description. Two responses can be made. Firstly, judgment is not unique to exploratory factor analysis. As Marradi puts it:
Factor analysis allows social scientists to bring many interesting constructs down to the empirical arena by measuring them in an adequate way, or - better - in a way that may be adequate if the algorithm is fed with accurately chosen and measured data, and the procedural phases are steered by a combination of semantic competence, experience with empirical research, familiarity with the substantive domain, solid epistemological foundations, and a good deal of common sense: the same qualities that are necessary to meaningfully perform any kind of data analysis (1981: 12).
Secondly, Mulaik’s (1987) and Haig’s (1996) prescription of exploratory factor analysis as hypothesis generator provides a counterbalance to these problems. As Mulaik puts it: “an exploratory factor analysis study which ends in interpretations yields not incorrigible knowledge but the basis for hypotheses, which desirably transcends the resulting data treated as simple summary descriptions” (Mulaik, 1991: 89).
process
We have already discussed the process of exploratory factor analysis in relation to the debate over defining exploratory factor analysis and factors. We can now see how Royce’s explanation of the process of exploratory factor analysis will allow us to integrate the usual characterisation of exploratory factor analysis as involving inductive inference (Mulaik, 1987; Royce, 1975) and Haig’s (1996) description of exploratory factor analysis as involving an abductive inference to a common cause. For an example of the usual position, here is a statement of Mulaik’s: “Exploratory factor analysis is an inductivist method designed to discover an optimal set of latent variables” (1988: 265).
In most people’s minds there are two types of inference: inductive and deductive. Deductive inference involves moving from the general to the specific, whereas inductive inference is a generalising move from specifics. Perhaps part of the reason that exploratory factor analysis is characterised as an inductive move is the entrenchment of the inductive-deductive duality. It may also be seen in terms of the common depiction of scientific method as either hypothetico-deductive or some form of inductive methodology. When a method presents itself, the tendency is to fit it into these categorisations of method. Rozeboom (1972) and Haig (1996) have proposed a further type of inference, which they denote as “abduction” and which Rozeboom has elsewhere characterised as “explanatory induction” (1996: 556). Haig states: “Abduction is a form of inference that takes us from a description of data patterns to a plausible explanation of those patterns” (Haig, 1996: 192). The key is that it is an explanatory move.
Haig (1996) and Rozeboom (1996) have both pointed out the applicability of this form of inference to exploratory factor analysis. As Haig puts it: “the move from correlational data to underlying factors involves abductive inference” (1996: 198). I have no problem with abduction as defined; the problem, in this case, is with how it is applied to exploratory factor analysis. To represent exploratory factor analysis as solely an abductive process neglects an important consideration: as Royce (1975) has pointed out, there are two major phases to exploratory factor analysis as a process.
Royce states that: “The first phase is focused on the identification of replicable patterns of observables” (1975: 15). Exploratory factor analysis thus involves, first, an inductive move from data to a non-unique solution. This is a generalising move from the specifics of the data to the generalised linear combination of factors. In light of the problem of factor indeterminacy, we can see that this can only be a generalising move. The second phase, for Royce, “is concerned with the identification of latent unknowns” (15). The second phase is thus an abductive move: it explains the meaning of the derived factor(s) by reasoning to a common cause.
conclusion: exploratory factor analysis as a hypothesis generator
We should now recall Mulaik’s definition of factor analysis as a set of statistical methods that “model sets of manifest or observed variables” (1982: 637). A key to Mulaik’s definition is the word “model”, which fits with his conceptualisation of exploratory factor analysis as a method for hypothesis generation. Marradi, claiming that the correct root of the definition of hypothesis is a concern with the relationship between concepts rather than their existence, is wary of the way hypothesis is used in relation to factor analysis (1981: 27). But, given that we are attempting to investigate causality, the existence of a cause implies a causal relationship.
To accept that exploratory factor analysis is a way to arrive at the latent variables that are ‘causing’ the observed variables requires an acceptance of the assumptions of factor analysis and recognition of its technical and methodological problems. These assumptions and problems imply that the results of exploratory factor analysis are not ‘truth’ or ‘knowledge’; they are ‘only’ hypotheses or claims about knowledge. We should recognise that exploratory factor analysis generates hypotheses that are a function of the assumptions of the instrument, but that are still valuable for gathering insights into the nature of underlying causal variables.
In closing, two things must be recognised. Firstly, the identification of hypotheses or ‘prototheories’ (Haig, 1996: 199) is only one stage in the scientific process. As Mulaik states: “regardless of the source from which a [researcher] derives his or her hypotheses, he or she still must put the hypotheses to the test against experience that is independent of the experience used to develop the hypothesis” (1988: 286). Secondly, Mulaik makes the perhaps obvious point that exploratory factor analysis is not the only or perhaps even the best way to arrive at hypotheses (1988: 269). Two related points are that the human being is the important hypothesis generator, especially given that it is the researcher who selects and interprets, and that there is perhaps a role for abductive reasoning in all theory generation. Pirsig, illuminating the human source of hypotheses, puts it like this:
The formation of hypotheses is the most mysterious of all the categories of scientific method. Where they come from, no one knows. A person is sitting somewhere, minding his own business, and suddenly - flash! - he understands something he didn’t understand before. Until it’s tested the hypothesis isn’t truth. For the tests aren’t its source. Its source is somewhere else (Pirsig, 1974: 106).
references
Baird, D. 1987. “Exploratory factor analysis, instruments and the logic of discovery”. British Journal for the Philosophy of Science 38: 319-337.
Block, N. J. 1976. “Fictionalism, functionalism and factor analysis”. Boston Studies in the Philosophy of Science, vol. 32. Edited by R. S. Cohen, C. A. Hooker, A. C. Michalos & J. W. Van Evra. Dordrecht: Reidel. 127-141.
Child, Dennis. 1970. The Essentials of Factor Analysis. London: Holt, Rinehart & Winston.
Gorsuch, R. L. 1988. “Exploratory factor analysis”. Handbook of Multivariate Experimental Psychology. 2nd ed. Edited by J. R. Nesselroade & R. B. Cattell. New York: Plenum Press. 231-258.
Haig, Brian D. 1996. “Statistical methods and psychology: A critical perspective”. Australian Journal of Education 40, 2: 190-219.
Kline, Paul. 1994. An easy guide to factor analysis. London: Routledge.
Marradi, A. 1981. “Factor analysis as an aid in the formation and refinement of empirically useful concepts”. Factor Analysis and Measurement in Sociological Research. Edited by D. J. Jackson & E. E. Borgatta. London: Sage. 11-49.
Maxwell, G. 1962. “The ontological status of theoretical entities”. Minnesota Studies in the Philosophy of Science 3: 3-28.
Mulaik, S. A. 1982. “Factor analysis”. Encyclopedia of Educational Research, vol. 2. 5th ed. 637-646.
Mulaik, S. A. 1987. “A brief history of the philosophical foundations of exploratory factor analysis”. Multivariate Behavioral Research 22: 267-305.
Mulaik, S. A. 1988. “Confirmatory factor analysis”. Handbook of Multivariate Experimental Psychology. 2nd ed. Edited by J. R. Nesselroade & R. B. Cattell. New York: Plenum Press. 259-288.
Mulaik, S. A. 1991. “Factor Analysis, Information-Transforming Instruments, and Objectivity: A Reply and Discussion”. British Journal for the Philosophy of Science 42: 87-100.
Nowakowska, M. 1973. “The limitations of the factor-analytical approach to psychology with special application to Cattell’s research strategy”. Theory and Decision 4: 109-139.
Phelan, Peter & Reynolds, Peter. 1996. Argument and evidence: critical analysis for the social sciences. London: Routledge.
Pirsig, Robert M. 1981. Zen and the Art of Motorcycle Maintenance. London: Corgi.
Royce, Joseph R. 1963. “Factors as theoretical constructs”. American Psychologist 18: 522-528.
Royce, Joseph R. 1976. “Psychology is multi-: methodological, variate, epistemic, world view, systemic, paradigmatic, theoretic and disciplinary”. Nebraska Symposium on Motivation, vol. 23. Edited by W. J. Arnold. Lincoln: University of Nebraska Press.
Rozeboom, William W. 1972. “Scientific inference: the myth and the reality”. Science, psychology and communication: essays honoring William Stephenson. Edited by R. S. Brown & D. J. Brenner. New York: Teachers College Press. 95-118.
Rozeboom, William W. 1996. “What Might Common Factors Be?”. Multivariate Behavioral Research 30, 4: 555-570.