Who should benefit from organizational research?

Image: Winchester Mystery House by russavia, via Wikimedia Commons (CC-BY-SA-2.0)

[Ed note: This is the first of six articles in a virtual panel on “Who should benefit from organizational research?”]

Academic careers reward publication, and our system of journals often privileges novelty over cumulative insights. Judgments of quality are difficult and time consuming, and so we rely on proxy measures of “impact” that are easily gamed. Paradoxically, journals have evolved a sophisticated set of standards for evaluating research claims, but they reward being “counterintuitive.” As a result, while there is a lot of sound and fury, it is difficult to point to many areas of settled science when it comes to organizations.

Things are about to get worse unless we evolve new standards rooted in a clear sense of who and what organizational research is for.

The changing nature of organizations means that we need to reconsider who our constituencies are, as “managerial relevance” is becoming an elusive goal. Enterprises today bear little resemblance to the postwar hierarchies that animated early research. Some of the best-known companies have few employees to manage, while some of the biggest rely on computer algorithms to schedule, monitor, and evaluate. This raises a fundamental question for our field: who should benefit from organizational research? Answering this question can help guide standards of evaluation for research.

The Architecture of Organizational Research

San Jose, California, is home to one of the most peculiar structures ever built: the Winchester Mystery House, a 160-room Victorian mansion that includes 40 bedrooms, two ballrooms, 47 fireplaces, gold and silver chandeliers, parquet floors, and other high-end appointments. It features a number of architectural details that serve no purpose: doorways that open onto walls, labyrinthine hallways that lead nowhere, and stairways that rise only to a ceiling.

According to legend, Sarah Winchester, the extremely wealthy widow of the founder of the Winchester rifle company, was told by a spiritual medium that she would be haunted by the ghosts of the people killed by her husband’s rifles unless she built a house to appease the spirits. Construction began in 1884 and by some accounts continued around the clock until her death in 1922. There was no blueprint or master plan and no consideration of what it would mean to reach completion. The point was simply to keep building, hence the sprawling and incoherent result.

Is the Winchester Mystery House a good house? It’s certainly beautiful in its own way. Any given room might be well proportioned and full of appealing features. A stairway might be made of fine wood, nicely joined and varnished, and covered in a colorful carpet. Yet it ends in a ceiling and serves no useful purpose other than keeping its builders busy. In assessing whether a house is good, we have to ask, “Good for what? Good for whom?”—the questions we would ask about other kinds of constructions. An airport is designed for a specific function, is built according to a blueprint, and is straightforward to evaluate, although evaluations might vary widely depending on people’s experience with the realized design. A cathedral has a plan that might take decades to realize, with adjustments along the way, guided by a shared vision of what its realization will be. But for the Winchester Mystery House, the act of building was an end in itself. It is a paradigmatic folly: as one definition turned up by a quick Google search puts it, “a costly ornamental building with no practical purpose.”

The field of organization studies might be compared to a sprawling structure. There can be little doubt that a lot of activity goes into constructing the field: according to the Web of Knowledge, over 8,000 articles are published every year in the 170+ journals in the field of “Management,” adding more and more new rooms. The questions of good for what and good for whom are worth revisiting. There is reason to worry that the reward system in our field, particularly in the publication process, is misaligned with the goals of good science: we often reward novelty over truth. As a result, we may look more like a mystery house than a cathedral.

Novelty or Truth?

The unit of currency in academic research is the publication, typically the journal article. Articles are how we convey what we found and put it in context. They are the record on which individual scientists (and departments and journals) are judged. There is an emerging consensus in some quarters that the system of journals and academic career incentives often favors novelty over truth in publications, that individual academic researchers are often rewarded for being interesting rather than getting it right, leading to systematic biases in the published record. If the advancement of knowledge were the goal of science, then individual articles would be recognized as a means, not an end in themselves. In most cases, individual articles count only as part of a totality of evidence: they are one tile in a mosaic. In our world, however, career incentives turn publications into essential tools for individual advancement, and this is not always compatible with getting it right. “To the extent that publishing itself is rewarded, then it is in scientists’ personal interests to publish, regardless of whether the published findings are true” (Nosek, Spies, and Motyl, 2012: 616). Aguinis and colleagues (2014) pointed out that scholars give different answers to the question “What is good for the advancement of our knowledge?” versus “What is good for the advancement of a scholar’s career?” The misalignment between individual career incentives and the advancement of our science is the source of much mischief.

Nosek and colleagues described some of the many ways that the quest for publication can yield articles full of intriguing yet false or misleading results. This does not require fraud by researchers or excessive sloppiness on the part of reviewers and editors. Rewarding scholars for publication per se, abetted by standard processes of motivated reasoning, is sufficient. Novelty is prized in the literature, replication is devalued, and falsification is rare. Positive findings (which might reflect the idiosyncrasies of small samples, particular designs, and/or choices about how to analyze and present data) are published, while non-findings wind up in the proverbial file drawer. There are many rewards for innovation and being the first to make a novel claim, and few punishments for getting it wrong, so that questionable findings can persist in the literature indefinitely.

In a provocative and even alarming piece on publication bias in medical journals, Ioannidis (2005) concluded that “most research findings are false for most research designs and most fields.” His simulation models, based on entirely plausible assumptions, show how the bias varies by subfield within medicine. Epidemiology—perhaps the field most similar to non-experimental research on organizations—has an especially poor chance of getting it right, particularly if many statistical relationships are tested on archival data but only the significant ones get reported.
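The mechanism is easy to see with a back-of-the-envelope simulation. The sketch below is not Ioannidis’s model; it is a minimal illustration, under assumed parameters (1,000 relationships tested on the same data, only 10 percent of them real, small samples, and only significant results written up), of how a published record built from “positive” findings can come to be dominated by false ones.

```python
import numpy as np

# Minimal illustration (not Ioannidis's model): many relationships are tested,
# few are real, and only "significant" results get written up. All parameters
# below are assumptions chosen for illustration.
rng = np.random.default_rng(0)

n_tests = 1000      # relationships tested on an archival dataset
true_share = 0.10   # assume only 10% of tested relationships are real
n_per_group = 30    # small samples
effect = 0.3        # modest true effect, in standard-deviation units

published_true = published_false = 0
for _ in range(n_tests):
    is_real = rng.random() < true_share
    x = rng.normal(0.0, 1.0, n_per_group)
    y = rng.normal(effect if is_real else 0.0, 1.0, n_per_group)
    diff = y.mean() - x.mean()
    se = np.sqrt(x.var(ddof=1) / n_per_group + y.var(ddof=1) / n_per_group)
    if abs(diff / se) > 2.0:          # rough two-sided 5% cutoff: a "finding"
        if is_real:
            published_true += 1
        else:
            published_false += 1

total = published_true + published_false
print(f"'Findings' that clear the bar: {total}; "
      f"share that are false: {published_false / total:.0%}")
```

Under these assumed numbers, roughly two-thirds of the results that clear the significance bar are false, with no fraud and no sloppiness required; selective publication of positive results is enough.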

Prospects for cumulation are further hobbled by a widely shared prescription for research to be “interesting”: to find something that is generally believed to be the case and show that it is not (Davis, 1971). As a guide to delighting readers, the prescription is appealing. But if the aim of publishing articles is to advance our understanding through a cumulative process of building on prior findings, then it is hard to imagine a more nihilistic dictum than to be “interesting.” What is good for the career of the individual researcher might be very bad for the collective health of the scientific endeavor (cf. Pillutla and Thau, 2013).

Impact or Progress?

How do we evaluate organizational research? How do we know what a contribution is or how individual articles add up? In some sciences, progress can be measured by finding answers to questions, not merely reporting significant effects. How many moons does Jupiter have? How does the volume of a gas vary with pressure? What is the structure of DNA? Did Homo sapiens interbreed with Neanderthals? Does raising the minimum wage reduce employment? Some answers are more provisional than others, but the aim is clearly to answer questions. In many social sciences, however, including organization studies, progress is harder to judge, and the kinds of questions we ask may not yield firm answers (e.g., do nice guys finish last?). Instead we seek to measure the contribution of research by its impact.

There are many ways to assess the impact of scientific work, from book sales (where Who Moved My Cheese? is the best-selling management book in history) to prizes to Google hits and article downloads (Aguinis et al., 2014). By far the dominant measure of impact is citations: how often a piece is cited in subsequent works. An advantage of this measure is that it is easily accessible: Google Scholar and Web of Knowledge are just a click away. Citation metrics are widely used in faculty evaluations and routinely come up in tenure reviews. By this accounting, good science is widely cited science.

Yet there is now a vast literature on the inadequacy of these measures as indicators of research quality. Moreover, because so much is at stake, the incentives for gaming the system are irresistible to many editors and authors. According to the Web of Knowledge, seven of the ten most-cited articles published in the field of management in 2012 appeared in the same journal. Not coincidentally, almost all of the citations came from the same small set of journals and authors, which resulted in the offending journal being suspended from the Web of Knowledge. There are many ways to engineer measures of impact that have little to do with the quality of ideas or the contribution to science, and not all of them result in suspensions.

At a more fundamental level, impact in this sense may not measure what we want. Consider what happens when police are evaluated according to their numbers of citations and arrests. We might imagine that as a society we want “safety” or “justice,” but if what we count when we evaluate police officers is “number of arrests and citations issued,” we get something rather different—in the worst case, entire populations weighed down with criminal records for trivial offenses. Similarly, it is unclear why “impact” is an apt measure if the goal of research is to answer questions. If anything, raising questions that do not get answered, or being surprising and counterintuitive, may be better strategies for being widely cited than actually answering questions accurately (Davis, 2010). Being provocative may be more impactful than being right.

The Purpose of Organizational Research

In these new circumstances, it is appropriate to ask whose interests our research should serve. Who are the constituencies for organizational research? Answering these questions can guide our answers to what kind of research is worth doing and how we should be structured to do it.

For most of the 20th century, business corporations kept growing bigger, and the need for managers to staff their internal hierarchies spawned a massive expansion in management education. General Motors, the quintessential management-heavy corporation, expanded from 600,000 to over 800,000 employees within a few years. The demand for managerially relevant research was evident. Yet beginning in the 1980s, changes in the economy were reflected in the kinds of jobs taken by MBA students. Instead of seeking management jobs at GM or Eastman Kodak or Westinghouse, MBAs from elite schools went into finance and consulting, a shift that in turn empowered finance departments within business schools (see Khurana, 2007). Traditional corporations, particularly manufacturers, shrank or even disappeared through multiple rounds of outsourcing and downsizing, while the largest employers came to be in retail, where hierarchies within stores are relatively short. Meanwhile, information technologies increasingly turn the tasks of management (measuring and rewarding performance, scheduling) over to algorithms. There are nearly 7 million Americans classified as “managers,” but the content of their tasks may not involve the actual supervision of other people.

More recently, alternative business models have arisen that dispense with “employees” and “managers” entirely. Uber reported that it had 162,000 “driver-partners” in the U.S. at the end of 2014. These are not employees of Uber—which itself employed perhaps 2,000 people—but independent contractors without need for management. Amazon expands and shrinks by tens of thousands of workers at a time through the use of temporary staffing companies for its warehouses—it added 80,000 temporary workers for the 2014 holiday season. The tasks are straightforward and largely supervised by computer. Retail, fast food, and the “sharing economy” are increasingly moving to a world in which algorithms and platforms replace human management. Meanwhile, GM’s North American workforce has shrunk to under 120,000. Management of humans by other humans may be increasingly anachronistic.

If managers are not our primary constituency, then who is? Perhaps it is each other. But this might lead us back into the Winchester Mystery House. If our standards of evaluation privilege what is interesting or novel to researchers over what is true, or what is valuable to the public that provides resources, then our sprawling enterprise is unlikely to continue forever.

A final possibility is that our obligation is to society more broadly. In a time of social transformation, when basic units of economy and society are undergoing upheaval with uncertain consequences, perhaps our best bet is to return to the mission laid out by Thompson (1956: 102), with an eye toward the new structures and new processes that are arising. Thompson wrote,

The unique contribution of science lies in its combination of deductive and inductive methods for the development of reliable knowledge. The methodological problems of the basic sciences are shared by the applied fields. Administrative science will demand a focus on relationships, the use of abstract concepts, and the development of operational definitions. Applied sciences have the further need for criteria of measurement and evaluation. Present abstract concepts of administrative processes must be operationalized and new ones developed or borrowed from the basic social sciences. Available knowledge in scattered sources needs to be assembled and analyzed. Research must go beyond description and must be reflected against theory. It must study the obvious as well as the unknown. The pressure for immediately applicable results must be reduced.

For most of its existence, the object of administrative science was bureaucratic organizations and their administrators. Today, few companies identify as bureaucracies, and few individuals claim the mantle of “administrator.” What will an administrative science look like in a world administered by algorithms? Many aspects of Thompson’s vision still hold: the combination of inductive and deductive methods, the use of the tools of basic social science, the benefits of an interdisciplinary orientation (which must surely include connections with information science), and the importance of theory. But it is time to update how we do administrative science and who we do it for.

Businesses and governments are making decisions now that will shape the life chances of workers, consumers, and citizens for decades to come. If we want to shape those decisions for public benefit, on the basis of rigorous research, we need to declare ourselves.

Gerald F. Davis, editor of the Administrative Science Quarterly, is the Wilbur K. Pierpont Collegiate Professor of Management at the Ross School of Business and a professor of sociology at the University of Michigan.

This article is an excerpt of a longer essay published in the Administrative Science Quarterly.
