
Polone: The Folly of Having Focus Groups Judge TV Pilots


Next week brings the Upfronts, when the broadcast networks announce their new shows for the coming fall season, which means that this week network executives are making their final decisions on which of their 100 or so drama and comedy pilots to pick up. In making those decisions, they (along with the producers and studio executives who created the pilots) concentrate much of their attention on the research reports based on audience testing and focus groups done on each pilot. Having produced between 35 and 40 scripted pilots (I lost count years ago), I am very familiar with this process. And like most producers, I'm of two minds when it comes to the legitimacy of audience testing on pilots: (1) It is an invaluable tool that shows when a pilot connects with the public, and it's a good indicator that a show will succeed. I believe this deeply when my show has tested well. (2) It is a ridiculous and wasteful exercise, famously damning shows that end up succeeding and supporting others that fail miserably, while invalidating the judgment and experience of the people the networks allegedly trust to create entertainment for their viewers. I am firmly of this mind when my show has tested poorly.

These sessions are usually done with a test audience of 48 people, divided evenly between men and women, who are each paid $75 for two hours of their time. As they watch the pilot, each participant holds a box with a dial that they turn one way or the other (assigning scores from zero to 100) to indicate how much they are enjoying what they're watching at any given moment. There is also a button they can press to indicate that they would have changed the channel had they been watching at home. Network and studio execs and producers watch from behind a one-way mirror, and also see a playback of the pilot with the real-time dial scores laid over it as a graph with three lines: one for men, one for women, and one for the combined average. After the screening, twelve men and twelve women from the group are brought into separate rooms and asked for their reactions to and opinions on the pilot, which they deliver frankly and often disparagingly; the creators and execs also listen to this from behind their fake mirror.

Opinions differ among network executives as to how important these testing sessions are. One executive vice-president at a broadcast network told me that she saw testing as a "true sense of what our audience would think of the show," though she finds the instant dial reactions more valuable than the post-show debriefings because "raw data is more important than focus group. You're looking for what makes people react; you're looking for passion." However, she notes that "if we are sketchy on a pilot, testing won't get it on. As long as the testing is in the middle, everyone goes with their gut; but if testing is horrible, it is hard to get it on the air." A VP at a prominent cable network whom I talked to was more cynical about testing: "If you put a baby or a dog in the pilot, the dials go up. Dials will go down when your lead is going through a tough time in the pilot. I see testing as an asset if you wanted reinforcement for something, [like] if my creative instincts tell me that an actor isn't working and the test results reinforce that. I think it is badly used if you love a pilot and then the testing isn't good and that becomes a reason not to go forward. For me, it is [just] a selling tool to use with the internal higher-ups."

TV producers and execs who hate testing often bring up the same arguments. First, they commonly cite the famously negative testing reports for Seinfeld and Friends. The report on the unenthusiastic response to Seinfeld offered such insights as, “George was negatively viewed as a ‘wimp’ who was only mildly amusing.” Friends received a very low score of 41 (65 is usually considered just middling), and its report begins with “overall reactions to this pilot were not very favorable. Interest in the show was very narrow.” 

Another common complaint is the size and makeup of these groups. One producer with a show in contention this year lamented, "How can a test audience of 50 or 100 people tell you if your show will succeed? And my guess is the affluent audiences that networks target aren't sitting in a test screening so they can make 75 dollars." The idea that such a small sample can represent the whole market for a show does seem ridiculous and bound to deliver anomalous results. In his book Thinking, Fast and Slow, the Nobel Prize–winning psychologist Daniel Kahneman describes the inherent flaw in making judgments based on small sample sizes: "The law of small numbers is part of a larger story about the workings of the mind: Statistics produce many observations that appear to beg for causal explanations but do not lend themselves to such explanations. Many facts of the world are due to chance, including accidents of sampling. And causal explanations of chance events are inevitably wrong." This kind of reflexive reaction to any result may explain the experience of the aforementioned producer during the current pilot season. When his show was tested by the studio (studios often test pilots before turning them in to the networks so they can make any last-minute changes), it went extremely well. "They never had a higher testing comedy at that studio. Two days later the same cut, basically, was tested by the network and it went very badly, for whatever reason. Everyone panicked and we had to scramble in a few short days to fix something that everyone was perfectly happy with. There's no telling who is going to walk in that screening room and it can change day to day. But the latest bad test is all that matters. The previous great tests are invalidated."

The future of a show can also be damned by accidents of chance in who happens to be selected for the test group. For instance, a great show with gay characters could receive a bad score because an above-average number of homophobes were randomly selected for that particular test. The producer told me a bizarre story about a test he had on a pilot a couple of years ago where he "saw in the audience, and then in the smaller focus group, an actor who read for the third-lead series regular on that pilot. He didn't do well and wasn't called back. But there he was, watching the pilot he didn't get, pretending he was not familiar with it, making sure to point out that the actor who got his part was bad."

Regardless of whether you believe in testing, the networks do, so it creates great stress for those with a vested interest in a show's success. I don't think I've ever been to a test where a writer-producer didn't watch the live graph of the participants' reactions and, like a desperate gambler watching a race at an OTB parlor, gesticulate at the TV screen and implore the approval lines to move up. The broadcast network executive quoted above acknowledged that "most creators find it brutal," but went on to stress its importance. "Why not get as much information as you can to find out what the audience is thinking about the show? It's not a painting. The creative community's bias against it is misguided. If you can roll with it you might be able to give them something the audience likes better."

I have found some value in testing. It can show what parts of a story are confusing; it can give you an idea of where things slow down too much and become boring; and, most usefully, it can tell you if a joke is funny or not. Either the viewers laugh or they don't. But the post-screening focus group conversations are pretty worthless. People feel they have to talk, so they come up with criticisms where they might otherwise have none, and there are always those who have an agenda they want to push or are just showing off now that they've been given an audience; one person among twelve can easily derail a discussion. And I certainly don't think the information gained by a test is worth the $25,000 that it costs.

Really, this testing ritual seems pretty outdated in the Internet age. Why stick with such a small sample size when we have the technology to quickly get the opinions of a much bigger audience? A network could put all of its pilots up on its website and let the entire populace of interested viewers vote for their favorites and leave comments on how they would like to see the show improved before it went to series. Anything less than that is like deciding who wins an election based on a recent poll. Additionally, networks could probably sell ads on those online pilots, thereby recovering some of the huge costs they incur each year for productions that don't get picked for the schedule. Those shows that do subsequently get picked to go to series would then already have a constituency of viewers who felt a vested interest in the particular show they helped to get on the air. That kind of connection can translate into consumer loyalty and dollars; I have to believe that everyone who bought one of the 2.5 million Ruben Studdard albums sold also voted for him in season two of American Idol. (Really, there can be no other explanation.)

And if the Internet isn't the way to go with this, maybe a network could program a new series called Pick the Lineup or, if it is on NBC, The Choice. All the pilots would air and compete against each other, and the voting would be handled just as it is on American Idol. I bet it would work not only as a tool to pick new series but as a stand-alone ratings hit, assuming, of course, that this DIY-scheduling show can test well enough to make it onto the network's schedule in the first place.
