Navel-Gazing: Rankings and Methods

SMQ's guest spot last week on Every Game Counts, detailing his issues with Ohio State as the presumptive, no-doubt Number One and his "resume-based" ranking method, led to a couple of further questions in comments from Peter Bean of Burnt Orange Nation:

1. How difficult do you think it is to avoid power pollage when you attempt to rank strictly on resume? I ask you, specifically, because 1) you went with a preseason power poll of sorts, 2) you now attempt to rank your ballot strictly by resume (which is not to say you were wrong to do so), and 3) you've thought about the distinction between the two extensively. The short question would be: how difficult do you find it to be in ranking strictly on resume?

2. Is it even possible to avoid some amount of power polling? Let's take this to the next level. I ask you because, well, because you're thoughtful enough to sort through it: What if, as an example, Tennessee had tanked after their whipping of Cal. But Cal still improved as much as they have since week one. Does it affect Cal's resume? Should it?

Cal's loss to Tennessee is on the resume regardless of what happens to Tennessee - SMQ looks at the process game by game, and each one counts on its own merits (or demerits). So if Tennessee had tanked, Cal would definitely suffer for that in judging the "value" of that game. But Tennessee's win, too, would have less value if Cal had tanked.

On the first question, SMQ's preseason poll was not a "power poll," but a projection of how each team would finish in the final poll in January (the difference being that schedule was a huge factor in SMQ's poll, where it wouldn't be considered in a preseason power poll). But this ballot was also thrown out after the first week of games in favor of the "resume" for the rest of the season.

The most important measure of any poll or ballot is its internal consistency. On that note, SMQ came up with hairy, confusing definitions of several possible methods of ranking teams that he imagines encompass most voters:

Power Poll, or "Holistic"
The apparently preferred method, which asks simply, "Who's better?" or "Who would beat who on a neutral field?" or something like that. No measurables, just a human brain sorting information as it sees fit - a kind of almost metaphysical effort to determine the "essence" of a team in its current incarnation. If you're a voter and haven't given much thought to your overriding method, this is almost definitely what you're doing.

Strengths: Simple, direct, and the most flexible, because it's based the most on perception and opinion. Can incorporate both a "resume" and a "futures" element that takes into account where a team has come from and where it's going compared to another, similar team; i.e., if two teams look like they're in the same spot at the same point in the season, like undefeated West Virginia and undefeated Rutgers, a notion of "strength" can take into account not only WVU's more successful past, but also its likely more successful future as the conference schedule stiffens towards the end of the season. If a voter looks at Rutgers' remaining slate and says, "Rutgers is going to fall," the Knights will remain below a similar team, like maybe Boise State, which hasn't necessarily been more impressive on the field but has clearer skies ahead.

Drawbacks: Haphazard. There are really no internal rules to dictate consistency, which is a bitch when perception does not reflect reality, and an overemphasis is placed on a team's history (meaning past seasons) rather than its present. Ratings on "strength" are abstract, almost by definition non-quantifiable, and easily wrecked by idiosyncrasies in the illogical infinite regress of who beat who - in 2005, for instance, a victory chain can be drawn to show how Division III Averett University could have beaten Ohio State, which is proof (the chain, that is, which can be drawn to and from any team in any division) that merely beating a team is not a pure indicator of "strength." So other very malleable notions like "talent" must be brought into the picture to distinguish a prospective ten-win team from an eight-game winner. It's a real instinctual, gut-feeling guessing game up here, when one of the first rules of the process should be that your eyes and gut are not always reliable sources. Also leads to the dreaded "drop-em-when-they-lose" syndrome, which is excessively loyal to preconceived notions and pretty much just unfairly stubborn.
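The victory-chain problem is easy to reproduce mechanically. Below is a minimal Python sketch, assuming games are recorded as hypothetical (winner, loser) pairs: it treats each game as a one-way link from winner to loser and uses a breadth-first search to find the shortest chain from one team to another. The team names and results are invented for illustration, not real scores.

```python
from collections import deque


def victory_chain(results, start, target):
    """Return the shortest list [start, ..., target] in which each team
    beat the next one, or None if no such chain exists.

    results: iterable of (winner, loser) pairs.
    """
    # Directed graph: one edge winner -> loser for every game played.
    beat = {}
    for winner, loser in results:
        beat.setdefault(winner, []).append(loser)

    # Breadth-first search finds the shortest chain of "A beat B" links.
    previous = {start: None}
    queue = deque([start])
    while queue:
        team = queue.popleft()
        if team == target:
            # Walk back through the recorded predecessors to rebuild the chain.
            chain = []
            while team is not None:
                chain.append(team)
                team = previous[team]
            return chain[::-1]
        for loser in beat.get(team, []):
            if loser not in previous:
                previous[loser] = team
                queue.append(loser)
    return None


# Hypothetical results: the powerhouse beat the small school, but the
# small school beat a team that beat a team that beat the powerhouse.
games = [
    ("Powerhouse", "Small School"),
    ("Small School", "Mid-Major"),
    ("Mid-Major", "Contender"),
    ("Contender", "Powerhouse"),
]
print(victory_chain(games, "Small School", "Powerhouse"))
# ['Small School', 'Mid-Major', 'Contender', 'Powerhouse']
print(victory_chain(games, "Powerhouse", "Small School"))
# ['Powerhouse', 'Small School']
```

Once results loop back on themselves like this, every team in the loop has a chain to every other team, which is why a chain on its own says nothing about "strength."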

Resume
A method that attempts to rank based strictly on the measurable: if each team had a resume for this season and this season only, and its name at the top was blacked out, how would the voter rank those resumes? Takes into account only games played to date this season - these are folks who always complain about polls that come out and distort reality before October. SMQ's preferred method all year, and seemingly the default method for most end-of-season rankings.

Strengths: Consistency. Attempts to use "evidence" rather than perception or past history to eliminate abstraction, and treats every team equally and entirely as a team - doesn't give any boosts or demerits to teams based on the recent past or personnel. For example, Tennessee's opening win over Cal was deemed the most impressive of the week, and the Vols were number one in SMQ's poll in Week Two. If Boise State defeats a I-AA team in its opener by two touchdowns more than Georgia defeats its own I-AA opponent, as was the case the first week of this season, the "Resume" voter would rank Boise higher in the second week even if he believed Georgia was the "better" team, because there's no way to measure UGA's perceived superiority - it's just an abstract notion based on past teams, not the current reality. When Michigan State was an impressive 3-0, the "Power Poll" voter might have said "I don't believe in the Spartans, they always fall apart," and stayed away from MSU, but the "Resume" voter, even if he believed in an imminent collapse, would criticize and reward based solely on those three games, and deal with the meltdown only when it came (which, of course, it did in the fourth game). All that's considered is what's happened on the field to date, which is all that can be measured, and which is all anyone will have to go on in the final ranking in January, when it counts.

Drawbacks: "Attempts" is the very key word above. Even if a voter is using a statistical method (see below), subjectivity and abstraction creep in when considering how much credit or punishment is deserved for a particular win, especially early in the season, or, along the same lines, how to account statistically for strength of schedule. It's OK that the same win or loss on a resume changes in value as the season goes along according to changes in perception about a particular opponent, but that's still dreaded perception, which is what Peter was getting at in his second question. Early this year, in trying to come up with a way to account for strength of schedule on various resumes, SMQ started making a list that assigned a basically arbitrary value to each team as part of a group of similarly-valued teams, until it dawned on him to ask, "If this is what I actually think of these teams, why don't I just use this list?"
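To see how perception creeps back in, here is a minimal Python sketch of a tiered resume score. The tier values, the scoring rule and the team names are all hypothetical assumptions, not SMQ's actual list: a win adds the opponent's tier value, and a loss subtracts more the weaker the opponent's tier.

```python
# Hypothetical, hand-assigned opponent tiers (1 = best group). Filling in
# this table already requires ranking the opponents, which is the catch.
TIER_VALUE = {1: 4, 2: 3, 3: 2, 4: 1}


def resume_score(games, tier_of):
    """Score a resume from (opponent, won) pairs.

    A win adds the opponent's tier value; a loss subtracts (5 - value),
    so losing to a weak team costs more than losing to a strong one.
    All of these numbers are illustrative assumptions.
    """
    score = 0
    for opponent, won in games:
        value = TIER_VALUE[tier_of[opponent]]
        score += value if won else -(5 - value)
    return score


tiers = {"Team A": 1, "Team B": 2, "Team C": 4}
first = [("Team A", False), ("Team C", True)]   # lost to A, beat C
second = [("Team B", True), ("Team C", True)]   # beat B, beat C
print(resume_score(first, tiers))   # 0
print(resume_score(second, tiers))  # 4

tiers["Team A"] = 3  # perception of Team A drops; no new games played
print(resume_score(first, tiers))   # -2
```

The first resume falls from 0 to -2 without a single new game on it, which is the "dreaded perception" problem in miniature.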

Futures
At the other pole, it's the mock stock approach - explicitly embraced most weeks by Orson and ripped off at least once by Gameday - of "buying" and "selling" (or "holding") teams based on where they're going to end up at year's end. These are the people who have West Virginia at two, or, weirder, one, based on the Mountaineers' soft schedule. It's not about what you've done, or how "good" you are - it's only about where you wind up.

Strengths: Ruthless pragmatism. The "Futures" voter probably didn't get carried away with Florida because of the minefield it had ahead of it, and is probably a lot less excited by Southern Cal with California, Oregon and Notre Dame awaiting than the Trojan-loving computers are. On the other end, Arkansas' stock shot up like a rocket after it beat Auburn, given its remaining schedule; big money's going down on either Texas or Nebraska (especially if it's Texas) after this weekend, because it's pretty much clear sailing for the winner right into the Big XII Championship.

Drawbacks: Highly speculative by definition. Rewards soft scheduling, and creates bubbles around teams prepared to devour the empty calories in delicious cupcakes. Instills a hollow, frontrunning mentality.


Past results are no guarantee of future returns

Statistical (Faux "objective")
Like the "Resume" method, it eliminates speculation and abstractions like perception and previous history, but takes that to the extreme by running cold, hard numbers to reach conclusions as close as possible to scientific fact. The much-maligned computer guys.

Strengths: Able to process huge amounts of relevant information that puny human brains could never consider alone, and to reach enlightening conclusions as a result. When SMQ raged against the machines Monday, frequent commenter and resident stat guru Paul Kislanko argued "the only thing worse than using computers is using the human polls," and said by the end of the season, when teams are more connected by common opponents and opponents of opponents, etc., results like six I-AA teams ranked ahead of No. 63 Miami of Florida would be eliminated. So, clearly, they're not beholden to flawed human perceptions and biases, either - you know, an acrobatic, game-winning 20-yard catch that earns a kid an impressive highlight and all-conference honors is just another 20-yard catch in the books. Stupid mortals!

Drawbacks: Puny human brains are telling the computers what factors to consider and how much to consider them to reach said conclusions. SMQ, as one who's tried to devise his own low-tech, purely stat or other number-based projections, didn't say "faux objective" for nothing: the formula for input itself has all kinds of built-in biases that can be rigged (intentionally, for you conspiracy theorists, but more likely unintentionally) to favor certain types of teams. It doesn't matter what the formula is - unless, that is, it's something exceedingly simple like pure winning percentage, in which case it can't account for the all-important strength of schedule variances. Strength of schedule itself is the biggest stick in the craw here, because it skews the relevance of every other possible number, and it's also the most difficult element to measure by numbers alone; many computer rankings, like Jeff Sagarin's, use "Record vs. Top 10" and "Record vs. Top 30," but this seems more than a little "Chicken or Egg?" If the rankings haven't been generated yet, how can you tell who's in the top 10 or top 30? After those numbers are figured in, and the top 10 and top 30 change, do the inputs to those categories change again to reflect the difference? And do they change again after that? And again, ad infinitum? The "finish line" to such changes is subjective. There's also the huge problem of grouping at the margins (No. 11 is grouped with No. 29 rather than No. 10, for example), which brings us back to the arbitrary nature of such decisions.
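The "ad infinitum" question has a standard computational answer: recompute the ratings until they stop moving. The Python sketch below uses a made-up formula, not Sagarin's or any published system's, in which a team's rating blends its own winning percentage with the average rating of its opponents, and the loop repeats until no rating changes by more than a chosen tolerance. That tolerance is the "finish line," and picking it (like picking the blend weight) is exactly the kind of human decision described above.

```python
def iterate_ratings(results, weight=0.5, tolerance=1e-6, max_rounds=1000):
    """Hypothetical schedule-adjusted rating, recomputed until it settles.

    results: list of (winner, loser) pairs.
    rating = (1 - weight) * own winning percentage
             + weight * average rating of opponents
    Returns (ratings, rounds_used).
    """
    wins, opponents = {}, {}
    for winner, loser in results:
        for team in (winner, loser):
            wins.setdefault(team, 0)
            opponents.setdefault(team, [])
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    win_pct = {t: wins[t] / len(opponents[t]) for t in opponents}
    ratings = dict(win_pct)  # first pass: raw winning percentage only
    for rounds in range(1, max_rounds + 1):
        new = {
            t: (1 - weight) * win_pct[t]
            + weight * sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in opponents
        }
        change = max(abs(new[t] - ratings[t]) for t in opponents)
        ratings = new
        # The "finish line": stop once no rating moves more than `tolerance`.
        if change < tolerance:
            return ratings, rounds
    return ratings, max_rounds


# Hypothetical results, not real games.
games = [("Team A", "Team B"), ("Team B", "Team C"),
         ("Team C", "Team D"), ("Team A", "Team C")]
ratings, rounds = iterate_ratings(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{team}: {ratings[team]:.3f}")
print("settled after", rounds, "rounds")
```

With a blend weight below 1 the recomputation is guaranteed to settle, but a looser tolerance stops it sooner, with slightly different numbers, so the cutoff is still a judgment call.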

This will probably be elaborated on later - SMQ is intrigued by the notion of constructing four polls, one based on each method (or more, if there are more valid methods), and coming up with a final ballot based on an average of each one. He's not going to do that halfway through the season (he doesn't spend nearly enough time with the one method he uses now), but it's an interesting thought.
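The averaging idea is simple to sketch. Below, a minimal Python version assumes each of the four ballots ranks the same set of teams, best first, and orders teams by their average position; the ballots and team names are hypothetical.

```python
def combine_ballots(ballots):
    """Average each team's position across several ballots (best first)
    and return the teams sorted by that average, lowest first.

    Assumes every ballot ranks the same set of teams. Ties in the average
    are broken alphabetically, which is itself an arbitrary choice.
    """
    totals = {}
    for ballot in ballots:
        for position, team in enumerate(ballot, start=1):
            totals[team] = totals.get(team, 0) + position
    averages = {team: total / len(ballots) for team, total in totals.items()}
    return sorted(averages, key=lambda team: (averages[team], team))


# Hypothetical ballots, one per method.
power = ["Team A", "Team B", "Team C", "Team D"]
resume = ["Team B", "Team C", "Team A", "Team D"]
futures = ["Team C", "Team A", "Team D", "Team B"]
statistical = ["Team B", "Team A", "Team C", "Team D"]
print(combine_ballots([power, resume, futures, statistical]))
# ['Team A', 'Team B', 'Team C', 'Team D']
```

Team A and Team B tie here at an average of 2.0, so even the combined ballot needs one more arbitrary rule (alphabetical order, in this sketch) to separate them.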