Keep your eye on the stats

A number of recent crucial football matches have been won by the application of the kind of number-crunching that has already changed baseball — the Moneyball approach
by Simon Kuper

Arsène Wenger, French manager of the London football club Arsenal, was looking in 2004 for an heir to the midfielder Patrick Vieira. He wanted a player who could cover lots of ground, so he scanned statistics from European leagues and spotted an unknown teenager at Olympique de Marseille, Mathieu Flamini, who was running 14km a game. But could he play football? Wenger went to look and signed him for peanuts. Flamini prospered at Arsenal before moving to AC Milan.

Back then, Wenger was one of the few in football who used data to inform his decisions. In a traditionally anti-intellectual sport, he was a keen mathematician with an economics degree. But now a data revolution is sweeping the sport. Most big European clubs now employ sophisticated number crunchers. Arsenal’s data department is led by a German with a background in investment banking.

Data-crunching computers began infiltrating most professions in the 1980s, but sport remained immune. The first to change was American baseball, which saw the rise of a subculture of nerdy statisticians who, in their free time, played with the numbers of their beloved sport. Their dean was Bill James, janitor in a Kansas factory, who in his typewritten, mimeographed Baseball Abstract tore apart the game’s traditional wisdoms (1). He proved statistically that traditional ploys like base-stealing made no sense.

Eventually, some inside professional baseball noticed the Jamesians. Billy Beane, general manager of the Oakland A’s, had been an intellectual baseball player who at 27 walked into the head office and said he wanted to quit playing and become a scout. Beane read James’s theories with fascination. He grew fed up with the “gut wisdom” of gnarled scouts, and hired a 20-something Harvard-educated statistician to find new players. Using new stats, the team identified undervalued talent. It turned out that baseball’s respect for natural athletes was misplaced. Fat men with good ball sense did just as well. “‘Big-boned’ is the term we prefer to use,” says Beane. For years the fat players punched well above their weight for the team, winning more games than a penniless club had a right to expect. Richer clubs finally copied them. Recently, Beane marvels, the mighty New York Yankees have hired 21 statisticians.

Beane’s story was told in Michael Lewis’s book Moneyball (2). It sold over a million copies, became a Hollywood movie with Brad Pitt as Billy Beane, and is surely the most influential sports book ever written. Moneyball changed almost all ballgames. Inspired by Beane (and latterly by Pitt), executives throughout sport began using statistics to win matches.

Stats enter the beautiful game

A few people in English football read Moneyball. Stats had entered the game in the mid-1990s, when data providers began cataloguing metrics such as the number of kilometres, passes and tackles per player per match. Football executives made the pilgrimage to California to meet Beane, who had fallen hard for soccer (he’d encountered the game on a holiday with his wife) and quizzed the visitors about their game. Mike Forde, now performance director at Chelsea, jokes that in the last half-hour of conversation, he finally managed to ask Beane about baseball.

The Frenchman Damien Comolli, a former assistant of Wenger’s at Arsenal, visited too. Comolli briefly lived in northern California in his youth, was an A’s fan, and clicked with Beane. In 2005 Comolli became sporting director at Tottenham in London, with a chance to use stats to unearth talent.

He did find some excellent players for Spurs — notably Dimitar Berbatov, Luka Modric and an unknown 17-year-old named Gareth Bale, now a superstar — but he also ran into opposition from Spurs’s traditionalist coaches. The typical football coach left school young, isn’t an expert statistician and makes decisions based on a “gut instinct” acquired as a player; having seen many less-educated men replaced by computers, he wasn’t keen to have the same thing happen to him. And he wasn’t about to listen to a Frenchman whose playing career had peaked in Monaco’s youth team.

Eventually Tottenham ousted Comolli. He then joined Saint-Etienne. French clubs are dominated by emotional presidents, who seldom follow developments in American baseball. In France, Comolli was among the pioneers. Saint-Etienne rarely bought players, because it lacked money, but it did have to decide whether or not to offer players new contracts. If you had a starting player aged 30, and you gave him a new two-year contract, that might cost two million euros. At 30, the guy was still good. But how could you know if he’d still be good enough at 32? Comolli would look at his statistical trends: had the player been making fewer sprints each year, was his number of passes in the opponents’ half declining? If the trends were heading downward fast, you wouldn’t offer him a new contract.
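The contract check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Comolli's actual method: the function names, the use of a least-squares slope, and the 5% per season threshold are all assumptions made for the example, and the figures in the usage note are invented.

```python
from statistics import mean


def yearly_trend(values):
    """Least-squares slope of a metric against season index (change per season)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seasons of data")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def offer_new_contract(sprints_per_game, passes_opp_half, max_decline=0.05):
    """Hypothetical rule: refuse a new contract if either metric is falling
    by more than max_decline (as a fraction of its first-season value) per season."""
    for series in (sprints_per_game, passes_opp_half):
        relative_slope = yearly_trend(series) / series[0]
        if relative_slope < -max_decline:
            return False
    return True
```

With invented numbers, a player whose sprints per game went 40, 34, 28 over three seasons is losing 15% of his starting figure each season, so `offer_new_contract([40, 34, 28], [30, 30, 30])` returns `False`. A player holding steady at 40, 40, 39 would pass.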

‘So much further ahead of us’

Comolli and others were learning from Beane how to apply data to sport. But did Beane learn anything from European soccer clubs? They too had built up over a century’s worth of knowhow. Beane eventually told me, “They were so much further ahead of us in terms of nutrition.” He thought some more: “They dress nicer. When they walk in they all have their blazers on. We could never get our guys to do that.” Otherwise, when Beane looked at soccer, he saw an emotional sport, which meant irrational decisions.

In 2010 the American businessman John Henry, who had once tried to lure Beane to run his baseball team the Boston Red Sox, bought Liverpool Football Club. Henry knew nothing about football. He asked Beane, and Beane advised him to hire Comolli, who became sporting director of one of the world’s biggest clubs. Late at night Comolli would call Beane and chat.

Comolli’s tenure was a failure. He used numbers to establish that a young striker, Andy Carroll, was the best at heading in high crosses. He signed Carroll for €40m, and bought players with good passing statistics to feed him. But the experiment failed, because, as is becoming clear from the data, high crosses are a bad way of scoring goals — a truth that Liverpool had already demonstrated in practice. Short low passes are much more productive. Comolli had bet the company on the wrong strategy. In April 2012 he left Liverpool.

The greatest advances have been made in planning set pieces: corners, free-kicks, penalties and throw-ins. A set piece is when a football match stops for a moment, and becomes an easily analysable static tableau rather like a baseball game. It’s at set pieces that data now regularly prove decisive. Manchester City’s data department analysed about 400 corners in several national leagues over multiple seasons, and concluded that the most dangerous corner is the inswinger: the ball that swings in towards goal.

The data team took this finding to City’s manager Roberto Mancini. He had played football for many years, and his gut told him that the most dangerous corner was the outswinger. But City’s outswingers kept failing to produce goals. Mancini’s assistant David Platt came to chat to the data analysts, and they noticed that City had begun taking inswinging corners. Last season City scored 15 goals from corners, the most in the English Premier League. Vincent Kompany’s headed goal against Manchester United, which effectively clinched the championship for City, came from an inswinging corner.

Not only was last season’s Premier League arguably decided by data analysis; the European Champions League was too. In the final against Bayern Munich, Chelsea’s goalkeeper Petr Cech dived the right way to all six Bayern penalties, saving two. Afterwards, he said, “I either guessed pretty well, or I was ready to guess pretty well.” But he hadn’t just guessed. Chelsea’s data department had supplied him with a DVD of every Bayern penalty since 2007.

Team Cologne works on the data

A penalty analysis almost decided the last World Cup final. The world’s leading expert on penalties is Ignacio Palacios-Huerta, game theorist and economics professor at the London School of Economics. He has assembled a database of over 9,000 penalties since 1995. I grew up in the Netherlands, and when Holland reached the World Cup final against Spain in 2010 I emailed an official in the Dutch camp: would they like a report on Spain’s penalty-takers, prepared by Palacios-Huerta? The Dutch said yes. Palacios-Huerta (a Basque, happy to see Spain lose) worked 48 hours straight to produce his report.

With five minutes left in extra time in the final, the score was still 0-0. A penalty shootout seemed imminent. I was reading the file on my laptop that contained Palacios-Huerta’s report. Spain’s Fernando Torres usually shot low, he wrote; Xavi and Andrés Iniesta would probably shoot to the keeper’s right. What if Ignacio’s advice was all wrong? But then Iniesta scored, and penalties were averted.

Today the most statistically minded team in international football, Germany, is finding ways to use data when the game is in motion. A group of professors and students from the Cologne Sporthochschule (the “Higher School for Sports”) has worked for Germany’s national team for years. During last year’s European championship, “Team Cologne” produced a dossier several hundred pages thick on each team that Germany faced. The German coaches used the huge files.

Before Germany-Holland at Euro 2012, Team Cologne unearthed a key fact: the Dutch defenders often strayed too far apart. The secret codebook for Germany’s national team lays down that the ideal horizontal distance between defenders is eight metres. The Dutch were regularly leaving larger gaps. When the two nations met, Germany located those gaps and won 2-1.
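The gap check behind that finding is simple to express. The sketch below is an assumption about how such a check might look, not Team Cologne's actual tooling: the function name and the input format (each defender's position across the width of the pitch, in metres) are hypothetical. Only the eight-metre figure comes from the text above.

```python
def oversized_gaps(defender_positions, ideal_gap=8.0):
    """Return (left, right, gap) for each pair of neighbouring defenders
    standing farther apart than ideal_gap metres.

    defender_positions: hypothetical lateral positions across the pitch, in metres.
    """
    xs = sorted(defender_positions)
    return [(a, b, b - a) for a, b in zip(xs, xs[1:]) if b - a > ideal_gap]
```

For an invented back four standing at 10, 18, 30 and 37 metres across the pitch, `oversized_gaps([10.0, 18.0, 30.0, 37.0])` flags only the 12-metre hole between the two central defenders.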

In Oakland, the team with the second-lowest payroll in baseball made the playoffs last autumn. All their rivals now crunch stats too, but Beane and his MIT-educated right-hand man Farhan Zaidi (a fellow soccer fan) are still finding new crucial metrics. Beane has gone radical: “We’re now at the point that every transaction is backed by data analysis.” Football is rolling slowly in the same direction, about 20 years behind.

Simon Kuper is a journalist and writer. Most recently he co-authored, with Stefan Szymanski, Soccernomics: Why England Loses, Why Spain, Germany, and Brazil Win, and Why the US, Japan, Australia, Turkey — and Even Iraq — are Destined to Become Kings of the World’s Most Popular Sport, Nation Books, 2012.

(1) Bill James, The Bill James Historical Baseball Abstract, Villard Books, New York, 1985.

(2) Michael Lewis, Moneyball: The Art of Winning an Unfair Game, W W Norton, New York, 2004.