A game changer takes on cricket's statistical problem

Posted by Tyler on September 23, 2018, 12:00:50 pm

20 September 2018, 5:00 pm

Jehangir Amjad has done something few people can: He found a way to combine his favorite sport with his work. A longtime cricket enthusiast and player, he's currently tackling an important statistical problem in the game: how to declare a winner when a match must end prematurely, due to weather or other circumstances. Given cricket's global popularity, and the fact that matches can last for several hours, it's a problem of great interest to fans and players alike.

For Amjad, it's also a project that incorporates his passion for operations research. And the Laboratory for Information and Decision Systems (LIDS) was the perfect place for him to explore it.

Amjad took a circuitous path to MIT. Born and raised in Pakistan, he received a scholarship to complete his last two years of high school at the Red Cross Nordic United World College in Norway. Along with the school's 200 other students, who came from over 100 countries, he studied, made personal and professional connections, and learned how to live with people of many different cultures during his time there. He then returned home to teach for a year (following in the footsteps of his parents, who are both professors), before attending Princeton University for a bachelor's in electrical engineering.

He graduated in 2010 and, assuming he was finished with school, went to Microsoft to be a product manager. After several years there, though, he felt restless. Realizing that he'd found himself increasingly drawn to data science and machine learning since starting at Microsoft, he says he figured he could either stay in the tech industry and learn more about these fields on the job, or "go back to school to master the mathematical nuances of this field." He chose academia and came to MIT in 2013 as a graduate student in the Operations Research Center. There, he collaborated frequently with LIDS students and researchers, under the supervision of MIT Professor Devavrat Shah.

Because Shah is also a cricket fan, he and Amjad had been discussing the cricket problem for years, although Amjad didn't settle on it as his research project immediately. In fact, the theory he is now applying to the cricket problem, robust synthetic control, is mostly used in economics, health policy, and political science. But because all of his work is interdisciplinary, he was able to see how to connect the two. "A lot of what we train on [at LIDS] is the methods, but the applications are and should be very diverse," Amjad says.

The current standard for international cricket games is to use the Duckworth-Lewis-Stern (DLS) method, created by British statisticians in the mid-1990s, to determine the winner when a game has to be called early. Amjad is viewing this as a forecasting problem.

"We aren't just interested in predicting what the final score would be; we actually project out the entire trajectory for every ball, we project out what might happen on average," he says.

In collaboration with Shah and Vishal Misra, a professor of computer science at Columbia University, Amjad has used the robust synthetic control method to propose a solution to the forecasting problem. That work has also led to a target revision algorithm analogous to the Duckworth-Lewis-Stern method. Having back-tested their cricket results on many games, they are confident in the approach. They are currently comparing it to DLS, he says, and planning "what statistical argument we can make so that we can hopefully convince people that we have a viable alternative."

Broadly, synthetic control is a statistical method for evaluating the effects of an intervention. In many cases, the intervention is the introduction of a new law or regulation.

"Let's say that 10 years ago, Massachusetts introduced a new labor law, and you wanted to study the impact of that law," Amjad explains. "This theory says you can use a data-driven approach to come up with a synthetic Massachusetts, one that mimics Massachusetts as well as possible before the law was in place, so that you can then project what would have happened in Massachusetts had this law not been introduced."

This creates a useful comparison point to the real Massachusetts, where the law has been in place. Placing the two side by side (the synthetic Massachusetts data and the real Massachusetts data) shows the gap that opens up after the law took effect, which gives a sense of the law's impact.
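The comparison can be written as a short formula. This uses the notation common in the synthetic control literature, which is an assumption here because the article gives no formulas. Let unit 1 be the treated state (Massachusetts) and units 2 through J+1 the other states. Let y_{jt} be unit j's outcome in period t, T_0 the last period before the law, and w_j the donor weights fitted on periods t ≤ T_0. In the classical method these weights are nonnegative and sum to one. The synthetic Massachusetts and the estimated effect of the law in each later period are then:

```latex
\hat{y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j \, y_{jt},
\qquad
\hat{\tau}_{1t} = y_{1t} - \hat{y}_{1t}^{N},
\qquad t > T_0
```

Here the superscript N marks the "no law" counterfactual, and \hat{\tau}_{1t} is the gap between the real and synthetic series in period t.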

Amjad and his collaborators have developed a robust generalization of the classical method, which they call Robust Synthetic Control. With this approach, limited and missing data no longer become insurmountable obstacles. Instead, these sorts of difficulties can be accommodated, which is especially useful in the social sciences, where there may not be many common data points available.
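The sketch below shows one way a "robust" variant can tolerate noisy and missing data, under stated assumptions. It first de-noises the donor matrix by keeping only its top singular values. It then fits unconstrained least-squares weights on the pre-intervention period. The function name, the fixed `rank` argument, and the zero-fill-and-rescale treatment of missing entries are all illustrative choices made here. This is not the authors' code.

```python
import numpy as np

def robust_synthetic_control(donors, treated_pre, rank):
    """Hypothetical sketch of a de-noise-then-regress synthetic control.

    donors:      (n_donors, T) outcomes for donor units over all T periods;
                 NaN marks a missing entry.
    treated_pre: (T0,) treated unit's outcomes before the intervention.
    rank:        number of singular values kept when de-noising (assumed known).
    Returns the (T,) synthetic trajectory for the treated unit.
    """
    X = np.array(donors, dtype=float)
    observed = ~np.isnan(X)
    # Fraction of observed entries, floored to avoid dividing by zero.
    p_hat = max(observed.mean(), 1.0 / X.size)
    X = np.where(observed, X, 0.0)  # zero-fill missing entries

    # Step 1: keep only the top `rank` singular values, then rescale by
    # 1 / p_hat to compensate for the zero-filled entries.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[rank:] = 0.0
    M_hat = (U * s) @ Vt / p_hat

    # Step 2: least-squares weights on the pre-intervention columns
    # (no nonnegativity or sum-to-one constraint in this sketch).
    T0 = len(treated_pre)
    beta, *_ = np.linalg.lstsq(M_hat[:, :T0].T, treated_pre, rcond=None)

    # Step 3: apply the fixed weights to the de-noised donors over all periods.
    return M_hat.T @ beta
```

Dropping the convex-weight constraint and relying on the low-rank de-noising step is the design choice this sketch illustrates. With noise-free, fully observed low-rank donor data, the fitted trajectory reproduces the treated unit exactly, including after T0.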

Continuing his example, he says, "the method is about using data about other states … to construct a synthetic unit. So, specifically, coming up with a synthetic Massachusetts that ends up being 20 percent like New York, 10 percent Wyoming, 5 percent something else, coming up with a weighted average of those. And those weights are essentially what is known as the synthetic control because now you've fixed those weights and you're going to project that out into the future to say, 'This is what would have happened had the law not been introduced.'"
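The weighting Amjad describes can be sketched as a constrained least-squares problem. The first step chooses nonnegative donor weights that sum to one and best reproduce the treated state's pre-law outcomes. The second step applies those fixed weights to the donors' later data. The solver here (projected gradient descent onto the probability simplex), the function names, and the toy data are illustrative assumptions. The classical method as usually described also matches on covariates, which this sketch omits.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # Last index where the sorted entry stays above the running threshold.
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(donors_pre, treated_pre, steps=5000):
    """Hypothetical helper: convex weights w minimizing ||donors_pre.T @ w - treated_pre||^2."""
    X = np.asarray(donors_pre, dtype=float)   # (n_donors, T0)
    y = np.asarray(treated_pre, dtype=float)  # (T0,)
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                   # start from equal weights
    lr = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = X @ (X.T @ w - y)              # gradient of 0.5 * squared error
        w = project_to_simplex(w - lr * grad)
    return w

def synthetic_counterfactual(weights, donors_all):
    """Apply fixed weights to donor outcomes over the full horizon."""
    return np.asarray(weights) @ np.asarray(donors_all, dtype=float)
```

Suppose a treated series is built as 20/50/30 percent of three donors. The weights fitted on the first 20 periods then recover those shares, and the counterfactual tracks the treated series in later periods as well.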

Eventually, as research continues and more data become available to add to the synthetic unit, the accuracy of the results should improve, he says.

Amjad has used robust synthetic control in this more traditional way, as well. One of his other projects has been a collaboration with a team at the University of Washington on a study of alcohol and marijuana use to assess whether various laws have, over time, affected their sale and use. Another example he mentions as being a particularly good fit is any situation where a randomized controlled trial isn't possible, such as studying the effect of distributing international aid in a crisis. Here, the moral and ethical implications of denying certain people aid make it impossible to use a randomized trial. Instead, observational studies are in order.

"You [the researcher] can't control who gets the treatment and who doesn't," he says, but the outcomes can be watched, recorded, and studied. As his work evolves, he's also looking toward the future, thinking about time series forecasting and imputation.

"My work has converged on imputation and forecasting methods, whether it's synthetic control or just pure time-series analysis," he says.

This intersection is an emerging field of study. Econometricians historically used small data sets and classical statistics to solve problems, but modern machine learning now offers options that use lots of data to do approximate inference instead. Combining these approaches means you can explore both the why of a problem and the prediction.

"You care both about the explanatory power and the predictive power, using these algorithms," Amjad says. "These are designed for a larger scale, where you can still be prescriptive as well as predictive." Election forecasting is just one important example of the areas in which this work could be put to use.

Having defended his thesis earlier this year, Amjad is now a lecturer in machine learning at MIT's Computer Science and Artificial Intelligence Laboratory. He says he is grateful for his time at LIDS, and for all of the inspirational individuals he's met and the groundbreaking ideas he's come across there.

"The biggest lesson of my PhD is that it's a journey," he says. "LIDS is very accepting of you breaking the norm. They let people wander. And what that really helps you with is to understand that you can deal with ambiguity. If there is a problem that I don't know about, I may never be able to completely solve it, but that won't prevent me from thinking about it in a systematic way to hope to solve some parts of it."

Source: MIT News

Reprinted with permission of MIT News.
