Multidepartmental Collaboration on Detecting Code Clones Leads to Distinguished Paper Award

November 20, 2018

Faculty and graduate students representing all three departments of the Donald Bren School of Information and Computer Sciences (ICS) received a Distinguished Paper Award at the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). At this internationally renowned forum for software engineering researchers, practitioners and educators, software engineering Ph.D. students Vaibhav Saini and Farima Farmahinifarahani, along with their advisor, Informatics Professor Crista Lopes, and statistics Ph.D. student Yadong Lu and his advisor, Distinguished Professor of Computer Science Pierre Baldi, were recognized for their paper, “Oreo: Detection of Clones in the Twilight Zone.”

Distinguished Paper Award recipients: (from left, bottom) Crista Lopes, Vaibhav Saini and Farima Farmahinifarahani; (from left, top) Yadong Lu and Pierre Baldi.

“The Twilight Zone is where the distinction between clones and non-clones gets increasingly harder to make, even by human judges,” explains Lopes. Researchers have categorized source code clones into four different types, ranging from textual (Type 1) to semantic (Type 4), and the “Twilight Zone” referenced in the paper resides between Types 3 and 4. The Oreo approach, named by Saini because the tool’s architecture resembles an Oreo cookie, combines machine learning, information retrieval and software metrics to detect not only Type-1, -2 and -3 clones but also clones in this Twilight Zone.
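The two easiest clone types can be illustrated with a toy Python sketch. This is not Oreo's method, which combines machine learning, information retrieval and software metrics. It assumes the common definitions: Type-1 clones differ only in whitespace, layout and comments, while Type-2 clones may also rename identifiers and change literal values. The helper names `normalize` and `clone_type` are hypothetical and exist only for this illustration.

```python
import io
import keyword
import tokenize
from typing import List, Optional

# Token types that carry no meaning for this simple clone comparison.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}


def normalize(source: str, abstract_names: bool) -> List[str]:
    """Tokenize Python source, dropping comments and line breaks.

    Indentation is kept as structural markers, regardless of its width.
    If abstract_names is True, identifiers become "ID" and literals "LIT"
    (a simplified, illustrative Type-2 normalization).
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.INDENT:
            out.append("<INDENT>")
        elif tok.type == tokenize.DEDENT:
            out.append("<DEDENT>")
        elif abstract_names and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract_names and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(a: str, b: str) -> Optional[int]:
    """Return 1 or 2 for an exact Type-1 or Type-2 match, else None."""
    if normalize(a, False) == normalize(b, False):
        return 1
    if normalize(a, True) == normalize(b, True):
        return 2
    # Not a Type-1/2 clone: could be Type 3, Type 4, or not a clone at all.
    return None


ORIGINAL = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

REFORMATTED = """
def total(xs):
  # add up the list
  s = 0
  for x in xs:
    s += x
  return s
"""

RENAMED = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

EXTENDED = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    print(s)
    return s
"""

print(clone_type(ORIGINAL, REFORMATTED))  # 1
print(clone_type(ORIGINAL, RENAMED))      # 2
print(clone_type(ORIGINAL, EXTENDED))     # None
```

Adding a single statement is enough to defeat exact token matching. That is why Type-3 clones, and pairs in the Twilight Zone, call for similarity measures and learned models rather than simple equality checks.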

“The applications of code clone detection are varied and important,” says Lopes. Applications include detecting license violations and software theft as well as optimizing code. “With so much open source software,” she continues, “it’s easy for companies to get into trouble when software developers simply copy and paste files with problematic licenses into their code base.” According to Saini, who recently accepted a job offer from Microsoft and will start after defending his thesis this fall, the team plans to further improve Oreo to “capture even more clones with better precision.”

“Working on Oreo was a great experience,” says Saini. “We were thrilled to know that after many sleepless nights, liters of coffee and packets of Oreos, our work is getting recognized.”

Their work builds on previous collaborations aimed at developing and applying machine learning approaches to software problems. “Clone detection is only one such problem; there are many more,” says Baldi. “In the long run, most programming could be done just by telling machines what to do.”

Shani Murray