Informatics Professor Cristina Lopes received a $600,000 Defense Advanced Research Projects Agency (DARPA) grant under the Mining and Understanding Software Enclaves (MUSE) program. The program was started with the goal of reviewing billions of lines of open-source code to discover new relationships within this “big code,” thereby helping to build more robust software. As part of this effort, Lopes is researching software analytics for big code.
Duplication within such open-source code has considerable implications, given that research is increasingly conducted using large collections of open-source projects available on GitHub. Lopes and her team argue that the duplication can skew research conclusions when a study assumes that the projects in its dataset are diverse.
To address this issue, the team created DéjàVu, a publicly available index of file-level code duplication on GitHub. Lopes hopes that DéjàVu will help researchers and developers better understand code cloning on GitHub so they can avoid it where necessary.
— Shani Murray