An international team of eight researchers didn’t set out to measure GitHub duplication. Their original aim was to define the “granularity” of copying – that is, how much files changed between different clones – but along the way, they turned up a “staggering rate of file-level duplication” that made them change direction.
Presented at this year’s OOPSLA (part of the Association for Computing Machinery’s late-October SPLASH conference in Vancouver), the research, led by the University of California, Irvine, found that of 428 million files on GitHub, only 85 million are unique.
Read the full story at The Register.