2015
_TECHNOLOGY Computer Science

_Dusting for Cyber Prints

Computer scientists are using coding style to identify anonymous cybercriminals.

_Aylin Caliskan-Islam

Caliskan-Islam is a PhD candidate in the Privacy, Security and Automation Lab whose research interests range from privacy analytics for the end user to source code authorship attribution and underground forum analysis.

A team of computer scientists led by researchers from Drexel University’s College of Computing & Informatics has devised a way to lift the veil of anonymity protecting cybercriminals by turning their malicious code against them.

Their method first runs a parsing program that breaks code down into its grammatical structure, much as an English teacher diagrams a sentence. A second program then captures the distinctive patterns in that structure, which can be used to identify the code’s author.
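To make the parsing step concrete, here is a minimal sketch of the idea. It is an illustration, not the team's tool: the article does not name the parser or the programming language involved, so Python's built-in `ast` module stands in as an assumption, and `syntactic_features` is a hypothetical helper. It counts how often each kind of syntax-tree node appears, and which kinds nest inside which.

```python
import ast
from collections import Counter


def syntactic_features(source: str) -> Counter:
    """Count syntax-tree node types and parent>child node pairs in a snippet."""
    tree = ast.parse(source)
    features = Counter()
    for parent in ast.walk(tree):
        parent_name = type(parent).__name__
        features[parent_name] += 1
        # Parent>child pairs record how constructs nest, not just which appear.
        for child in ast.iter_child_nodes(parent):
            features[f"{parent_name}>{type(child).__name__}"] += 1
    return features


# Two ways of writing the same computation leave different structural traces.
loop_style = "total = 0\nfor x in range(10):\n    total += x\n"
functional_style = "total = sum(x for x in range(10))\n"

print(syntactic_features(loop_style)["For"])                 # explicit loop present
print(syntactic_features(functional_style)["GeneratorExp"])  # generator expression present
```

Both snippets compute the same sum, yet the first produces a `For` node and the second a `GeneratorExp` node. Habits of this kind are the raw material for a stylistic fingerprint.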

“Just like writers and artists, every coder has a unique style all their own,” says Aylin Caliskan-Islam, a doctoral student at Drexel who developed the system and is the lead author of a paper on the topic. “Our process distills the most important characteristics of a programmer’s style — which is the first step toward identifying anonymous authors, tracking cybercriminals and settling intellectual property questions.”

_Red-Handed

The team was able to use their method to pair computer code and its author with 98 percent accuracy, marks nearly as high as modern fingerprint analysis.

Caliskan-Islam drew on contributions from Princeton University, the University of Maryland, the University of Göttingen in Germany, and the Army Research Laboratory to produce a digital analytics system that could become a kit for electronically “fingerprinting” cybercriminals.

Caliskan-Islam’s team tested their theory on a large body of code: the collective work of 250 contestants who solved coding challenges in “Google Code Jam” competitions from 2008 to 2014. This sample yielded 20,000 distinct coding features, and Caliskan-Islam’s program narrowed that list down to the 137 most relevant, which served as the data points for generating digital fingerprints for the authors.
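The narrowing step can be sketched in the same spirit. The article does not say which criterion the program used to rank features, so the example below assumes information gain, one common choice: a feature scores highly when knowing whether it appears in a file reduces uncertainty about who wrote the file. The helper names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of author labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(samples, labels, feature):
    """Entropy reduction in author labels from knowing whether `feature` is present."""
    present = [lab for s, lab in zip(samples, labels) if feature in s]
    absent = [lab for s, lab in zip(samples, labels) if feature not in s]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


def top_features(samples, labels, k):
    """Return the k features that best separate the authors."""
    vocabulary = sorted(set().union(*samples))
    ranked = sorted(
        vocabulary, key=lambda f: information_gain(samples, labels, f), reverse=True
    )
    return ranked[:k]


# Toy data: each sample is the set of features observed in one file.
samples = [{"For", "tabs"}, {"For", "tabs"}, {"While", "tabs"}, {"While", "tabs"}]
labels = ["alice", "alice", "bob", "bob"]

print(top_features(samples, labels, 2))
```

In this toy set every file uses tabs, so that feature says nothing about authorship and drops out, while the loop constructs separate the two authors perfectly. Reducing 20,000 features to 137 applies the same logic at scale.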

When the team put together a lineup of anonymous author “suspects” to see if the program could successfully match them to some of their code, it was able to pair the code and its author with 98 percent accuracy — marks nearly as high as modern fingerprint analysis techniques.
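The lineup itself can be pictured as a matching problem. The sketch below is a simplified stand-in, not the team's classifier, which the article does not describe: it assumes each known author is summarized by a profile of feature counts, and it attributes an unknown sample to the author whose profile has the highest cosine similarity. The names `cosine`, `attribute` and `profiles`, and all the numbers, are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute(unknown: Counter, profiles: dict) -> str:
    """Return the author whose profile is most similar to the unknown sample."""
    return max(profiles, key=lambda author: cosine(unknown, profiles[author]))


# Toy profiles built from each suspect's known code.
profiles = {
    "alice": Counter({"For": 8, "While": 1}),
    "bob": Counter({"While": 7, "Lambda": 3}),
}

# Feature counts from an anonymous sample.
unknown = Counter({"For": 5, "While": 1})

print(attribute(unknown, profiles))
```

Repeating this over many held-out samples and counting how often the predicted author is the true one yields an accuracy figure of the kind the team reports.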