The study sought to identify whether there was a pattern to sort documents as ‘classified’ or ‘confidential,’ based on machine learning and natural language processing techniques.

Renato Rocha and Flávio Coelho, professors at FGV’s School of Applied Mathematics (EMAp), in partnership with Matthew Connelly, a history professor at Columbia University, have carried out a study that makes it possible to build algorithms that automatically sort diplomatic documents of the U.S. Department of State.

Specifically, the researchers used machine learning and natural language processing techniques to test whether a pattern exists for sorting documents as ‘classified’ or ‘confidential.’ The research team tuned the algorithm to identify communication patterns so that document sorting could be automated.

“Through supervised machine learning and natural language processing techniques applied to the contents of diplomatic documents, we achieved a hit rate of more than 90% in most cases, in addition to identifying severe flaws in the manual sorting process,” said Professor Renato Rocha.
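
For readers curious about what supervised text classification looks like in practice, the following is a minimal sketch and not the researchers' actual method. It assumes a simple word-count naive Bayes model, which is a stand-in technique chosen for brevity. The class name, the labels "restricted" and "unrestricted," and the tiny sample texts are all invented for illustration and have no connection to the study's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes over word counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def hit_rate(model, texts, labels):
    """Fraction of documents whose predicted label matches the manual label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented toy data: labels stand in for a manual sorting decision.
train_texts = [
    "secret negotiations on arms transfer with foreign minister",
    "sensitive intelligence report on military deployment",
    "secret assessment of nuclear talks",
    "visa application schedule for consular staff",
    "press release on cultural exchange program",
    "travel itinerary and hotel booking for delegation",
]
train_labels = ["restricted"] * 3 + ["unrestricted"] * 3

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)

test_texts = ["secret report on arms talks", "hotel booking for cultural program"]
test_labels = ["restricted", "unrestricted"]
print(hit_rate(model, test_texts, test_labels))
```

The hit rate here is simply the share of held-out documents whose predicted label matches the label a person assigned. On a toy set this small the figure says nothing about real-world performance. Documents where a well-performing model disagrees with the manual label are natural candidates for a second look, which is one plausible way such a model could surface flaws in manual sorting.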

The research falls under EMAp’s big data analytics and machine learning track and is connected to political science and information transparency research at Columbia University. The subject drew public attention during the U.S. presidential race, when the FBI investigated candidate Hillary Clinton’s conduct during her time as Secretary of State.
