Last week a bunch of friends and colleagues forwarded me articles about inferring gender on Twitter. I believe Fast Company had the initial coverage [1], which was picked up by Business Insider, Atlantic Wire, Gizmodo, and others. The articles mostly recapped a paper by researchers at MITRE Corporation [2] that was published back in May and presented at the EMNLP conference.
Since I have a background in text analytics and machine learning and work in social media, people wanted my take on whether this could really be done and how credible we felt the research was.
One thing missing from most of the mainstream coverage was why it’s useful to know a social media user’s gender. Marketers use social media as a platform for research and engagement. Knowing what men thought of your Super Bowl commercial, or who the top female technology influencers are, helps marketers message and engage different segments of the community in ways that resonate. Unlike Facebook, Google+, or Myspace, Twitter doesn’t give users a profile field to self-report their gender.
I was also a little surprised by the overall tone of the mainstream response to this research. Many people seemed taken aback that gender could be inferred on Twitter at all (one headline called it “freaky”). In fact, one can guess (infer) a person’s gender fairly accurately from nothing more than their name. Here is a recap of the accuracies from the MITRE paper, which are fairly consistent with other research:
The bottom line: yes, the MITRE research is credible, and these types of models work well. In fact, they work well enough to move out of the labs and into everyday use. We rolled out our own gender inference (based on a methodology very similar to that in the MITRE paper) in Visible’s most recent release.
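To make the point about names concrete, here is a minimal sketch of guessing gender from a first name alone. It uses a swapped-in technique, a multinomial Naive Bayes classifier over character n-grams, and is not necessarily the classifier used in the MITRE paper or in Visible’s release. The training names, the `female`/`male` labels, the 1 to 3 character n-gram range, and the names `TRAINING`, `char_ngrams`, and `NameGenderNB` are all hypothetical, chosen only for illustration. A real system would train on a much larger labeled set and would likely combine the name with other signals.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data (illustration only); a real model
# would need thousands of labeled names.
TRAINING = [
    ("Maria", "female"), ("Anna", "female"), ("Julia", "female"),
    ("Laura", "female"), ("Sophia", "female"), ("Emma", "female"),
    ("Olivia", "female"), ("Sarah", "female"), ("Jessica", "female"),
    ("Linda", "female"),
    ("John", "male"), ("Robert", "male"), ("Michael", "male"),
    ("David", "male"), ("James", "male"), ("Thomas", "male"),
    ("Daniel", "male"), ("Mark", "male"), ("Peter", "male"),
    ("Kevin", "male"),
]


def char_ngrams(name, n_min=1, n_max=3):
    """Return character n-grams of a lowercased name.

    The name is padded with '^' and '$' so that prefixes and suffixes
    (such as a final 'a') become features of their own.
    """
    padded = "^" + name.lower() + "$"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class NameGenderNB:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of names per label
        self.gram_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                       # every n-gram seen in training

    def fit(self, examples):
        for name, label in examples:
            grams = char_ngrams(name)
            self.class_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, name):
        """Unnormalized log-posterior for each label."""
        total_names = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_names in self.class_counts.items():
            denom = sum(self.gram_counts[label].values()) + vocab_size
            score = math.log(n_names / total_names)  # log prior
            for g in char_ngrams(name):
                # Add-one smoothing; Counter returns 0 for unseen n-grams.
                score += math.log((self.gram_counts[label][g] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    model = NameGenderNB().fit(TRAINING)
    for name in ["Clara", "Martin"]:
        print(name, model.predict(name))
```

Running it prints `Clara female` and `Martin male`, even though neither name is in the training list. Character suffixes carry much of the signal here: a final "a" (the `a$` feature) occurs only among the female training names, so an unseen name like "Clara" leans female.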
[1] http://www.fastcompany.com/1769217/there-are-no-secrets-from-twitter