Using "Artificial Intelligence" to Fix Fake News


So you've probably heard about how Fake News is a problem on the internet right now, and that tech companies (Facebook, Twitter, Google, YouTube, Reddit, etc.) have largely failed to prevent propaganda, illegal/immoral content, and false information from spreading all over the web. We're in a situation where nobody really trusts anything anyone says, because you're never quite sure where people are getting their information, who originally produced it, or if the person you're talking to is even real to begin with. Are they a person who really believes in what they're saying? Or are they a bot? Or a foreign agent? Or a journalist working undercover? Or a foreign agent's bot posing as a journalist? As of now, there's no real way to tell.

To be fair to the early skeptics of social media platforms (who were often attacked for being curmudgeons), the fact that we would end up in this situation was apparent from the very beginning. Technology lowered the barriers to entry for content publishing (a good thing) but didn't bother setting up the infrastructure or means for quality control. Combined with the culture of anonymity, foreign political interests, and lack of editorial oversight, it culminated in the perfect race-to-the-bottom chaos that made things like Brexit and the Trump presidency possible.

There are potentially a few permanent solutions lurking in cryptocurrency and blockchain projects on the horizon, but these will require the consensus of big institutions working in tandem (government, media companies, technology platforms), which in today's hyper-partisan environment isn't likely to happen any time soon. There is a way for tech platforms to combat Fake News on their own terms, however, if companies are interested in tackling the problem in a clear and honest way.

Why Reviving the Humanities is Necessary

The reason the Fake News problem got so out of hand largely has to do with the fact that tech companies never really wanted the responsibility of curating people's posts in the first place. This issue probably should have been in discussion the moment tech platforms became social, but emerging mediums usually don't follow such a well-trodden path -- the rising new thing known as "social media" grew so fast that people had trouble grasping what was even going on. By the time its problems became apparent, it was already too late.

It's only natural that technologists look at technology as the answer to society's biggest problems, but in light of recent events it has become very difficult to argue that our current approach is really the right path. The humanities give us the skills and training needed to write good stories so that we can see ourselves as part of a greater whole -- to be mission-driven, to understand perspectives other than our own, and to feel a sense of duty towards one's community rather than just one's self. Without the humanities, our existence is merely the pursuit of money and power for its own sake, with death patiently waiting for all of us in the end. Isn't there more to life than that?

There are technical challenges to the Fake News problem, sure, but I would say that figuring out the logistics of implementation is the easy part -- the hard part is navigating through the subjectivity of cultural ideas while changing people's attitudes about how we view and talk about things in an overall sense. Tech companies have the opportunity to lead the way here, or risk being restrained by ham-fisted regulatory practices imposed by the government or other third party groups.

One or the other will happen, either way. I would argue that it would be smarter to take the initiative on these matters rather than waiting for external interests to force them into place. One theme that always emerges in these types of discussions is the fact that if tech companies don't take explicit control of shaping the narratives that are being created on their platforms, outside interests will do it for them. And as recent events have shown, these interests are not always benevolent.

The Definition of "Artificial Intelligence"

I have "Artificial Intelligence" wrapped in quotes above (and everywhere else) because the way people talk about "AI" projects these days tends to be way over-hyped. Most "AI" projects are simply machine learning projects with sci-fi branding to make it appear more cool/powerful than it actually is. People do this because they want to make their project seem more appealing to investors and institutions -- especially towards those who don't have the technical acumen to tell the difference between the two.

In plain English, "Artificial Intelligence" is just a database with input streams that feed into an algorithm in real time. The machine "learns" to the extent that it changes the values of variables within the system, but it is wholly incapable of inductive reasoning without manual intervention by human beings. If you're worried about Elon Musk's apocalyptic visions of man-killing drones, don't be -- he's just doing what he can to raise more money for his crazy (government-subsidized) projects. His greatest fear is really falling out of the media spotlight, not the possibility of prompting the end of the world. (There are much better candidates for that nowadays, if you've been paying attention to the news.)

Tech companies are already using machine learning (or some variant thereof) to "police" their content online -- they have prior experience with this because they're legally required to combat violent and discriminatory language on their platforms to the best extent possible. The algorithm scans through the database and flags offensive words/phrases (the "n" word, threatening language, etc.), and flagged posts then get passed into a purgatory section for manual review.
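
To make the mechanics concrete, here's a minimal sketch (in Python) of what that kind of keyword flagging and review queue might look like. This isn't any platform's actual system; the term list, post format, and function names are all invented for illustration.

```python
# Minimal sketch of keyword-based flagging with a manual review queue.
# The flagged terms, post IDs, and data structures are purely illustrative.

from collections import deque

FLAGGED_TERMS = {"slur_example", "threat_example"}  # placeholder terms

review_queue = deque()  # the "purgatory" holding posts for human review


def scan_post(post_id, text):
    """Flag the post and queue it for review if any flagged term appears."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    hits = words & FLAGGED_TERMS
    if hits:
        review_queue.append({"post_id": post_id, "hits": sorted(hits), "text": text})
        return True
    return False


# Example usage: one post gets flagged; a human moderator decides what happens next.
scan_post("post-001", "This is a threat_example aimed at a specific user.")
print(len(review_queue))  # 1
```

Note that everything interesting -- which terms count as offensive, and what happens to posts sitting in the queue -- is still decided by people.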

It's necessary to automate this process to a certain extent because with the millions and billions of pieces of content being created online every day, running every single post through a human editor is literally impossible. But the accuracy of machine-based scanning tends to be hit or miss because the keyword matching and detection algorithms behind it are never perfect...or in many cases, not even good.

For a while, the goal of the tech industry was to build a super-algorithm that could handle ALL types of inputs and requests (hence the hype for "AI"), but many have found out the hard way that this approach doesn't really work, and probably never will. A machine learning algorithm is only as good as the quality of its data inputs, after all, and most of the actual work in this approach lies in cleaning and organizing the data itself. And that, fortunately/unfortunately, requires the intervening hands of human beings.

The problem with language (and the world in general) is that it never stands still. The usefulness of an algorithm gradually erodes over time as the machine locks into a process and becomes set in its ways. Human input is not only useful but necessary for the process itself to have any value. Manual intervention, in other words, should be seen as a "feature, not a bug" in order for "AI" technology to reach its full potential.
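
In the same toy spirit as the sketch above, "manual intervention as a feature" might look as simple as reviewers curating the term list over time, so the system's vocabulary shifts as the language does. The functions and names here are hypothetical, not anyone's real workflow.

```python
# Standalone toy sketch: human reviewers curate the flag list as language shifts.
# Terms and reviewer names are placeholders.

flagged_terms = {"slur_example", "old_code_word"}


def add_flagged_term(term, reviewer):
    """A reviewer adds a newly observed code word to the scanner's vocabulary."""
    flagged_terms.add(term.lower())
    print(f"{reviewer} added '{term}'")


def retire_flagged_term(term, reviewer):
    """A reviewer retires a term whose meaning has shifted or been reclaimed."""
    flagged_terms.discard(term.lower())
    print(f"{reviewer} retired '{term}'")


# Example: the humans update the machine, not the other way around.
add_flagged_term("new_code_word", reviewer="moderator_7")
retire_flagged_term("old_code_word", reviewer="moderator_7")
print(sorted(flagged_terms))
```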

Colloquial Expertise

The biggest problem with the war on Fake News right now is that the vast majority of its initiatives take a very top-heavy approach. The keywords and phrases used to identify problematic posts are usually chosen by executive decision rather than through an organic process where the colloquialisms and dialects of niche communities and subcultures can be harnessed by the framework itself.

The most obvious example of this failure lies in what's known as "Dog Whistle Politics", where people use "code words" to express bigoted sentiments in public in order to render them invisible to the editor's eyes. People with objectionable opinions may not always understand how things work in the big picture, but they aren't necessarily stupid -- if they see that they're being punished for saying certain things, they'll find ways to get around it, since there's no real penalty or disincentive to just creating a new account and trying again.

As it turns out, dog whistling is one of the oldest tricks in the game, and it still works wonders, no matter how sophisticated our technology becomes. I would be surprised if current algorithms caught even a majority of problematic posts out there, because skirting around the edges of these regulatory practices is just so easy to do.

Another example: the word "queer" was originally used as a derogatory alternative to "fag", but the LGBTQ movement eventually co-opted the word and spun it into having a positive meaning for the community. One word can carry different meanings over time depending on when and where it's used, and by whom. (The n-word used by Whites has an entirely different meaning than when used by Blacks, for example.) The machine, however, has no way of knowing the difference between any of these meanings because all it's able to see is the word itself.
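
A toy demonstration of that limitation: a bare keyword matcher flags the word itself, with no notion of who is speaking, in what community, or in what era. The placeholder term below stands in for any reclaimed word.

```python
# Toy demonstration: the matcher sees only the string, not the context.

flagged_terms = {"reclaimed_term"}  # placeholder for any reclaimed word


def naive_flag(text):
    """Return True if any flagged term appears anywhere in the text."""
    lowered = text.lower()
    return any(term in lowered for term in flagged_terms)


in_group_post = "Proud member of the reclaimed_term community!"
hostile_post = "Get out of here, reclaimed_term."

print(naive_flag(in_group_post))  # True -- flagged despite being a positive, in-group use
print(naive_flag(hostile_post))   # True -- flagged for exactly the same reason
# Both posts look identical to the machine; distinguishing them takes human
# judgment (or far richer context than the raw string).
```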

As the above examples show, in order for the system to work properly, the algorithms have to be directed and updated regularly by subject matter experts, both in how terms are identified and how they're organized. Coming to an accurate, nuanced understanding of how language works and evolves is really a full-time job that requires very high levels of focus, skill, and subject matter expertise. This expertise becomes even more important as you move away from blatant policy violations and into the highly nuanced worlds of political opinion and cultural sentiment.

Another approach to the Fake News problem would be to empower user communities through self-policing (e.g., having keywords and phrases determined by vote), but in either case the levers that control the learning mechanisms must come from the bottom rather than the top. By the time colloquial information percolates up and waterfalls down, the conditions on the ground have already changed and the solution is obsolete before it's even released.
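
As a rough illustration of what vote-determined keywords could look like (the threshold and vote structure below are invented for the sketch, not a proposal for any specific platform):

```python
# Hypothetical sketch of bottom-up flagging: a term only enters the flag list
# once enough community members have voted for it.

from collections import Counter

votes = Counter()
VOTE_THRESHOLD = 100  # arbitrary cutoff chosen for this sketch


def vote_to_flag(term):
    """A community member nominates a term as problematic."""
    votes[term.lower()] += 1


def current_flag_list():
    """Terms the community has collectively pushed over the threshold."""
    return {term for term, count in votes.items() if count >= VOTE_THRESHOLD}


# Example: the lever gets pulled from the bottom up.
for _ in range(150):
    vote_to_flag("community_identified_code_word")
print(current_flag_list())
```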

Automation should be built for the people, by the people, in other words -- not just because it makes for a nice catchphrase, but because it's what actually works.

Conclusion

As a lot of new content production companies are finding out, doing things the right way is expensive, time-consuming, and often frustratingly difficult. Because most of the incumbent companies in the tech industry have built their entire business models around a completely different process, these changes are not likely to come from the major players any time soon. Given the level of distrust people are harboring toward establishment powers right now, however, it may be wise to keep an eye out for startups and grass-roots organizations looking to push big, radical changes into the world, because the environment is now ripe for these ideas (both good and bad) to gain a lot of traction.

This also raises the possibility of the humanities becoming a viable career path once again, after years and decades of neglect. But only if there's the will and means to take political and cultural issues seriously in mediums normally dominated by engineering culture. How long will it take to get there? Only time will tell. But the institutions and companies that manage to figure it out first will likely become the directors of the new status quo to come.
