Homeland Security is Using Text Profiling to Discriminate “Fact” from “Opinion”
Cornell University News Service reports that:
A new research program by a Cornell computer scientist, in collaboration with colleagues at the University of Pittsburgh and University of Utah, aims to teach computers to scan through text and sort opinion from fact. The research is funded by the U.S. Department of Homeland Security, which has designated the consortium of three universities as one of four University Affiliate Centers (UAC) to conduct research on advanced methods for information analysis and to develop computational technologies that contribute to national security. Cornell will receive $850,000 of $2.4 million in funding provided for the consortium over three years…
The new research will use machine-learning algorithms to give computers examples of text expressing both fact and opinion and teach them to tell the difference. A simplified example might be to look for phrases like “according to” or “it is believed.” Ironically, Cardie said, one of the phrases most likely to indicate opinion is “It is a fact that ...”
Recall that words associated with emphatic factual assertion are used to discriminate between “racist” and “anti-racist” text by the EU-funded anti-majority artificial intelligence watchdogs. In their research the appearance of such words would tend to indicate that the work was “racist” where “racist” was defined by similarity to the writings of selected “racist” authors. So does this mean Homeland Security’s “fact vs opinion” discriminators will classify “racist” writings as “opinion” and “anti-racist” writings as “fact”? Well, perhaps, but what is more important is why this “ironic” correlation exists.
We can test this assertion about the phrase “It is a fact that ...” by running a simple Google search on that phrase. What we find is that there are two very different ways in which “It is a fact that…” can be used: 1) To assert the primary point of the writing. 2) To support the primary point of the writing. Using “It is a fact that…” in the second way means the author isn’t asserting his primary proposition as fact but is instead asserting supporting arguments which he believes are verifiable facts. This second use makes sense in an opinion piece and, particularly when the opinions expressed violate the sensibilities of the likely audience, requires the emphatic use of verifiable facts. On the other hand, when one is reinforcing the sensibilities of the target audience, the primary proposition of the opinion piece is frequently asserted as fact, which the audience is likely to accept without objection precisely because it is the common sensibility of the audience.
Posted by James Bowery on Tue, 26 Sep 2006 18:11 | #
Don’t be confused by their failure to come up with an objective distinction (operational definition independent of human judgement) between “fact” vs “opinion”. All they have done is ask some humans to use their judgement to classify some writings as “fact” and others as “opinion” and then used pretty standard data mining techniques to train a computer program to mimic that judgement against a much larger sample of texts.
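As a rough illustration of what such training amounts to, here is a minimal sketch of a naive Bayes text classifier written against the Python standard library. It is not the Cornell group's method, which the article does not describe in detail; the four training sentences, their labels, and the function names are all hypothetical. The point is only that the program learns whatever the human labellers decided, nothing more.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase words only; digits and punctuation are dropped.
    return re.findall(r"[a-z']+", text.lower())

def train(labelled):
    """labelled: list of (text, label) pairs judged by humans (hypothetical data)."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labelled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts

def classify(text, model):
    """Return the label with the highest naive Bayes score (add-one smoothing)."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical human judgements: the only "definition" of fact vs opinion
# the program ever sees.
examples = [
    ("the report says the bridge opened in 1932", "fact"),
    ("the council voted to approve the budget on monday", "fact"),
    ("i believe the new budget is a terrible idea", "opinion"),
    ("in my view the bridge is ugly and should be replaced", "opinion"),
]
model = train(examples)
print(classify("i believe the plan is a terrible idea", model))
print(classify("the council voted to approve the bridge", model))
```

Relabel the training sentences and the same code will just as happily reproduce the opposite judgement.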
The best the computer can do under these circumstances is no better than the selected human consensus can do. Indeed, as I pointed out in an earlier comment on “word sense disambiguation” and its application to creation of coherent lexicons, the use of humans as the standard is precisely where these approaches are failing to realize the potential of computer algorithms. There is a battle brewing within the philosophy of science over precisely this sort of standard and it is going to erupt throughout all of academia, the humanities as well as sciences.
The trigger of this eruption is the termination of the long hiatus, now nearly 50 years, of rational research into artificial intelligence. I won’t go into all of the dimensions of the abominable history of artificial intelligence research, but suffice it to say that with the resurgence of algorithmic information theory, things are being reformulated rapidly.
The bottom line is this:
Information and knowledge are inseparable. If you can formulate information theory consilient with computer technology you have a rational basis for artificial intelligence. Algorithmic information theory is that consilience and it has been in hibernation for decades.
The principal result of algorithmic information theory is that the shortest program that can output a text string represents the true information content of that text string. It is Ockham’s Razor on steroids.
This doesn’t mean that a computer program can be written that will find that shortest program; indeed it has been proven that such a metaprogram cannot exist in the general sense. But what it does mean is that we have an objective test of the relative truthfulness of two descriptive frameworks. The one which results in the shortest description of the world (the one that is most coherent, most consilient, that “hangs together” the best) is also the most truthful. We can still have human judgement play a part of course, but that part is put to the empirical test of a now rigorously defined epistemology.
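A toy sketch of that test, assuming Python source length as the yardstick. The data (the first 200 squares), the two candidate "programs", and the helper names are hypothetical. The true shortest program is uncomputable, so all this can show is that one description is shorter than another. The zlib figure is included only as a crude, general-purpose upper bound, not as a measure of true information content.

```python
import zlib

# The "world" to be described: the first 200 squares, as one text string.
target = ",".join(str(n * n) for n in range(200))

# Framework A: no theory at all, just quote every observation verbatim.
literal_program = repr(target)

# Framework B: a rule that regenerates the observations.
rule_program = '",".join(str(n * n) for n in range(200))'

def describes(program, text):
    """True if evaluating the program text reproduces the string exactly.
    eval is used here only on the two trusted strings defined above."""
    return eval(program) == text

def description_length(program):
    """Length of the program text in characters (the toy yardstick)."""
    return len(program)

# A generic compressor gives a computable upper bound in bytes on how far
# the string can be squeezed without knowing the rule behind it.
zlib_length = len(zlib.compress(target.encode("ascii"), 9))

for name, program in [("literal", literal_program), ("rule", rule_program)]:
    print(name, describes(program, target), description_length(program))
print("zlib", zlib_length)
```

Both programs output exactly the same string, so they agree on every observation. By the criterion above, the rule is preferred only because it says the same thing in far fewer symbols.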
The failure of “political correctness” as a conceptual framework is, like the failure of the canons of prior theocracies, due to its need to inject a confusing political construct at the wrong level of discourse. The correct level of discourse for political correctness, as with much theocratic nonsense, is as an instance of ethnic nepotism hijacking the moral machinery of competing ethnicities. If political correctness is put in its proper place in the universe of discourse, the world becomes more comprehensible precisely because its description is simpler.