EU Funded Anti-majority Artificial Intelligence Watchdogs

Posted by James Bowery on Wednesday, 20 September 2006 16:35.

This is a study funded by the European Union to create artificially intelligent anti-majority watchdogs. Here is an excerpt:

The main aims of the Princip project lie in the detection of racist content on the Internet…

The main body of linguistic work on racist language has concentrated on discourse analysis of majority groups within European countries and the USA, with the aim of uncovering tacit or concealed racist attitudes. This has mainly been achieved either by interviewing members of majority groups and subjecting the resulting texts to discourse analysis (prototypical example: Teun van Dijk, 1987, 1993¹, with many followers), or by studying existing political documents (M. Souchard, 1997) in order to detect recurrent linguistic units and themes.

The Princip project differs from previous linguistic research in the type and source of the texts it studies, in the method it uses and in the aims it pursues. The project deals with (mainly) open racist attitudes as expressed on the Internet. Web-based racist language has been researched before, but generally with a limited scope: typical pieces of text taken from particular websites². The Princip project deals with large amounts of text published on the Internet and uses corpus linguistics methods as its primary research tool. The aim of its linguistic studies is to enable an automated multi-agent system to detect racist content without recourse to human input during on-line running.

Since this document may otherwise disappear from the net altogether, I’m posting a copy here.

PS: As you read this document, you will notice that nowhere do they cop to the fact that they are using the technique of “profiling” to “discriminate” between “racist” and “anti-racist” text. Guys like “Godless Capitalist” like to point to government-funded technology like this as evidence that “the cognitive elite” will crush any attempt by separatist movements to assert their fundamental human right of freedom of association/self-determination. With such self-deceptive hypocrisy (no, Razib, hypocrisy isn’t simply saying you believe something and then not following it: it’s preaching something you don’t practice, as Godless “Capitalist” does when he uses civil rights, immigration liberalizations and “fair” housing laws to gain access to others’ territory/property), there can be little doubt that “the cognitive elite” has all the intellectual integrity of brie on a hot summer patio table.

PRINCIP Project

Contract No 2119/27571

Project Deliverable D2.2

Linguistic Features of Racist Documents

January 14th, 2003


Maggie Gibbon¹, Edel Greevy¹, Heinz Lechleiter¹, Patrick Martin¹

Jean-Michel Daube², Natalia Grabar², François Rastier²,

Monique Slodzian³, Mathieu Valette³

Armin Burkhardt³, Reinhardt Hopfer³

Roswitha Bayer⁴, Petra Drigalla⁴, Thomas Kässner⁴,

Nico Lausch⁴, Franz Stuchlik⁴, Nico Winkel⁴

1 Dublin City University (DCU)

2 Institut National des Langues et Civilisations Orientales (INaLCO)

3 Universität Otto-von-Guericke Magdeburg (OVGU)

4 ADI Informatik-Akademie gGmbH (IAMD) 
 

Restricted document

 
 
 

The work of drafting this document was carried out with the financial support of the European Union.

 

Introduction

After a brief re-statement of the aims of the Princip project and the hypotheses underlying its implementation, this paper presents the linguistic features of textual corpora collected from Internet resources. The results we present are not final and will be adjusted and tuned after further analysis during WP2. As explained in Deliverable 2.1, distinctive and racist-specific features will be used as clues for the detection of racist content. These clues will then be characterised by quantitative values (presence, absolute or weighted frequencies). Detection of these clues in Internet documents will be performed with modules and tools, such as those presented in Deliverable 3.1.
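As a rough illustration of how a clue might be characterised by the quantitative values listed above, the following Python sketch computes presence, absolute frequency and a frequency per 1,000 tokens for a list of clue words. It is not the Princip module from Deliverable 3.1; the regex tokenizer, the function name `clue_values` and the choice of "per 1,000 tokens" as the weighting are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def clue_values(text: str, clues: list[str]) -> dict[str, dict[str, float]]:
    """Characterise each clue word by presence (0/1), absolute frequency and
    frequency per 1,000 tokens (one possible weighted frequency)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty document
    values = {}
    for clue in clues:
        absolute = counts[clue.lower()]
        values[clue] = {
            "presence": 1.0 if absolute else 0.0,
            "absolute": float(absolute),
            "per_1000": 1000.0 * absolute / total,
        }
    return values


if __name__ == "__main__":
    doc = "The truth is simple. Nobody tells the truth, of course."
    for clue, v in clue_values(doc, ["truth", "course", "perhaps"]).items():
        print(clue, v)
```

Which weighting is most useful (per token, per sentence, or a contrastive score) would have to be decided from the corpus results.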

The Aims of Princip

The main aims of the Princip project lie in the detection of racist content on the Internet.

Racist content has been the subject of various previous studies. A major part of this research analyses racism as a social and psychological phenomenon, mainly in an attempt to explain its origin and development. Approaches here include social cognitive approaches (e.g. Hamilton and Trolier), social identity theory (e.g. Tajfel, Turner, Giles), psychoanalytical approaches (e.g. Ottomeyer, Adorno, Fromm, Horkheimer, Simmel, Reich), political economy (Nikolinakos, Miles), post-modern studies (Hall, Westwood), and Marxist and neo-Marxist theories (Miles, Guillaumin, Taguieff), most of them with varying degrees of interest in the linguistic expression of racist attitudes. The main body of linguistic work on racist language has concentrated on discourse analysis of majority groups within European countries and the USA, with the aim of uncovering tacit or concealed racist attitudes. This has mainly been achieved either by interviewing members of majority groups and subjecting the resulting texts to discourse analysis (prototypical example: Teun van Dijk, 1987, 1993¹, with many followers), or by studying existing political documents (M. Souchard, 1997) in order to detect recurrent linguistic units and themes.

The Princip project differs from previous linguistic research in the type and source of the texts it studies, in the method it uses and in the aims it pursues. The project deals with (mainly) open racist attitudes as expressed on the Internet. Web-based racist language has been researched before, but generally with a limited scope: typical pieces of text taken from particular websites². The Princip project deals with large amounts of text published on the Internet and uses corpus linguistics methods as its primary research tool. The aim of its linguistic studies is to enable an automated multi-agent system to detect racist content without recourse to human input during on-line running.

There are a number of assumptions which inform linguistic analysis of racist language in the framework adopted by Princip:

     
  • Racist language contains features sufficiently different from other discourses to make it possible to determine clearly what is, and what is not, racist;
  • Since racist and anti-racist materials share some similarities (for example certain keywords), we have to analyse not only a racist corpus but also an anti-racist one and, of course, a general-language corpus, and then contrast the linguistic features obtained from these different corpora;
  • These features are not limited to explicitly racist lexis (such as specific nouns or phrases). Racism often takes the form of implicit rhetorical constructions (euphemism, antiphrasis) because of the prohibitions on hate speech prevailing in the countries hosting the material;
  • We presuppose that racist content can be revealed through language. Analysis of a document makes it possible to reveal the specific linguistic features of racist discourse and its style or genre. If the assembled corpora are representative and large enough, and the linguistic features are specific and stable enough, we will be able to detect these features and generalise them to a larger set of documents on the Internet;
  • Documents within racist corpora may differ considerably. In this regard, as explained in Deliverable 2.1, finer and more precise subcorpora should be created according to different prototypical models.

 
 

Levels of analysis 

Different types of analysis focus on different features of the racist corpus: linguistic features, features related to the structure and organisation of the document, and features related to its location. Linguistic features include the following: character, morpheme, word, collocation, isotopy, POS tag, POS-tagged word, lemma, POS-tagged lemma, sentence, paragraph, document, URL. Features related to the structure of the document are mainly HTML tags and their attributes. Location features concern the URL and IP address of the document. Features can be simple or complex. Simple features correspond to units such as character, morpheme, word, POS tag, sentence, paragraph, document and URL. Complex features correspond to combinations of simple units: collocation, co-occurrence, isotopy, tagged word, word in a given HTML tag. Features can be detected in the full-text version of a document and/or in other formats (HTML or tagged formats).
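The complex feature "word in a given HTML tag" can be illustrated with Python's standard `html.parser` module. The sketch below pairs each word with its innermost enclosing tag; the class and function names are hypothetical, and a production system would also need to handle attributes, scripts and malformed markup more carefully.

```python
import re
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TagWordCollector(HTMLParser):
    """Pair every word of the text with the innermost enclosing HTML tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        current = self.stack[-1] if self.stack else "#root"
        for word in re.findall(r"\w+", data.lower()):
            self.pairs.append((current, word))


def words_in_tag(html: str, tag: str) -> list[str]:
    """Return the words whose innermost enclosing tag is `tag`."""
    parser = TagWordCollector()
    parser.feed(html)
    parser.close()
    return [word for t, word in parser.pairs if t == tag]


if __name__ == "__main__":
    page = "<html><head><title>Our Nation</title></head><body><p>Plain text</p></body></html>"
    print(words_in_tag(page, "title"))
```

Because only the innermost tag is recorded, a word inside `<p><b>…</b></p>` is attributed to `b`; counting all enclosing tags would be an equally reasonable design choice.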

Despite clear linguistic differences between the three languages, their racist discourses and the tools used by each partner, we have obtained similar linguistic features. Some features are common to all three languages (words, collocations), while others occur mostly in only one or two languages. As stated above, the more implicit nature of French racism, for example, calls for specific methodological approaches (and consequently specific tools). In the remainder of this document we present each type of linguistic feature, give examples where available, explain how the feature can be detected in a document and briefly discuss its effectiveness for the detection of racist content.

1: Character

The character level corresponds to single character units, such as punctuation, figures, symbols and runes. Characters can be detected in the original text or isolated after HTML-to-full-text conversion or after tokenisation of the document.

The analysis already conducted shows that punctuation can be characteristic of a given topic: for example, the very frequent use of exclamation marks (!) in French racist documents, or the use of the dollar sign ($) in certain words, such as holocau$t and juif$, in French revisionist and racist documents.
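Both character-level clues mentioned above can be extracted with a few lines of Python. The sketch below is only an illustration (the function name is hypothetical and the rate per 1,000 characters is an arbitrary choice); note that its dollar-sign pattern would also match currency notations such as US$.

```python
import re


def character_clues(text: str) -> dict[str, object]:
    """Exclamation marks per 1,000 characters, plus words in which a dollar
    sign stands in for a letter (e.g. holocau$t, juif$)."""
    n_chars = len(text) or 1
    exclamations = text.count("!")
    # Word characters followed by '$' and optionally more word characters.
    # This also matches currency forms such as 'US$', a known false positive.
    dollar_words = re.findall(r"\b\w+\$\w*", text)
    return {
        "exclamations_per_1000_chars": 1000.0 * exclamations / n_chars,
        "dollar_words": dollar_words,
    }


if __name__ == "__main__":
    print(character_clues("Le holocau$t et les juif$ !!"))
```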

2: N-gram of characters

Detecting linguistically motivated character n-grams is close to detecting morphemes, or indeed any type of substring. The appropriate approach depends on the type of morpheme. Roughly speaking, two kinds of approaches can be used to analyse words and their morphemes in a document: statistical ones (stemming methods) or simple matching of expected morphemes. Stemming and lemmatisation methods (presented below) can be considered. The method for each language has to be chosen according to its precision, its recall and its computational cost.
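As a generic illustration of character n-gram detection, the following Python sketch extracts the n-grams of each word, with boundary markers so that word-initial (prefix-like) and word-final (suffix-like) n-grams can be told apart. The markers `<` and `>` and the function names are assumptions; this is not one of the stemming tools mentioned above.

```python
from collections import Counter


def char_ngrams(word: str, n: int, pad: bool = True) -> list[str]:
    """Character n-grams of a word. With pad=True, '<' and '>' mark the word
    boundaries, so '<an' can only come from the start of a word."""
    s = f"<{word}>" if pad else word
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_profile(words: list[str], n: int = 3) -> Counter:
    """Aggregate n-gram counts over a list of words (lower-cased)."""
    profile = Counter()
    for w in words:
        profile.update(char_ngrams(w.lower(), n))
    return profile


if __name__ == "__main__":
    print(ngram_profile(["anti", "antique", "nation"]).most_common(5))
```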

According to the position that morphemes usually occupy in words of the languages studied, we distinguish roots, prefixes and suffixes.

Prefix Analysis

A prefix is the first element of a word, used in inflection or in word derivation. The analysis of the English corpora allowed us to isolate the following prefixes (presented in alphabetical order):

ab-, ad-, ag-, al-, anti-

be-

com-, con-, counter-

de-, di-

ex-

im-

mal-, man-

non-

ob-

per-, pre-, pro-

re-

se-, sub-, sup-

ultra-, un- 
 

The tools used on the English corpora (Concordance and WordSmith) have no way of determining whether these strings are actually used as prefixes or are simply letter sequences at the start of longer morphemes. For this reason, if the initial substring is not long and/or precise enough, the words returned by wildcard searches beginning with this substring may form an erroneous grouping. For instance, the result for the initial substring de- includes dead, deal and destroy. Such underspecified initial substrings have to be evaluated before being used as clues.

The contrastive analysis of the racist and anti-racist English corpora shows that the number of word types beginning with the listed prefixes is 8,669 and 8,526 respectively. There are slightly more types in the racist corpus, but the figures are too similar to be useful. A closer look at the frequency of these types shows, however, that the number of tokens is always higher in the anti-racist corpus. More detailed information about the distributions of these initial substrings is given in the appendices.
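To compare two corpora in the way just described, one can count, for each prefix, the word types and tokens that begin with it. The Python sketch below is a hypothetical illustration (not the Concordance or WordSmith procedure) and reproduces the weakness noted above: with plain string matching, dead and deal are counted under de-.

```python
from collections import Counter


def prefix_stats(tokens: list[str], prefixes: list[str]) -> dict[str, tuple[int, int]]:
    """For each prefix, return (types, tokens): the number of distinct words that
    begin with it and their total number of occurrences. Like a 'de*' wildcard
    search, plain string matching cannot tell a real prefix from a chance
    initial substring."""
    counts = Counter(t.lower() for t in tokens)
    stats = {}
    for prefix in prefixes:
        matching = {w: c for w, c in counts.items() if w.startswith(prefix)}
        stats[prefix] = (len(matching), sum(matching.values()))
    return stats


if __name__ == "__main__":
    sample = ["dead", "deal", "destroy", "deal", "nonsense", "undo"]
    print(prefix_stats(sample, ["de", "non", "un"]))
```

Running the same function on both corpora gives the type and token totals whose contrast is discussed above.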

The analysis of the French corpora allowed us to identify prefixes such as ex-, im- and non- as racist ones, but if we examine each website separately, only one prefix (ex-) is consistently over-represented in the racist corpora.

Suffix Analysis

A suffix is a non-independent element at the end of a word, used in inflection or word formation.

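We have analysed three suffixes in the English corpora: -ist, -ism and -tion. Judged by the number of distinct word types (Table 2.2), the suffixes -ist and -ism are more specific to the racist corpus, while -tion is slightly more present in the anti-racist corpus. In raw occurrences (Table 2.1), all three suffixes are more frequent in the anti-racist corpus.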

                     
    Category       -ist     -ism     -tion
    Racist         2,231    1,745    11,814
    Anti-racist    4,098    3,719    15,920

Table 2.1: The distribution of some suffixes in the English language corpora

The next table presents more detailed information about these suffixes: numbers of occurrences (total number of times they are used in the corpus), number of their types (number of different individual words which contain these suffixes) and number of hapaxes (number of words which contain these suffixes and are used only once in the corpus):

                                                           
    Feature   Unit           Racist   Anti-racist
    -ist      Occurrences     2,231         4,098
    -ist      Types             225           211
    -ist      Hapaxes           126           112
    -ism      Occurrences     1,745         3,719
    -ism      Types             211           206
    -ism      Hapaxes           116           111
    -tion     Occurrences    11,814        15,920
    -tion     Types             808           810
    -tion     Hapaxes           297           300

Table 2.2: Type of suffixations in English language corpora 
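The three measures defined for Table 2.2 (occurrences, types, hapaxes) can be computed from a token list as follows. This is a minimal Python sketch under the assumption that a suffix is identified by plain string matching at the end of the word; like the prefix search, this also counts words such as nation or station under -tion.

```python
from collections import Counter


def suffix_stats(tokens: list[str], suffix: str) -> dict[str, int]:
    """Occurrences, types and hapaxes of words ending in `suffix`:
    occurrences = total uses, types = distinct words,
    hapaxes = distinct words used exactly once in the corpus."""
    counts = Counter(t.lower() for t in tokens)
    matching = {w: c for w, c in counts.items() if w.endswith(suffix)}
    return {
        "occurrences": sum(matching.values()),
        "types": len(matching),
        "hapax": sum(1 for c in matching.values() if c == 1),
    }


if __name__ == "__main__":
    corpus = ["realism", "realism", "nationalism", "nation", "station", "nation"]
    for sfx in ("ism", "tion"):
        print(sfx, suffix_stats(corpus, sfx))
```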

Suffix analysis can also help detect grammatical categories with stable formal markers, such as adverbs ending in -ally in English and -ement in French. Adverbs are an important feature of racist discourse.

In the French corpora, we have identified the following suffixes as antiracist ones:

  • -tion and -tions (as, for example, in association(s)),
  • -sion (exclusion, répression),
  • -isme (note that in the English corpus, -ism is racist),
  • -issement.

 

The “racist” suffixes are:

  • -age,
  • and the sometimes pejorative -ards (politicards).

We believe affixed words are over-represented in the French antiracist sub-corpora because French antiracist writers use more long, derived words (historically of Latin origin), whereas racist writers favour short words. For example, the following chart shows the frequency distribution of short words (three- to five-letter words) across the whole corpus. Each bar represents a set of documents extracted from one specific website (the first bar corresponds to an antiracist website called Droits humains, the second to Hommes et Migrations, and so on). The first nine bars represent antiracist websites and the next nine racist websites. The chart shows a general deficit of short words in the antiracist sub-corpus and a surplus in the racist one.
 

Chart 2.1: Frequency distribution of short words (three to five-letter-words) in the French corpus
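The surplus/deficit reading of Chart 2.1 can be approximated by computing, per website, the share of three- to five-letter words and comparing it with the mean share across websites. This is a hypothetical Python sketch; the chart's actual normalisation is not described here, so comparing against the mean of the per-site ratios is an assumption.

```python
import re


def short_word_ratio(text: str, min_len: int = 3, max_len: int = 5) -> float:
    """Share of word tokens whose length is between min_len and max_len
    letters inclusive (three- to five-letter words by default)."""
    words = re.findall(r"[^\W\d_]+", text)  # letters only, accents included
    if not words:
        return 0.0
    short = sum(1 for w in words if min_len <= len(w) <= max_len)
    return short / len(words)


def compare_sites(sites: dict[str, str]) -> dict[str, float]:
    """Deviation of each site's ratio from the mean ratio over all sites;
    positive values indicate a surplus of short words, negative a deficit."""
    ratios = {name: short_word_ratio(text) for name, text in sites.items()}
    mean = sum(ratios.values()) / len(ratios)
    return {name: r - mean for name, r in ratios.items()}


if __name__ == "__main__":
    print(compare_sites({"site_a": "Les droits de la personne",
                         "site_b": "Tout est dit ici"}))
```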

Root Analysis

A root conveys the core lexical meaning of a word or of a family of words with related meanings. Roots can be seen as statistically obtained stems. Detecting roots is useful when one wants to group the inflectional and/or derivational variants of a given word.

For instance, the inflectional family of the word islam in French is islam and islams. Its derivational family is much larger: islam, islamique, islamiste, islamiser, islamisation, etc. Each member of this family can in turn have one or more inflectional variants. The overall weight of an entire morphological family in the analysed document is potentially more significant than the weight of any single member.
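Summing the frequencies of all members of a morphological family gives the family's overall weight. The sketch below assumes a hand-built family list (the hypothetical set `ISLAM_FAMILY`, which adds a few plural forms to the members named above) rather than statistically obtained stems.

```python
from collections import Counter

# Hypothetical, hand-built morphological family (inflectional and derivational variants).
ISLAM_FAMILY = {"islam", "islams", "islamique", "islamiques", "islamiste",
                "islamistes", "islamiser", "islamisation"}


def family_weight(tokens: list[str], family: set[str]) -> tuple[int, dict[str, int]]:
    """Total frequency of all family members, plus the per-member breakdown
    (members absent from the document are omitted)."""
    counts = Counter(t.lower() for t in tokens)
    members = {w: counts[w] for w in family if counts[w]}
    return sum(members.values()), members


if __name__ == "__main__":
    doc = ["Islam", "islamiste", "islamistes", "islamisation", "paix"]
    print(family_weight(doc, ISLAM_FAMILY))
```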

In the German corpus, keywords (such as Abstammung, Ausländer, Fremde, fremd, Front, Jude, Kultur, Mensch, Nation, national, Rasse, System, Volk, etc.) are frequently used in compounds. Since such compounds are used in equal measure by racists and anti-racists, a detailed morpheme analysis has to be conducted to determine the differences between the two groups' language use. A simple analysis of prefixes and suffixes is in many cases not sufficient. (For example, the significant morphemes for the word Jude are the following: ab+stamm, amerikanisch, arbeit+s, bank, beruf, blut, europäisch, führ+ung+s, gesinnung+s, terrorist, judäo, stämmig, kontingent, kontroll, loge, muster, reform, ver, zentral+rat+s, west, ober, aus+rott+ung+s+these, mächtig+en, chef, führ+er, minderheit, exklusiv, zigeuner, mit, reform, which appear in compounds such as: Abstammungsjuden, amerikanischjüdische, Arbeitsjude, Bankjuden, Berufsjude, Blutsjuden, europäischjüdischen, Führungsjuden, Führungsjudentum, Gesinnungsjuden, gesinnungsjüdische, Judenterroristen, judäoamerikanische, jüdischstämmiger, Kontingentjude, Kontrolljude, Logenjudentum, Musterjuden, reformjüdische, verjudet, Zentralratsjuden, Westjuden, Judenausrottungsthese, Judenmächtigen, Judenchef, Judenführers, Minderheitsjuden, jüdisch-exklusive, Zigeunerjuden, Mitjuden, reformjüdische.) For further examples of significant racist morphemes, see Appendix 3.
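A first, very rough step towards such a morpheme analysis is to find the compounds that contain a keyword and to look at what remains once the keyword is removed. The Python sketch below (function name hypothetical) does this by case-insensitive substring search, shown here with the keyword Kultur. It does not handle linking elements such as -s- or stem variants such as jüd- for Jude, so the remainders are only approximations of morphemes like those listed above.

```python
def compounds_with(tokens: list[str], root: str) -> list[tuple[str, str]]:
    """Return (token, remainder) for tokens containing `root` case-insensitively.
    The remainder is the token with the first occurrence of the root removed;
    linking elements (e.g. the -s- in 'Volkskultur') stay in the remainder."""
    root_l = root.lower()
    found = []
    for tok in tokens:
        low = tok.lower()
        i = low.find(root_l)
        if i != -1:
            found.append((tok, low[:i] + low[i + len(root_l):]))
    return found


if __name__ == "__main__":
    print(compounds_with(["Volkskultur", "Kulturnation", "Haus"], "kultur"))
```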

3: String

A string is any token delimited by blank characters in the text that can then be isolated. Note that the tokenised document contains more isolated strings than a standard full-text document (mainly due to the separation of punctuation). Note also that according to the techniques and tools used during on-line running, strings and substrings can be matched with the same or different modules.
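The difference between whitespace-delimited strings and the units of a tokenised document can be seen in a small Python example. The regular-expression tokeniser is an assumption chosen for brevity, not the tokeniser used in the project.

```python
import re


def whitespace_strings(text: str) -> list[str]:
    """Strings delimited by blank characters, as in a plain full-text document."""
    return text.split()


def tokenise(text: str) -> list[str]:
    """A simple tokeniser that separates punctuation from words, so it yields
    more units than whitespace splitting."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sentence = "Assez, c'est assez!"
    print(whitespace_strings(sentence))
    print(tokenise(sentence))
```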

Linguistically and lexically motivated strings or words can correspond to different grammatical categories: nouns, verbs, adjectives, adverbs, determiners, prepositions, pronouns, etc. Each of these categories conveys very specific meaning and plays a specific role in the document, its organisation, argumentation, etc.

The most frequent words are largely consistent with the distribution of words in standard English usage (e.g. a, of, the, to, and, for, is, that, in, are). By itself, this information does not indicate whether these lexical features are predictive of racist discourse. However, comparing them with the anti-racist corpus and with standard English usage (British National Corpus) indicates which lexical units are likely to be robust and useful indices of racist content (see appendix 1 for more detail).

Hence, in the English corpora there are 509 words which are used consistently at least 10% more frequently in racist texts than in anti-racist texts. The linguistic and social categories to which these words belong are consistent with theoretical models of racist speech: there are many nominalizations of ethnic, racial or national groups (whites, Jews, Americans) and, more interestingly, many words strongly associated with the over-emphatic argumentative discourse (even, course, ever) which is typical of self-conscious minority belief-holders.
 
 

Graph 3.1: Consistency/Frequency comparison of lexis in English language racist and anti-racist corpora. 

The French corpora present similar examples, with words such as rien, assez, grand and jamais. In the English corpora, these two types of words are 30% more likely to be found in racist texts than in anti-racist texts (see appendices 1 and 2 for more detail).
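The 10% criterion above can be expressed as a ratio of relative frequencies. The sketch below is one possible reading of it, assuming that "more frequently" means relative frequency (occurrences divided by corpus size) and adding an assumed minimum count; it does not reproduce the consistency measure plotted in Graph 3.1, whose exact definition is not given here.

```python
from collections import Counter


def over_represented(racist: list[str], anti: list[str],
                     ratio: float = 1.10, min_count: int = 5) -> dict[str, float]:
    """Words whose relative frequency in the racist corpus is at least `ratio`
    times their relative frequency in the anti-racist corpus."""
    rc, ac = Counter(racist), Counter(anti)
    rn, an = len(racist), len(anti)
    result = {}
    for word, count in rc.items():
        # Skip rare words (assumed threshold) and words absent from the
        # anti-racist corpus, which would need smoothing or separate handling.
        if count < min_count or ac[word] == 0:
            continue
        r = (count / rn) / (ac[word] / an)
        if r >= ratio:
            result[word] = r
    return result


if __name__ == "__main__":
    racist_tokens = ["even"] * 6 + ["the"] * 4
    anti_tokens = ["even"] * 5 + ["the"] * 5
    print(over_represented(racist_tokens, anti_tokens))
```

Words found only in the racist corpus are often the most distinctive, so a real implementation would treat them explicitly rather than skip them.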

Determiner Usage

Essentialising and reductive logic is typical of stereotyping language, which is used heavily in racist discourse. With regard to determiners, this manifests itself in the choice of the definite article the in preference to the indefinite article a in conjunction with members of target groups: the former is more consistent with the argument that the subgroup has singular and unchanging characteristics, and that the differences between members of the subgroup with regard to their defining (negative) characteristics are negligible. Hence the recurrence of expressions such as the jew, which, somewhat counterintuitively, are used to refer to unspecified, undetermined, mythic examples of Judaism. This last point is borne out by the fact that the Jew (with the singularising determiner) is a more frequently used expression than the Jews. This preference for the singular over the plural in identity-marking nouns does not occur when reference is being made to the in-group. To illustrate these considerations, the following table gives the raw frequencies of the and a and their relative distributions (1L means one word to the left).
 

                                 
    Corpus        Det.   Freq.             %
    Racist        the    61,076 (6.19%)    99.53
    Anti-racist   the    57,599 (5.6%)     98.31
    Racist        a      19,207 (1.95%)    99.76
    Anti-racist   a      19,907 (1.93%)    98.87

Table 3.1 Comparison of word frequency at 1L to the and a in racist and anti-racist English corpora
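The 1L comparison behind Table 3.1 can be sketched by counting which word precedes a given noun. The example below uses a neutral placeholder noun (farmer/farmers) and hypothetical function names; it only shows the counting step, not how the percentages in the table were derived.

```python
from collections import Counter


def left_collocates(tokens: list[str], target: str) -> Counter:
    """Count the word immediately to the left (1L) of each occurrence of
    `target` (which should be given in lower case)."""
    toks = [t.lower() for t in tokens]
    return Counter(toks[i - 1] for i in range(1, len(toks)) if toks[i] == target)


def determiner_profile(tokens: list[str], singular: str, plural: str) -> dict[str, int]:
    """Compare 'the' and 'a' at 1L of a singular noun with 'the' at 1L of its plural."""
    sing = left_collocates(tokens, singular)
    plur = left_collocates(tokens, plural)
    return {"the+sg": sing["the"], "a+sg": sing["a"], "the+pl": plur["the"]}


def determiner_share(tokens: list[str], det: str) -> float:
    """Percentage of all tokens that are the given determiner."""
    toks = [t.lower() for t in tokens]
    return 100.0 * toks.count(det) / len(toks) if toks else 0.0


if __name__ == "__main__":
    text = "the farmer said the farmers and a farmer saw the farmer".split()
    print(determiner_profile(text, "farmer", "farmers"))
```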

wh- and qu- Word Usage

The wh- English words and the qu- French words, which are broadly equivalent, are characteristic of racist speech.

English wh- words, such as what, when, where, who and why, can function as interrogative, relative or indirect pronouns. Graph 3.2 shows that 58.51% of all wh- words are found in the racist corpus. The greatest disparity is in the use of why and what. These words are used less often as interrogatives than as indirect pronouns. Because of the phrase structures typically associated with indirect pronouns, our analysis of wh- usage is centred on the nouns collocated to the left and the verbs collocated to the right of each term. This reveals the prototypical racist use of indirect pronouns.

Graph 3.2 Overview of consistency of usage of common wh- interrogatives in racist and anti-racist English corpora 

The “over-emphatic presence” of wh- words in racist texts is not high enough to merit criterion status. However, as with many other types of speech over-represented in racist discourse, it allows us to be more confident about the clues we discern from its pattern of usage within racist texts. Here are the words which appear one place to the left of who (mainly nouns):

    Jew, Gentiles, Non-whites, Parasites, Any, Female, Colored, Nigger, Niggers, Christ, Enemy, Foreigners, Goyim, Homosexuals, Jewess, Mexicans, Millions, Powers, Southerners

and words which appear one place to the right of who (verbs):

    Control, Created, Built, Love, Opposed, Raped, Support, Bear, Enters, Fail, Dwell, Pay, Settled, Wishes, Advocated, Appear, Buy, Conquered, Feed.

More detailed case studies of the interrogative usage of who, what, when and why are presented in appendix 1.

In the French corpora, qu- words are dominated by the conjunction/relative pronoun que, as the following chart shows. It gives the frequency distribution of que in the whole corpus. As before, each bar represents a set of documents extracted from one specific website: the first nine bars represent antiracist websites and the next nine racist websites. The chart shows a very significant surplus of que in seven of the nine racist websites.

Graph 3.3: “que” distribution in the whole French corpus 

The nearby thematic profile of que (i.e. the words most frequently found in its close context):

        Distance   Corpus   Excerpt   Word
        61.25      34667    4074      QUE
        39.24      13039    1573      NOUS
        16.70      40088    2617      EST
        15.04      16896    1219      NE

This thematic analysis reveals relationships between words. Ne comes in 4th position, after nous and est. We can therefore consider that que fits into a “ne… que” structure (for instance: “Les initiatives, propositions et conférences internationales ne sont que pertes de temps et tentatives vaines”, i.e. “International initiatives, proposals and conferences are nothing but a waste of time and vain attempts”).
 
 
 

Graph 3.4: Distribution of ne (blue) and que (red) in the whole corpus
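A naive detector for the ne… que construction can be written with a regular expression. This Python sketch is an assumption, not the thematic analysis tool used for the table above: it splits sentences on punctuation and looks for ne (or elided n') followed later in the same sentence by que (or qu'). It cannot tell restrictive ne… que from sentences such as je ne sais pas ce que…, so some matches are false positives.

```python
import re

# 'ne' or elided "n'" followed later in the same sentence by 'que' or "qu'".
NE_QUE = re.compile(r"\bn(?:e\b|['’])[^.!?]*?\bqu(?:e\b|['’])", re.IGNORECASE)


def count_ne_que(text: str) -> int:
    """Number of sentences containing a ne ... que pattern. Sentences are split
    naively after '.', '!' and '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sum(1 for s in sentences if NE_QUE.search(s))


if __name__ == "__main__":
    sample = ("Les initiatives ne sont que pertes de temps. "
              "Nous savons tout. Il n'a qu'un ami.")
    print(count_ne_que(sample))
```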
 

Argumentation words and structures; hedging and facticity

Analysis of racist discourse reveals language use typical of minority belief-holders with conversionary zeal, in that it shows a disproportionate use of absolute truth claims. Minority social groups and belief-holders conceptualise truth as something which has been repressed or ‘concealed’ through socialisation processes and global information control at the hands of their chosen out-group. This paranoiac worldview leads them to return to fundamental principles of truth and falsehood. The tendency to claim ownership of the truth (to arrest the demonisation of their belief community and themselves) is evidenced by the disproportionate use of words such as certain, fact, truth, knowledge, etc. (in English) and angeblich, aufdecken, behaupten, Fakten, Tatsachen, wahr, Wahrheit, etc. (in German). Examples of noun phrases specific to the racist content in the German corpus are: Nationaler Aktivist, Ruhm und Ehre, nationaler Widerstand, nationale Opposition, nationale Kräfte, frei sozial und national, mit kameradschaftlichem Gruß, Tag der nationalen Arbeit, unser Kampf, Deutsche Volksgemeinschaft.

There are various sociocognitive and psycholinguistic reasons for this: the individuals are reconciling their socially vilified belief system by placing it in a rational context through standard rhetorical and persuasive language. Because their beliefs are not socially accepted as truth, they must explicitly represent their beliefs as such; whereas non-racist discourse participants, when speaking of the same matters, have no need to announce the truthfulness of what they are about to say, since it is part of a socially sanctioned and validated belief system.

Because racists speak from within a minority belief system and address readers who hold different beliefs and assumptions about race politics, racist discourse shows a greater tendency to hedge, or palliate, statements which the author knows are socially non-normative. Although this seems to contradict the evidence of frequent strong truth claims, the two processes actually complement each other: the expression of doubt is tacit (through words such as perhaps, maybe, almost, quite, nearly) and, in fact, many hedge words allow the author to encourage readers to entertain provisional worldviews (often racist ideas) and explanations of social phenomena which would easily be dismissed if phrased as strong truth claims.

Graph 3.5 Distribution of truth claim lexis in the English language corpus 
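Truth-claim and hedge lexis can be measured as rates per 1,000 tokens. The word lists below contain only the English examples given in the text above; a working lexicon would be larger and would have to be built from the corpora. The function name is hypothetical.

```python
import re

# Lexicons limited to the English examples cited in the text.
TRUTH_CLAIMS = {"certain", "fact", "truth", "knowledge"}
HEDGES = {"perhaps", "maybe", "almost", "quite", "nearly"}


def lexis_rates(text: str) -> dict[str, float]:
    """Occurrences of truth-claim and hedge words per 1,000 tokens."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    truth = sum(t in TRUTH_CLAIMS for t in tokens)
    hedge = sum(t in HEDGES for t in tokens)
    return {"truth_per_1000": 1000 * truth / n, "hedge_per_1000": 1000 * hedge / n}


if __name__ == "__main__":
    print(lexis_rates("Perhaps the truth is a fact and nearly everyone knows"))
```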

The presence and frequency of argumentation lexis in the English language corpus, and the syntactical patterns which form around such words, are represented in appendix 1.

4: Local Grammar and Collocation

Collocation is a restrictive kind of local grammar which allows us to extend our criteria beyond basic word detection. Here, where a great deal of the lexis is shared by opposing discourses, it is also highly useful for disambiguating between different context-dependent word usages without requiring structural analysis of the texts. Collocations can be detected either by detecting each word separately and then ordering and reconstituting the collocation, or by detecting the complete collocation. In both cases, inflectional and derivational variants can be taken into account. When performing collocation analysis, one must specify two things: the headword and the distance parameters within which collocated words are indexed. These parameters may greatly affect the results. Words immediately to the right and left of the keyword are the most instructive; as the window widens, the list of collocates becomes more like the tokenised list for the whole corpus (since the window covers an ever larger fraction of it). The most useful information about racist discourse is found in close proximity to the chosen keywords (one word to the left and one word to the right). Collocations often correspond to noun phrases, but also to verb and adjective phrases, as well as to any neighbourhood of words. Proper nouns are one specific type of collocation.
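The two parameters named above (headword and distance) map directly onto a windowed collocate count. In this Python sketch, `left` and `right` are assumed names for the distance parameters and the function name is hypothetical; widening the window makes the collocate list drift towards the overall word list of the corpus, as noted above.

```python
from collections import Counter


def collocates(tokens: list[str], headword: str, left: int = 1, right: int = 1) -> Counter:
    """Count words within `left` positions before and `right` positions after
    each occurrence of `headword` (case-insensitive)."""
    toks = [t.lower() for t in tokens]
    head = headword.lower()
    counts = Counter()
    for i, tok in enumerate(toks):
        if tok != head:
            continue
        window = toks[max(0, i - left):i] + toks[i + 1:i + 1 + right]
        counts.update(window)
    return counts


if __name__ == "__main__":
    text = "our nation and their nation".split()
    print(collocates(text, "nation"))           # 1L/1R window
    print(collocates(text, "nation", 3, 3))     # wider window
```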

Noun Phrases

Analysing the frequency and distribution patterns of noun phrases containing specific words such as white, race, black, Jew, etc., but also grammatical words such as we, our, they, their, etc., across the corpus may help in formulating clues: our civilization, our race, white pride, their beliefs, white american genocide, white flight, white genocide OR suicide, white ghettoisation, white knights.

Examples of noun phrases specific to the racist content in the German corpus are: Nationaler Aktivist, Ruhm und Ehre.

Verb phrases

Collocation analysis of verbs enables us to create another type of paradigm of the syntactic structures and lexical company in which racist keywords are typically found. For example, the verbs that racists use in conjunction with truth-related words (such as truth, fact, course, knowledge) are completely different from those used in anti-racist discourse.

More elaborate and complex verb phrases can span an entire sentence, for example:

    To be born white is a privilege;

    To be born white is an honour and a privilege.

Proper Nouns

Names of authors and racist organisations fall under the category of proper nouns. One approach anti-racist organisations adopt in order to fight hate is to expose hate groups and racist organisations, the objective being to educate and inform people about the activities and beliefs of racists. This would account for the high frequency of recurring names of authors and racist organisations within the anti-racist corpus. One reason for their low frequency in the racist corpus is that in many cases racist authors do not believe they deserve the label ‘racist’ and therefore do not associate themselves with hate groups or racist organisations, even though their views may tally with those of hate groups. An organisation such as GRECE (in France), which is known to inspire racist ideology, is mentioned more in anti-racist websites. In the same way, authors such as Alex Curtis, David Duke, David Irving and Don Black, who are either founders of racist organisations or prominent figures within them, are mentioned more in the English-language anti-racist corpus. However, other authors such as Elisha Strom, Kevin Alfred Strom, Elena Haskins, H. Millard, Per Lennart Aae, Martin Laus, Holger Apfel, Günter Battke and Jürgen Gerg, who are less well known to the general public, have a higher frequency in the racist corpus. Somebody like Guillaume Faye, a theoretician of racism, is mentioned more in the racist corpus.

See appendix 1 for further information on collocation in English.  

5: Isotopy

An isotopy is a set of words (or other linguistic features such as morphemes) and/or collocations with strong semantic relationships, which can be considered a group of synonyms. This is a complex type of feature: detecting an isotopy depends on separately detecting each element it contains. Not all elements of an isotopy need to be present, as each element refers to the overarching isotopy.

Isotopies are built during the linguistic analysis of the corpora (and/or result from adapting existing synonym resources, such as WordNet and its counterparts for other European languages). Isotopies often correspond to recurring themes of racist discourse: collective character of the enemy (group, organisation, network), danger, destruction, drugs, etc. Being specific to racist discourse, they do not always correspond to the normal associations between words, so existing synonym resources, if used, have to be adapted.

An example of a derogatory isotopy (people as animals) in a short French text is formed by the terms “Puces” (“fleas”, set in quotation marks as in “flea market”), vermines (“vermin”) and à 4 pattes (“on all fours”), together with some grammatical constructions (the uncountable plural).
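To illustrate how an isotopy can count as present even when only some of its elements occur, here is a minimal Python sketch. It assumes the document is already tokenised and that each isotopy is stored as a plain set of words; the names `ISOTOPIES` and `isotopy_scores` are hypothetical, the "collective" members come from the themes listed above, and the "danger" members are an illustrative assumption.

```python
# Hypothetical isotopies: "collective" reuses the example theme from the
# text; the "danger" members are illustrative assumptions.
ISOTOPIES = {
    "collective": {"group", "organisation", "network"},
    "danger": {"danger", "destruction", "threat"},
}


def isotopy_scores(tokens, isotopies=ISOTOPIES):
    """Return, for each isotopy, the share of its elements found in tokens.

    Each element is detected separately; an isotopy can score above zero
    even when only some of its elements occur in the document.
    """
    present = {t.lower().strip(".,;:!?") for t in tokens}
    return {
        name: len(members & present) / len(members)
        for name, members in isotopies.items()
    }


scores = isotopy_scores("The network behind the group is a danger.".split())
print(scores)
```

A working system would still need a threshold on these scores and, as the text notes, word lists adapted to the discourse being studied.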

6: POS tag

Part-of-speech (POS) tags reflect the grammatical level of the language: lexical units are replaced by their grammatical labels (noun, verb, adjective, adverb, determiner, pronoun, etc.). This allows analysis at the meta-linguistic level. POS tags are available only in tagged documents; matching of POS tags is then performed by the normal string matching module. The performance and robustness of the POS tagging tool must be tested on the documents we are working with, because these documents often contain highly unusual vocabulary and syntactic structures.

Our linguistic studies show that the English racist corpus makes greater use of adjectives in specific syntactic slots (particularly before key nouns). In the French corpora, verbs seem to be used more frequently in the racist corpus, and nouns (and adjectives) more frequently in the anti-racist corpus.

Some syntactic patterns (noun phrases, noun-noun constructions, the types of verb occurring near nouns, the adjectives used to describe nouns) also seem useful for characterising racist discourse.

POS tags can also be analysed as substrings. In this case, we should target coarse grammatical categories (V-, ADJ, N-, etc.) without regard to the morpho-syntactic features of words. These categories are then detected with substring detection techniques.
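The substring approach can be sketched in Python for tokens in the `word/TAG:features` format used in the following section (`juif/SBC:m:s`). Keeping only the part of the tag before the first colon gives the coarse category, and a per-document profile of these categories is the kind of measure behind the verb/noun/adjective comparisons above. The function names and every tag other than `SBC` are hypothetical.

```python
from collections import Counter


def coarse_tag(tagged_token):
    """Return the coarse category of a 'word/TAG:feature:feature' token.

    'chat/SBC:m:s' -> 'SBC': morpho-syntactic features after ':' are dropped.
    """
    _word, _sep, tag = tagged_token.rpartition("/")
    return tag.split(":", 1)[0]


def pos_profile(tagged_tokens):
    """Relative frequency of each coarse POS category in one document."""
    counts = Counter(coarse_tag(t) for t in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


# Hypothetical tagged sentence; only SBC (noun) is taken from the text,
# the other tag names are assumptions about the tagset.
doc = ["le/DET:m:s", "chat/SBC:m:s", "noir/ADJ:m:s", "dort/VER:3:s"]
profile = pos_profile(doc)
```

Comparing such profiles between racist and anti-racist sub-corpora would show the kind of verb/noun differences reported for French.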

7: POS tagged word

Combining lexical and POS clues provides a kind of disambiguation for words that can belong to different POS categories. In French, for instance, the word juif can be either a noun or an adjective. In the following example, the word is characterised by its POS tag: juif/SBC:m:s, where SBC denotes the noun category and m and s are its morpho-syntactic features: masculine gender and singular number. POS tagging thus gives the grammatical characterisation of the word analysed. Tagged words are detected with string or substring matching modules, and only in POS-tagged documents. For this type of feature, too, the performance and robustness of POS tools have to be tested.

In the French racist and revisionist corpora, the word juif seems to be used mainly as a noun, whereas in anti-racist documents it is used mainly as an adjective. These findings emerged from human linguistic analysis, and we have doubts about how precisely software tools can analyse such fine grammatical features.

Preliminary results on the English corpora point to another, very promising use of combined POS and lexical information: for example, the presence in a document of ADJ + truth, fact or knowledge, or of ADJ + ADJ.
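The two combined clues named above can be counted over adjacent token pairs. This is a minimal sketch assuming an upstream tagger has already produced `(word, coarse_tag)` pairs with the label `ADJ` for adjectives; the function name and sample sentence are hypothetical.

```python
# Lexical clues from the text; tags are coarse categories assumed to come
# from an upstream POS tagger.
TRUTH_NOUNS = {"truth", "fact", "knowledge"}


def adjective_clues(tagged):
    """Count ADJ + truth/fact/knowledge and ADJ + ADJ bigrams.

    `tagged` is a list of (word, coarse_tag) pairs, e.g. ("simple", "ADJ").
    """
    counts = {"ADJ+truth": 0, "ADJ+ADJ": 0}
    for (_w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 != "ADJ":
            continue
        if w2.lower() in TRUTH_NOUNS:
            counts["ADJ+truth"] += 1
        elif t2 == "ADJ":
            counts["ADJ+ADJ"] += 1
    return counts


sentence = [("the", "DET"), ("simple", "ADJ"), ("truth", "N"),
            ("is", "V"), ("plain", "ADJ"), ("old", "ADJ"), ("news", "N")]
print(adjective_clues(sentence))
```

Raw counts like these would normally be divided by document length before being compared across corpora.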

8: Lemmas

A word form stripped of its inflectional marks (gender, number, case, tense, mood, etc.) is called a lemma. The lemma also corresponds to the form used as a dictionary entry: the singular (nominative, masculine) form for nouns and adjectives, and the infinitive for verbs. Lemmas are matched with string or substring matching modules.

Reducing inflected words to their lemmas groups all their occurrences together, which directly affects the frequencies of these words. Of course, differences of tense, number or other inflection can also be meaningful for characterising racist and revisionist content.

Lemmatisation does not always succeed. We therefore have to compare the results obtained with POS tagging, lemmatising tools and stemming algorithms, and then choose the best approach for each language processed.
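The effect of lemmatising versus stemming on word frequencies can be shown with a small Python sketch. Both normalisers here are stand-ins and are hypothetical: a tiny hand-written lemma table in place of a real lemmatiser, and a crude suffix stripper in place of a real stemming algorithm.

```python
from collections import Counter

# Hypothetical lemma table standing in for a lemmatising tool.
LEMMAS = {"nations": "nation", "fought": "fight",
          "fighting": "fight", "fights": "fight"}


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def frequencies(tokens, normalise):
    """Frequency table after mapping each lower-cased token through normalise."""
    return Counter(normalise(t.lower()) for t in tokens)


tokens = ["Nation", "nations", "fought", "fighting", "fights"]
surface = frequencies(tokens, lambda w: w)
lemmas = frequencies(tokens, lambda w: LEMMAS.get(w, w))
stems = frequencies(tokens, naive_stem)
```

Here the stemmer merges fighting and fights but misses the irregular form fought, which the lemma table catches; this is the kind of difference a per-language comparison is meant to expose.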

9: Tagged Lemma

Combining a lemma with its POS tag essentially disambiguates lemmas that can belong to several grammatical categories. Lemmas and their POS categories are matched with string or substring matching modules.
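The difference between plain lemmas and tagged lemmas shows up directly in frequency counts. In this minimal sketch, with a hypothetical English example (light as a noun and as an adjective) and an assumed `(surface, lemma, tag)` input format, the plain lemma count merges all three occurrences while the tagged count keeps noun and adjective apart.

```python
from collections import Counter


def tagged_lemma_counts(analysed):
    """Count (lemma, POS) pairs so that homographs are kept apart.

    `analysed` holds (surface, lemma, coarse_tag) triples, the format a
    lemmatiser plus POS tagger is assumed to produce here.
    """
    return Counter((lemma, tag) for _surface, lemma, tag in analysed)


analysed = [("light", "light", "N"), ("lights", "light", "N"),
            ("light", "light", "ADJ")]
by_lemma = Counter(lemma for _s, lemma, _t in analysed)
by_tagged_lemma = tagged_lemma_counts(analysed)
```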

10: HTML tags

HTML tags carry information about the structure of documents, their layout and their links to other documents (hyperlinks). Isolated, meaningful HTML tags can be detected with matching modules: string matching detects complete HTML tags (H1, H2, H3, TR, TD, UL, OL, BR, HR), while substring matching detects tag families (H-, T-, -L). More elaborately, one can analyse a given HTML tag together with its attribute values. For instance, the meaningful colour combination in German racist documents is black, white and red; the same seems to hold for some French racist documents.

If a complete HTML tree is required, an HTML parsing tool must be used. This remains optional because it is costly in time and effort.
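Tag-level features can be collected with Python's standard `html.parser` module without building a full HTML tree. This sketch (class and function names hypothetical) counts start tags, offers prefix/suffix matching for tag families such as H- and -L, and collects `color`/`bgcolor` attribute values, which is where a colour scheme such as black, white and red would appear in older HTML; colours set through CSS are not covered.

```python
from collections import Counter
from html.parser import HTMLParser


class TagStats(HTMLParser):
    """Count start tags and collect colour attribute values."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.colours = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        self.tags[tag] += 1
        for name, value in attrs:
            if name in ("color", "bgcolor") and value:
                self.colours[value.lower()] += 1


def tag_family(tags, prefix="", suffix=""):
    """Substring-style match: total count of tags such as H- or -L."""
    return sum(n for tag, n in tags.items()
               if tag.startswith(prefix) and tag.endswith(suffix))


page = ('<body bgcolor="black"><h1>Title</h1><h2>Sub</h2>'
        '<font color="Red">text</font><ul><li>a</li></ul><ol></ol></body>')
stats = TagStats()
stats.feed(page)
stats.close()
```

A bare `h` prefix would also match `hr`, `head` or `html`, so a real tag-family list would need to be spelled out more carefully.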

Hyperlink Analysis

The English corpus shows that hyperlinks are much less frequent in the racist corpus (5,819) than in the anti-racist corpus (22,056). English racist web pages are generally minimalist in appearance: dynamic HTML, style sheets, JavaScript and similar dynamic features are kept to a minimum. Hyperlinks may be more common in the anti-racist corpus because many of the anti-racist pages were found on domains that impose structure on their web pages (e.g. online newspapers and anti-racist organisations) and that take accessibility and user-friendliness into account. These sites want to make it easy for readers to find or link to other material, so they carry many menus and hyperlinks to other documents, whether or not those documents are related. Racists, on the other hand, do not provide links in such abundance and may prefer to keep readers focused on their own articles. There appear to be variations between languages, however. Some German racist web sites are of a high technical standard and use multimedia effects (music and video clips). It is not uncommon to find links to anti-racist web sites on them, serving to keep track of the adversary. Anti-racist web sites, on the other hand, refrain from linking to racist web sites in order to avoid giving racist material unwanted publicity.

Initial results seem to show the existence of web rings (e.g. portal pages), in which racist web pages mostly link to other racist pages and anti-racist web pages to anti-racist pages. The French corpora and material show that racist web rings have a more complex structure than anti-racist ones.
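One simple way to quantify the web-ring observation is to measure, for each category, the share of outgoing links that point to sites of the same category. The sketch below assumes the links have already been extracted and the sites labelled; the site names are placeholders and the function name is hypothetical. It measures only this within-category linking, not the ring structure itself.

```python
from collections import defaultdict


def within_category_share(links, labels):
    """For each category, the fraction of its outgoing links that point to
    a site of the same category. Links touching unlabelled sites are skipped.
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for src, dst in links:
        if src not in labels or dst not in labels:
            continue
        category = labels[src]
        total[category] += 1
        if labels[dst] == category:
            same[category] += 1
    return {c: same[c] / total[c] for c in total}


# Placeholder sites and labels.
labels = {"a.example": "racist", "b.example": "racist",
          "c.example": "anti-racist", "d.example": "anti-racist"}
links = [("a.example", "b.example"), ("b.example", "a.example"),
         ("a.example", "c.example"), ("c.example", "d.example"),
         ("d.example", "c.example")]
shares = within_category_share(links, labels)
```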

11: Sentence

Most linguistic studies are performed at the word level, whereas complete meaning is formed at the level of the sentence. To work with sentences, one must first detect them with a dedicated linguistic module. Sentence information can be used in various ways: detecting co-occurrences and collocations within the same sentence, computing statistics on sentences, or treating the sentence as a complex lexical collocation. Collocations and co-occurrences within a sentence are matched with matching modules; statistical analysis of sentences is performed with counting, averaging and weighting modules.

The sentence can also be considered as a basic unit for the POS tagging module, since this module is time-consuming.

When a sentence behaves as a complex lexical collocation, as in:

    I hate niggers.

it can be detected with string matching modules.
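Sentence-level co-occurrence and simple sentence statistics can be sketched as follows. The regular-expression splitter is only a stand-in for the dedicated detection module the text calls for, and all names and the sample text are hypothetical.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# It mis-splits abbreviations such as "e.g.", which is why a dedicated
# sentence-detection module is needed in practice.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"\w+")


def sentences(text):
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def cooccurrences_in_sentence(text, a, b):
    """Number of sentences containing both words a and b."""
    hits = 0
    for s in sentences(text):
        words = set(WORD.findall(s.lower()))
        if a in words and b in words:
            hits += 1
    return hits


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sents = sentences(text)
    if not sents:
        return 0.0
    return sum(len(WORD.findall(s)) for s in sents) / len(sents)


text = "The group met in secret. Its network grew! Nobody saw the group or its network?"
```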

12: Paragraph

The paragraph level is more directly accessible than the sentence level, for instance through analysis of the document's HTML code. It can be used in the same ways as the sentence level: detecting co-occurrences and collocations within the same paragraph, and computing statistics on paragraphs.

Paragraphs can be detected by analysing the HTML code (detecting certain meaningful HTML tags) or after HTML-to-full-text conversion. Matching and statistical modules can then be applied.

The paragraph can be considered as the basic unit for the POS tagging module, since this module is time-consuming.

First results on the French corpora show that the average number of paragraphs is lower in racist documents (110.78) than in anti-racist ones (137.7), and that the average number of words per paragraph is also lower in racist documents (10.1) than in anti-racist ones (14.99).

The English corpora show a different picture: the average number of words per paragraph in the racist corpus (83.89) is about 1.7 times that of the anti-racist corpus (48.83).
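Paragraph statistics of this kind can be computed once documents have been converted to plain text. This minimal sketch assumes paragraphs are separated by blank lines after conversion; the function names are hypothetical. The text does not say whether words per paragraph are averaged per document or pooled over the whole corpus, so `corpus_averages` here averages the per-document means, which is an assumption.

```python
import re


def paragraph_stats(text):
    """Return (number of paragraphs, mean words per paragraph).

    Assumes paragraphs are separated by blank lines, as they might be
    after an HTML-to-full-text conversion.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0, 0.0
    word_counts = [len(p.split()) for p in paragraphs]
    return len(paragraphs), sum(word_counts) / len(paragraphs)


def corpus_averages(documents):
    """Mean paragraphs per document and mean of per-document words/paragraph."""
    stats = [paragraph_stats(d) for d in documents]
    if not stats:
        return 0.0, 0.0
    n = len(stats)
    return sum(p for p, _ in stats) / n, sum(w for _, w in stats) / n
```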

13: Document

The document level corresponds to the entire document and all the information it contains. During on-line running, the aim will be to verify whether clues from our knowledge bases are present in documents newly collected from the web, and then to decide whether a document contains racist or revisionist content. Linguistic, matching and statistical modules will process the entire document, except perhaps POS tagging, which, being time-consuming, could be applied to a single sentence or paragraph. Another way to exploit the information in the document is to produce general statistics at the document level: the number of words, characters (bytes), paragraphs, etc. per document, paragraph or word, and their average values.

First results from the analysis of the English corpora are presented in Fig. 14.1, which gives the size of the English corpora in terms of number of documents and words. Sub-corpora of comparable size in words may differ considerably in number of files: the anti-racist corpus contains about 60% more files than the racist one (688 against 425). This also indicates that racist documents are longer than anti-racist ones (see Appendix 1 for more detail about the statistical differences in the English corpora).

                           
    Category          Documents       Words
    Racist                  425   1,009,618
    Anti-racist             688   1,049,162
    Revisionist             237   1,228,694
    Anti-revisionist        149     265,479
    Other                   296   1,076,914

Fig. 14.1 – Overview of English corpus size
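The inference from Fig. 14.1 that racist documents are longer can be checked by dividing words by documents for each sub-corpus. The short Python sketch below does only that arithmetic on the figures in the table; the function name is hypothetical.

```python
# Document and word counts transcribed from Fig. 14.1.
ENGLISH_CORPUS = {
    "Racist": (425, 1_009_618),
    "Anti-racist": (688, 1_049_162),
    "Revisionist": (237, 1_228_694),
    "Anti-revisionist": (149, 265_479),
    "Other": (296, 1_076_914),
}


def words_per_document(corpus):
    """Average document length (words per document) for each sub-corpus."""
    return {name: words / docs for name, (docs, words) in corpus.items()}


for name, avg in words_per_document(ENGLISH_CORPUS).items():
    print(f"{name:<17}{avg:>9.1f}")
```

On these figures racist documents average roughly 2,376 words, against about 1,525 for anti-racist ones.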

In the French corpora, the average number of words is lower in racist documents (1,121) than in anti-racist ones (2,064), so French racists seem to produce shorter documents. A similar tendency can be observed in the German corpus (racist = 935; anti-racist = 1,230; revisionist = 2,784; anti-revisionist = 1,199).

Hapaxes within the English Racist Corpus

A hapax is a word used only once within a document (or a corpus, or a paragraph, depending on the level considered). The number of hapaxes varies with the linguistic processing performed: a full-text document will contain more hapaxes than a lemmatised or stemmed one.

In the English corpora, there are 16,958 hapaxes specific to the racist corpus (words absent from the anti-racist corpus) out of a total of 37,061 types in the corpus. Most of these words are too infrequent to be usable (16,130 occur in fewer than 5 files). However, 102 of them each appear in more than 10 separate texts, and together they account for 1,376 instances of usable low-level criteria for categorising documents.

These hapaxes share many of the features commonly identified in critical readings of racist discourse: fear of the multiplicity of the ethnic out-group (multiply, takeover, teeming); the kind of nominalisation associated with mythic stereotyping (Jewess, goy, mestizo); and the attribution of negative and essentialist characteristics to such out-groups (insanity, wickedness, superstition).

At present, the applicability of these unique lexical markers depends on further testing to find ways of distinguishing their use in racist discourse from their use in non-racist discourse (their absence from anti-racist discourse having already been established).
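The section uses "hapax" in two senses: the strict once-only definition, and words specific to the racist corpus that are then filtered by how many files they appear in. This Python sketch implements both; the function names and toy corpora are hypothetical, and the default threshold of 10 mirrors the "more than 10 separate texts" criterion above.

```python
from collections import Counter


def hapaxes(tokens):
    """Words occurring exactly once in a token list (the strict definition)."""
    return {w for w, n in Counter(tokens).items() if n == 1}


def specific_words(corpus_a, corpus_b, min_docs=10):
    """Words of corpus_a never used in corpus_b that occur in more than
    `min_docs` separate documents of corpus_a.

    Each corpus is a list of documents, each document a list of tokens.
    Returns {word: (document frequency, total occurrences)}.
    """
    vocab_b = {w for doc in corpus_b for w in doc}
    doc_freq, occurrences = Counter(), Counter()
    for doc in corpus_a:
        specific = [w for w in doc if w not in vocab_b]
        occurrences.update(specific)
        doc_freq.update(set(specific))
    return {w: (doc_freq[w], occurrences[w])
            for w in doc_freq if doc_freq[w] > min_docs}
```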

14: URL

The URL gives the location at which a document was found on the web. URLs inside a document are detected and analysed through the HTML structure and hyperlinks, using the matching modules.

The URL provides useful information, since certain domains are known to contain primarily racist material (e.g. Stormfront.org, sos-racaille.org, aaargh.com). Documents located on these domains can therefore be classed as racist or revisionist with very little checking of other clues; in other words, the domain URL clue can be given a relatively high confidence rating.

The domain name can also be used to look up the IP address of the hosting provider, since some providers clearly specialise in hosting racist and revisionist sites.
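A domain clue of this kind amounts to matching a URL's host against a list of known domains. This sketch uses the standard `urllib.parse.urlsplit` and matches whole domain labels, so a subdomain counts but a look-alike name does not; the function name is hypothetical and the list contains only the examples given above.

```python
from urllib.parse import urlsplit

# Example domains named in the text; a working list would need curation.
KNOWN_DOMAINS = {"stormfront.org", "sos-racaille.org", "aaargh.com"}


def domain_clue(url, known=KNOWN_DOMAINS):
    """True if the URL's host is a listed domain or a subdomain of one.

    Matching on whole labels avoids false hits such as 'notstormfront.org'.
    """
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in known)
```

Resolving the host to an IP address, as suggested above, could use the standard library's `socket.gethostbyname`, but that needs network access and is left out of the sketch.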

15: Conclusion

The classification of linguistic features presented in this Deliverable aims to group and organise the clues found during the linguistic research for the knowledge base. It reflects the linguistic levels and document units that seem to be common to the three languages. New types of clues that emerge during further research will be incorporated into the existing linguistic knowledge base.

In building the corpora, we tried to adopt methods that were as open and comprehensive as possible (described in Deliverable 1.1). However, on the one hand, the general search engines we used do not index all web pages; on the other, racist and revisionist content evolves and changes with political, ideo

Comments:


1

Posted by Guessedworker on Wed, 20 Sep 2006 18:14

Straight out of the sick philosophy of unique white evil ... only European-native peoples are capable of the dreaded Jewish-liberal sin.

Presumably, lower-order legal folk of the race-traitorous, Hebreic or merely vibrant type will be required, at public expense, to sift through the tens of thousand of instances of the Jewish-liberal sin, and pick out the choicest morsels for repressive measures.

I’d better post the appeal for my defence fund now.


2

Posted by Desmond Jones on Wed, 20 Sep 2006 19:10

Will this be part of the transhumanism computer model?


3

Posted by Laban on Wed, 20 Sep 2006 21:24

There seems to be more of this document; any chance of getting the rest up?


4

Posted by Boris on Thu, 21 Sep 2006 00:34

From now on I will no longer consider myself a separat…, I would instead become pro-separation. I have personally never used the n word as I believe the message can be brought across without ‘names’. BTW whatever happened to words cannot hurt me, but stones and sticks will? Ask the Lebanese if they’d mind being called goyim or would they rather get carpet bombed?


5

Posted by Nick Tamiroff on Thu, 21 Sep 2006 04:32

Crhist,This is scary-Geoge Orwell has to be laughing in his grave.I just printed this crap out,and will try to digest it later ,when I’m totally inebrieated.In the meantime,let me say-Nigger-Nigger -Nigger!!!,Faggot,Faggot ,Faggot!!! Raghead,raghead.,raghead !!! Liberal,Liberal,Liberal!!!Illegal alien,Illegal alien,Illegal alien!! May be soon to the last time I’m allowed to speak such words,as we further fuck up the First Admendment—Thats part of the Constitution of our REPUBLIC {for you shit-heads that think this country was founded as a democracy]


6

Posted by Rnl on Thu, 21 Sep 2006 07:24

First results obtained on the French corpora show that the average number of paragraphs in racist documents is lower (110.78) than in anti-racist (137.7). And the average number of words per paragraph is also lower in racist documents (10.1) than in anti-racist (14.99). In English corpora, the results are not the same: the average number of words per paragraph in the racist corpus (83.89) is almost twice as big as the anti-racist (48.83).

Translation: Our statistics on “racist paragraphing” have turned out to be completely worthless, which is depressing, since we spent so many hours collecting them. A skeptic could have warned us that the chances were remarkably small that the size of “racist paragraphs” would be significantly different from the size of “anti-racist paragraphs,” but we’re not skeptics, just sinister idiots with too much time on our hands.

Adverbs are one important feature of racist discourse.

Which isn’t surprising, since they appear in a large percentage of sentences.

More significantly:

Analysis of racist discourse reveals language use that is typical of minority belief-holders with conversionary zeal in that it boasts a disproportionate use of absolute truth claims. Minority social groups and belief holders conceptualise truth as something which has been repressed or ‘concealed’ through socialisation processes and global information control at the hands of their chosen out-group. This paranoiac worldview leads them to return to fundamental principles of truth and falsehood. The tendency to claim ownership of the truth (to arrest the demonisation of their belief community and selves) is evidenced by the disproportionate use of words such as certain, fact, truth, knowledge, etc.

In other words, these “racists” make regular appeals to evidence. Appeals to evidence are here conceptualized as a “return to fundamental principles of truth and falsehood.” On this point the report’s authors are (to use three markers of “racist discourse” in two words) certainly correct. Racialism does, in fact, often appeal to facts and truth, since racialists are convinced, rightly or wrongly, that facts and truth are on our side.

“It is widely assumed within the mainstream media that there are no socially significant differences among the various races. In fact, that assumption is false, and here is the evidence ...” “It is often stated in the mainstream media that Islam is a religion of peace. In fact, that claim is false, and here is the truth ...” Both of those sentences, and any longer argument that elaborated on them, would rank high in the textual features that the authors have identified as symptoms of “racist discourse.”

The authors of the report are, to use their own language, speaking from the perspective of a dominant discourse. They are attempting to pathologize dissent from this dominant discourse—whose dominance, as they casually note earlier, is often enforced by law—by identifying a distinct form of “racist discourse” which exists in contrast to anti-racist discourse, the privileged discourse in their analysis. That this “racist discourse,” in contrast to anti-racist discourse, regularly makes truth claims becomes evidence of its “racism,” because the privileged discourse, as their analysis has apparently demonstrated, makes such claims less frequently. “Racist discourse” is therefore marked by “a disproportionate use of absolute truth claims”—disproportionate, that is, in comparison with anti-racist discourse, not in comparison with (say) a physics textbook or the Summa Theologica. Thus the fewer truth claims that anti-racists make, the more “disproportionate” every truth claim in a “racist text” becomes. That’s both bizarre and stupid.

Of course any minority discourse attempting to attack a dominant discourse will, if its advocates have confidence in its validity, inevitably speak in exactly the language the authors have identified as evidence of paranoiac “racist” speech.

(Did anyone spot the signs of “racist discourse” in the preceding sentence? There were two in the first three words alone. It must be a real challenge to write anti-racist sentences.)

Because of the fact that Racists speak from within a minority belief system, addressing itself to those who hold different beliefs and assumptions about the issue of race politics, there is a greater tendency within racist discourse to hedge, or palliate, statements which the author knows are socially non-normative.

I’ll take a wild guess that VNN wasn’t in their corpus of WN websites.


7

Posted by Kenelm Digby on Thu, 21 Sep 2006 12:57

Honestly, I couldn’t give a sh*t.
That said, trying to wade through all that pretentious psycho-babble (note to ‘experts’ here: does all this verbiage actually mean anything?) was a painful experience, rather like being buried alive in a coffin of words.
Does the old ‘Pseuds Corner’ feature in ‘Private Eye’ magazine still run?
Anyhow, let us pay homage to that remarkable man, George Orwell, and collectively kiss our copies of ‘1984’ while saying a little benediction for his soul.
It does make one wonder, though, that with 7/7, the Madrid bombings and the alleged thwarted Heathrow plot, the EU bureaucracy can fiddle while Rome burns and waste public funds on this guff.
I sincerely hope that my pieces in various WN fora (it’s my passion, you see) have been taken as examples to work on. It’s an honor that I’ve got some silly little knickers-in-a-twist. Meanwhile I’ll continue to lard my postings with my favorite catch-phrases: ‘political class’, ‘massive non-White immigration’, ‘non-White majority’, ‘2040’, etc.


8

Posted by James Bowery on Thu, 21 Sep 2006 14:46

The thing that is most frightening about this isn’t the technical expertise demonstrated (it isn’t sophisticated), nor even the fact that they are profiling “racists”. The thing that is disturbing is the use of funding by the European Union to attempt to filter for “racist attitudes”.

Aside from the base hypocrisy of using profiling to discriminate between “racists” and “anti-racists”, they are saying that “attitudes” which would have characterized the vast majority of people who opposed Hitler during WW II, are legitimately targeted by the government.


9

Posted by Guessedworker on Thu, 21 Sep 2006 15:00

The legitimacy of the government is derived from the consent of the people.  There is nothing legitimate in this Marxist hate campaign against the expressed interests of the majority.

It is this fundamental absence of legitimacy which entitles the present majority to take back its land, its language and its legal rights at any time and by any means, before or after the passage into minority status in its own homelands.


10

Posted by proofreader on Thu, 21 Sep 2006 15:49

The real plan is to monitor private e-mails (presumably of a “racist” nature, i.e. opposition to the EU regime) with the software they’re developing based on these corpora. Scary, were it not for the obvious fact of the inanity of the work published.


11

Posted by A. Windaus on Sat, 23 Sep 2006 00:19

Thought police… *gags*


12

Posted by Rnl on Sat, 23 Sep 2006 23:13

James Bowery wrote:

Aside from the base hypocrisy of using profiling to discriminate between “racists” and “anti-racists”, they are saying that “attitudes” which would have characterized the vast majority of people who opposed Hitler during WW II, are legitimately targeted by the government.

Churchill himself, the most prominent of the anti-nazis, shared this “racism,” as Auster pointed out several days ago:

In his book Churchill: The Unexpected Hero (Oxford University Press, 2005, p. 233), discussing Churchill’s second government of 1951-55, Paul Addison writes:

He tried in vain to manoeuvre the Cabinet into restricting West Indian immigration. “Keep England White” was a good slogan, he told the Cabinet in January 1955.

This makes me think more highly of Churchill. I had not previously heard that he had done anything to oppose Britain’s postwar immigration disaster. [End Quote]

http://www.amnation.com/vfr/archives/006448.html

The thing that is disturbing is the use of funding by the European Union to attempt to filter for “racist attitudes”.

They’re not _exactly_ filtering for attitudes. They’re identifying the formal features of a special kind of discourse, almost a dialect, which they claim characterizes the writing of Whites who either unwittingly hold or openly express racialist beliefs. They hope to detect “racist attitudes” by detecting the “racist discourse” in which the attitudes are expressed. That’s subtly but significantly different from merely filtering for “racist attitudes.”

The researchers began with a collection of “racist texts” culled from WN websites. Identifying these texts as “racist” was not problematic, nor should we expect that it would be. It’s easy, from their perspective, to identify a “racist” text. You need only look at its location on the Internet and its subject matter. Any policeman equipped with a dictionary could do the same.

The important contribution of the report is the identification of the linguistic features that the “racist texts” they collected share with other texts that are not so easily identified as “racist.” Those distinguishing features (a large number of adverbs, for example) are “clues for the detection of racist content.” As the authors observe, hate-speech laws in Europe encourage some writers to conceal overt racialism for fear of prosecution, and other writers may be genuinely unaware that they are guilty of thought-crimes on racial subjects:

The main body of linguistic work on racist language has concentrated on discourse analysis of majority groups within European countries and the USA with the aim of uncovering tacit or concealed racist attitudes. This has mainly been achieved by means of interviewing members of majority groups and subjecting the resulting text to a discourse analysis ...

[...]

Racism often consists of [note the extraordinary claim in this “consists of”] implicit rhetorical construction (euphemism, antiphrasis) due to prevailing prohibitions of hate speech within the countries hosting the material.

[...]

French racism appears much more heterogeneous than its English counterpart due to being curbed by law. As a result it is conveyed in a wide range of discourse from explicit (NaziLauck, a multilingual site with very English-like content) to implicit (Français d’abord, the Front National site, which uses understatements and an anti-immigrant, racist rhetoric conveyed as if it were mere macro-economic analysis.

All of this constitutes an obstacle if you’re an anti-racist intent on fining or imprisoning your opponents, who may conceal their “racism” or may even be unaware of its existence. The authors of the report have, they believe, made a preliminary attempt at overcoming the problem. There are, they have concluded, certain linguistic features that “racist texts” share in common. There is a distinct “racist discourse” that differs formally—not simply as a matter of content—from other discourses. A “racist text” can therefore be dispassionately identified on the basis of its formal linguistic features. You don’t need to analyze its content for “racism”; the “racism” is embedded in the language itself.

Leaving aside the totalitarian impulse percolating throughout this report, the authors’ method is seriously flawed, as I noted above. It’s foolish to suggest that a distinct “racist discourse” can be identified by its departures from the linguistic features that characterize an anti-racist corpus culled from anti-racist websites. But it’s quite possible that a police officer or a judge wouldn’t agree. To them this inept research may very well look like science. We live in a strange world, and we can’t rely on the common sense of police officers staffing some anti-hate task force in Europe. There are many obscure terms (e.g. isotopy) in this report that a policeman or a judge wouldn’t understand. For many people that implies “science”—the presence of rare and complex words that you don’t understand but presume that experts do understand. The history of psychiatric expert testimony in criminal trials offers a relevant analogy.

Would a French judge be willing to qualify one of the authors of this report as an expert witness if the National Front were taken to court for promoting “racism” on its various websites? He might. And if the authors spend the next ten years compiling more data and publishing elaborate studies of “racist discourse” in academic journals, they will almost by definition become experts on the subject, even though the subject (a linguistically distinct “racist discourse”) doesn’t actually exist. All a judge would need to know is that a text discusses race, which he can figure out for himself, and that a distinguished expert on the language of hate has scientifically determined that the text exhibits formal features typical of a category of hateful texts called “racist discourse.” 

Therein lies, it seems to me, the real purpose of this report. I don’t think the authors, despite their avowed objective, are after some superbot that can scan the web for “racism” without human intervention. They want, rather, to create a body of supposedly scientific research on “racism” that (a) helps in the project of criminalizing dissent on racial matters, especially opposition to non-White immigration; (b) helps effect this criminalization of dissent without having to respond to the content of the dissent, specifically whether its truth claims are true or false. “Truth is no defense” is an explicit axiom of hate-speech prosecution and it is an implicit axiom in this report.


13

Posted by Fred Scrooby on Sat, 23 Sep 2006 23:40

Excellent analysis by Rnl!


14

Posted by Rnl on Sun, 24 Sep 2006 05:31

proofreader wrote:

The real plan is to monitor private e-mails (presumably of a “racist” nature, i.e.  opposition to the EU regime) with the software they’re developing based on these corpora. Scary, were it not for the obvious fact of the inanity of the work published.

Unless the authors of the report are remarkably dull, which I suppose is possible, they can’t seriously be planning to turn their research loose on “racist” e-mail communication. If they wanted to detect racialist _content_ in e-mail or other electronic texts, they would concentrate on nouns. The task would not be complex. Does a text contain certain keywords like race, immigration, nationalism, black, white, IQ, crime, Muslim, mestizo, etc? If it does, the chances are good that the writer is expressing impermissible racialist beliefs. 

Most of the data they have accumulated would be useless for that purpose. Searching for a large cohort of adverbs or the “interrogative usage of who” would not distinguish racialist texts from normal discourse. They claim it would, but it wouldn’t. And there is no possibility whatever that they could distinguish the “implicit rhetorical construction[s] (euphemism, antiphrasis)” in racialist e-mail from implicit rhetorical constructions in non-racialist e-mail. Any software designed on the data in their report would be just as likely to criminalize participants in a bridge-players listserv.

They don’t distinguish racialist texts by analyzing their content because they want to promote the idea that something called “racist discourse” can be identified on the basis of formal features alone. That’s a sign of people who fear evidence, so they prefer not to discuss it.


15

Posted by James Bowery on Sun, 24 Sep 2006 17:23

Rnl writes: The researchers began with a collection of “racist texts” culled from WN websites.

So far so good.

Identifying these texts as “racist” was not problematic, nor should we expect that it would be.

It was problematic only in that the word “racist” has multiple senses—ambiguity that is profitably exploited in “anti-racist” propaganda.  In other words, if you research race differences of any kind, you can easily be tarred as someone who would sail to Africa, chase down a hapless native, throw a net over him, tie him up, shackle him, throw him in the hold of a diseased slave ship and whip him into submission if he ever even showed the slightest inclination to resist your will.

Both the researcher and the slaver are “racist”.

It’s easy, from their perspective, to identify a “racist” text.

That’s more like it. 

They can’t define “racist”, otherwise they might not be able to “justify” chasing down, throwing a net over, shackling, etc. a researcher of race differences.  But they know a “racist” when they see one.

You need only look at its location on the Internet

Again… good so far as it goes.

and its subject matter.

They intend to detect new locations that are “racist”.  This is the entire point of their having focused on discriminating between “anti-racist” and “racist” texts in their training corpora.  Both have the same subject matter.

Any policeman equipped with a dictionary could do the same.

A policeman equipped with a dictionary would have the following to work from (American Heritage Dictionary):

rac·ism (rā′sĭz′əm)
n.

  1. The belief that race accounts for differences in human character or ability and that a particular race is superior to others.
  2. Discrimination or prejudice based on race.

So the policeman sees some guy researching race differences.  He cannot conclude that the researcher is guilty of “racism” using sense 1 but what about sense 2?  He looks up “discrimination”:

dis·crim·i·na·tion (dĭ-skrĭm′ə-nā′shən)
n.

  1. The act of discriminating.
  2. The ability or power to see or make fine distinctions; discernment.
  3. Treatment or consideration based on class or category rather than individual merit; partiality or prejudice: racial discrimination; discrimination against foreigners.


Well, our policeman, looking at sense 1, sees a circular definition, which might lead him to look up the root word “discriminate”, but he doesn’t need to, because sense 2 provides him with the grounds to arrest his suspect.


16

Posted by James Bowery on Sun, 24 Sep 2006 18:01 | #

Leaving aside the totalitarian impulse percolating throughout this report, the authors’ method is seriously flawed, as I noted above. It’s foolish to suggest that a distinct “racist discourse” can be identified by its departures from the linguistic features that characterize an anti-racist corpus culled from anti-racist websites.

Don’t conflate the burden of proof required to convict with the more nuanced features used by everyone to make practical discriminations, day to day, hour to hour, minute to minute and second to second.  This is something neurons do.

But it’s quite possible that a police officer or a judge wouldn’t agree.

A judge needn’t be so ridiculous (although the history of the Judiciary here clearly shows they go above and beyond the call of duty striving to best their peers for title of most ridiculous) in order to wreak terrible havoc using this sort of filter.

Basically, all he has to do is approve the means by which law enforcement officers and prosecutors find the suspects they then bring before him.

For example, even though racial profiling would render much police work far more effective, courts throw out most cases where racial profiling was used to bring the suspect before the bar.

The argument here is not that profiling fails to establish guilt beyond a reasonable doubt, which is certainly true, but that profiling itself is an illegitimate form of perception (even though all perception is profiling at some level).

All the courts have to do is say that it is legitimate to use this kind of profiling (not calling it “profiling” of course) to discriminate (not calling it “discriminate” of course) “racists” (not defining exactly the sense of “racist” of course) when dredging for suspects.  Once the suspects have been identified, the system can leave it up to the mushy “I know it when I see it and truth is no defense anyway” mentality of the politicos populating the law enforcement and judiciary to do the rest of the dirty work.


17

Posted by Rnl on Mon, 25 Sep 2006 05:59 | #

I probably shouldn’t waste further time on this report, but I’m impressed by its sinister stupidity.

Since racist and anti-racist materials share some similarities (for example certain keywords), we have to analyse not only a racist corpus, but also an anti-racist one and, certainly, general language corpus and then to contrast linguistic features obtained from these different corpora;

Now they don’t in fact analyze a “general language corpus.” Instead they contrast their “racist corpus” with an “anti-racist corpus.” On the basis of this contrast they describe the linguistic features that allegedly characterize what they call “racist discourse.” They don’t contrast their “racist corpus” with a collection of science-fiction novels or a corpus culled from discussions of Christian theology. Their “racist discourse” exists as a distinct discourse only insofar as their web-based “racist corpus” differs linguistically from their web-based “anti-racist corpus.”

They have a practical reason for this choice, namely that “racist and anti-racist materials share some similarities (for example certain keywords).” They mean that racialists and anti-racialists are likely to discuss similar subjects. Filtering only for racialist content could therefore also detect anti-racialist texts. A keyword like “immigration” could detect both those who strongly approve of non-White immigration and those who strongly disapprove. So they must, if they hope to detect racialist content while avoiding anti-racialist content, devise some system—a system not focused directly on content (e.g. on nouns from the semantic field “race”)—to distinguish good discussions of racial matters (“anti-racism”) from bad discussions of racial matters (“racism”). 

Although it is highly unlikely that a detection system based on the “clues for the detection of racist content” they have assembled (e.g. an abundance of adverbs) would work, we can at least see its practical purpose from their Stalinist perspective. But their method is moronically wrong if they want, as they say they do, to describe a linguistically distinct “racist discourse.” It would be convincing only if Jehovah or Odin descended to earth and officially declared the linguistic practices of anti-racist websites to be representative of normal language use. Absent some authoritative declaration to that effect, their description of the linguistic features of “racist discourse” is worthless.
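The dependence on the reference corpus can be shown with a toy keyness calculation. Ranking words by Dunning's log-likelihood statistic (G²) against a reference corpus is a common corpus-linguistics technique. Whether the Princip team used exactly this statistic is not stated here, and the function names and sample sentences below are invented for the sketch. The same target text yields different "characteristic" words depending on which reference it is measured against.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning-style log-likelihood (G2) for a word seen `a` times in a target
    corpus of `c` tokens and `b` times in a reference corpus of `d` tokens."""
    total = c + d
    e1 = c * (a + b) / total  # expected count in the target corpus
    e2 = d * (a + b) / total  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: str, reference: str, top: int = 5) -> list[tuple[str, float]]:
    """Rank words relatively more frequent in `target` than in `reference` by G2."""
    t, r = Counter(tokenize(target)), Counter(tokenize(reference))
    c, d = sum(t.values()), sum(r.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / c > b / d:  # keep only over-represented words
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest G2 first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]
```

With the target sentence "immigration policy is debated and immigration policy is criticised", a reference about a bridge club ranks "immigration", "is" and "policy" at the top, while a reference that discusses the same topic leaves only "criticised". Whatever the reference corpus contains decides what looks distinctive.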

 
Names of authors and racist organisations fall under the category proper nouns. One approach anti-racist organisations adopt in order to fight hate is exposure of hate groups and racist organisations, the objective being to educate and inform people about the activities and beliefs of racists. This would account for the high frequency of recurring expressions such as authors and racist organisations within the anti-racist corpus.

Anti-racialist websites list names and addresses of their opponents in order to encourage violence against them. The authors of this report are surely aware of that fact. They chose to lie about it.

in many cases racist authors do not believe they deserve the label ‘racist’ ...

Which should require, if the Stalinists responsible for this report had any intellectual integrity, defining “racism” and “racist,” since, as they have just acknowledged, there is a dispute about the meaning of the terms. 

There are three categories of “racist” in their report: (i) avowed “racists”; (ii) secret “racists” who unsuccessfully try to conceal their “racism”; (iii) unwitting “racists” who don’t believe they are “racists” but really are. Evidently the writings of all three have been tossed indiscriminately into the “racist corpus,” with no attempt by the report’s authors to describe their common beliefs.


18

Posted by Rnl on Wed, 31 Jan 2007 16:38 | #

Dinesh the Dhimmi

By Serge Trifkovic
ChroniclesMagazine.org | January 26, 2007

[...]

D’Souza uses “Islamophobia” with the implicit assumption that the term’s meaning is well familiar to his readers. For the uninitiated it is nevertheless necessary to spell out its formal, legally tested definition, however. It is provided by the European Monitoring Centre on Racism and Xenophobia (EUMC), a lavishly-funded organ of the European Union. Based in Vienna, this body diligently tracks the instances of “Islamophobia” all over the Old Continent and summarizes them in its reports. The Monitoring Center’s definition of Islamophobia includes eight salient features:

  1. Islam is seen as a monolithic bloc, static and unresponsive to change.
  2. Islam is seen as separate and “other.”
  3. Islam is seen as inferior to the West, barbaric, irrational, primitive and sexist.
  4. Islam is seen as violent, aggressive, supportive of terrorism and engaged in a clash of civilizations.
  5. Islam is seen as a political ideology.
  6. Criticisms made of the West by Islam are rejected out of hand.
  7. Hostility towards Islam is used to justify discriminatory practices towards Muslims and exclusion of Muslims from mainstream society.
  8. Anti-Muslim hostility is seen as natural or normal.

This definition is obviously intended to preclude any possibility of meaningful discussion of Islam. The implication that Islamophobia thus defined demands legal sanction is a regular feature of the Race Relations Industry output. It also routinely refers to “institutional Islamophobia” as an inherent social and cultural sickness of most Western societies that needs to be rooted out by education, re-education, and legislation. In reality, of course, all eight proscribed statements are to some extent true.

http://frontpagemagazine.com/Articles/ReadArticle.asp?ID=26585


19

Posted by Elena Haskins on Wed, 31 May 2023 02:36 | #

Thank you so much for this article re: the Princip Project.

I have referred various persons to this article so they can understand the type of scrutiny White “Gentile” Racialists endure.

All Best,
Elena Haskins


