Google Weighs Into Dangerous Turf—Similar to Bowery’s Laboratory of the States

Posted by James Bowery on Sunday, 18 March 2007 22:33.

To get an idea of the dangerous turf Google has weighed into, watch this video by Swedish professor Hans Rosling at TED 2006 (or here).

I wrote a primitive version of such a site several years ago which I called Laboratory of the States since the goal was to gather lots of demographic variables by State and present ecological correlations.

Shortly thereafter, a site called Nation Master cropped up, with a somewhat flashier and simpler user interface, but focused on CIA World Fact Book data rather than the States of the US.  (The same folks later did State Master using similar UI technology.)

Finally, Google tested Gapminder with an even spiffier and simpler UI, again focusing on by-nation correlations.

Aside from the usual complaints about “The Ecological Fallacy” (a fallacy that cuts both ways BTW) there are two big pitfalls for this stuff:

  1. Dealing with missing data.
  2. Estimating statistical significance.

What I did about missing data was simply eliminate any data points where data was missing from one or both of the variables being correlated.  This reduces the sample size, and hence the statistical significance attainable, but it bypasses arguments over what sort of estimates should stand in for the missing data.  The Netflix Prize is coming up with really good algorithms to compute missing data efficiently and accurately, so maybe there is hope for something more effective here.
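As a rough illustration of that drop-the-incomplete-pairs approach, here is a minimal Python sketch.  It is not the Laboratory of the States code; the function names and the sample values are made up for the example, and missing observations are assumed to be marked with None.

```python
from math import sqrt


def drop_incomplete_pairs(xs, ys):
    """Keep only the positions where both variables have a value."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def pearson_r(pairs):
    """Pearson correlation coefficient over a list of (x, y) pairs."""
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable is constant; correlation is undefined")
    return sxy / sqrt(sxx * syy)


# Made-up values for five hypothetical states; None marks a missing observation.
variable_a = [1.0, 2.0, None, 4.0, 5.0]
variable_b = [2.0, 4.0, 6.0, None, 10.0]

complete = drop_incomplete_pairs(variable_a, variable_b)
print(len(complete), "of", len(variable_a), "states kept")
print("r =", pearson_r(complete))
```

Note that the sample size actually used can differ for every pair of variables, so it is worth reporting alongside each coefficient.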

Statistical significance is more difficult to deal with.  Usually one must look at tables for statistical significance of correlations under the assumption that the variables each follow a normal distribution.  Unfortunately, many variables follow polynomial (like squared) or exponential distributions, so you have to do things like take the sqrt or log of one or both of the variables to try to normalize them.  However, when you are looking for correlations, sometimes it is the relationship that is polynomial or exponential, in which case you can apply sqrt or log to get the maximum correlation coefficient at the sacrifice of normality of one or both of the variables.  Unfortunately, there is no simple arithmetic formula for calculating the significance level of a correlation given a non-normal distribution: you can’t just plug in the skewness, kurtosis, etc. as well as sample size and correlation coefficient, and get out a valid statistical significance.  Therefore it is hard to make good statements about many very important correlations without watering them down to meaninglessness.
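One way around the lack of a closed-form formula is to estimate significance by resampling rather than from a table.  The sketch below is an assumption-laden illustration, not something Laboratory of the States did: it applies a log transform to an exponential relationship and then runs a permutation test, a different technique from the table lookup described above.  The data are synthetic and the function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Estimate a two-sided p-value by shuffling ys and recomputing |r|.

    No normality assumption is made: the null distribution is built from
    the data themselves.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic example: y grows exponentially with x.
xs = [float(i) for i in range(1, 13)]
ys = [math.exp(0.5 * x) for x in xs]

r_raw = pearson_r(xs, ys)
r_log = pearson_r(xs, [math.log(y) for y in ys])
p_log = permutation_p_value(xs, [math.log(y) for y in ys])
print("r on raw values:", r_raw)
print("r after log transform:", r_log)
print("permutation p-value after log transform:", p_log)
```

The smallest p-value this can report is 1/(n_perm+1), so more permutations are needed to resolve very small significance levels.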

Also, a complaint about the “simple” user interfaces:

Some of the worst reporting from news media comes when they refuse to report statistics in terms remotely related to anything meaningful—for example you will frequently hear statements to the effect that “California has the most orange trees in the nation.” or some such.  Such statistics are nonsense for the purposes of correlation studies since the size of the ecology (California state) is all you are really measuring with such statements.  You have to divide by the population or divide by the total GDP or something to rationalize the ecology against other ecologies.
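To make the point concrete, here is a small Python sketch of dividing a raw count by population before comparing ecologies.  The state names and numbers are placeholders invented for the example, not real statistics.

```python
def per_capita(raw_counts, populations):
    """Divide each state's raw count by its population."""
    return {state: raw_counts[state] / populations[state] for state in raw_counts}


# Invented figures for two hypothetical states.
raw_counts = {"BigState": 1000, "SmallState": 300}
populations = {"BigState": 10000, "SmallState": 1000}

rates = per_capita(raw_counts, populations)

# The raw ranking mostly reflects size; the per-capita ranking can reverse it.
top_raw = max(raw_counts, key=raw_counts.get)
top_rate = max(rates, key=rates.get)
print("largest raw count:", top_raw)
print("largest per-capita rate:", top_rate)
```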

In Laboratory of the States, I did this with all my variables but I also left the raw variables around and allowed people to do arithmetic on them—like dividing them—to get their own rational comparisons if for some reason my choices were not adequate.  This problem isn’t as bad with Gapminder as it is with Nation Master and State Master—but Gapminder has vastly fewer variables.

Addition and subtraction are valuable as well.  For instance, even though geographic distribution within a State ecology is broken down as rural, suburban and inner city, I can synthesize non-rural population with the simple expression:

InnerCityPercapita1990+SuburbanPercapita1990

You can’t do this with the other systems and this really bothers me because there are many, many cases where you want to synthesize some demographic variable they don’t already compute for you.
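For anyone wanting to build the same facility, user-entered arithmetic over named variables can be sketched in a few lines of Python.  This is only one possible design, not how Laboratory of the States worked: it assumes each state's variables sit in a dictionary, and it uses the standard ast module so that nothing beyond +, -, * and / on names and numbers is ever evaluated.  The sample values are invented.

```python
import ast
import operator

# Only these four binary operators are permitted in an expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, row):
    """Evaluate an arithmetic expression over one state's variables.

    `row` maps variable names to numbers. Anything other than names,
    numeric constants and + - * / raises ValueError.
    """
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return row[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression element")

    return walk(ast.parse(expression, mode="eval").body)


# Invented values for one hypothetical state.
state_row = {"InnerCityPercapita1990": 0.25, "SuburbanPercapita1990": 0.5}
non_rural = evaluate("InnerCityPercapita1990+SuburbanPercapita1990", state_row)
print("non-rural share:", non_rural)
```

Applying evaluate to every state's row yields a new synthesized variable that can be correlated like any other.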

Finally, the reason I started into this project circa 2002 was because, for several years, I had been trying to figure out something to do about the very lame work going on with autism epidemiology, so I started where many epidemiological studies start:  by gathering ecological correlations.  My goal was to come up with some rank order of ecological correlations that would let me test various hypotheses of autism causation using quantitative comparison (sometimes called “strong inference”).  Yes, such ecological correlations are very preliminary but they do justify further investment if you find your predicted correlations coming out on top.  Well, I was able to come up with my rank ordered list of correlations with autism but when I looked at State Master, they seemed willing to present rank ordered correlations with everything but autism!  Seems they’re scared of that variable for some reason.  Or maybe they just forgot to compute the correlations for it.  This sort of selective presentation of correlations—deliberate or accidental—is very dangerous for really obvious reasons.
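A rank-ordered list of this kind can be sketched as follows.  The variable names and values are toy data made up for the example (they are not autism data or anything from State Master), and sorting by absolute correlation is an assumption about how one might order the list.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_correlations(target, candidates):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scored = [(name, pearson_r(target, values)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy data: one target variable and three candidate variables over five units.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "c": [1.0, 3.0, 2.0, 3.0, 1.0],
}

ranking = rank_correlations(target, candidates)
for name, r in ranking:
    print(name, round(r, 3))
```

The hypothesis test then consists of writing down, before running the ranking, where each hypothesis predicts its variable should land.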

PS: I probably have to say this because some of the brain-dead folks from GNXP will be hanging around ready to start screaming “data dredging” the way they always do—so here goes:

“Data dredging” is the fallacy that presumes statistical significance of correlations remains unchanged when one looks at a list of correlations with no preconceived notions of what correlations will come out on top.  In other words, you aren’t predicting what will come out on top, or even near the top, when you are “data dredging” (aka “data mining” when that phrase is used in its pejorative sense).  Of course, the brain-dead idiots will accuse you of doing this and in the next sentence accuse you of being “prejudiced” because you were looking for the correlation that came out on top before you ran the rank ordering.  This idiocy renders “questionable” their claims that the “cognitive elite” can wage war on the “xenophobes” and win.
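Why significance changes when nothing was predicted in advance can be shown with a small simulation.  The sketch below is illustrative only: it correlates one random target against many unrelated random variables and counts how many clear a single-test threshold by chance.  The threshold of about 0.279 is the approximate two-sided 5% critical value of r for 50 observations under normality; the Bonferroni adjustment shown is one standard correction and is an addition here, not something the argument above relies on.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_chance_hits(n_obs, n_vars, critical_r, seed=0):
    """Count unrelated random variables whose |r| with a random target
    exceeds the single-test critical value."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_vars):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(target, noise)) > critical_r:
            hits += 1
    return hits


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests


N_OBS = 50          # e.g. one observation per state
N_VARS = 200        # number of variables scanned with no prior prediction
CRITICAL_R = 0.279  # approximate two-sided 5% critical value for n = 50

false_hits = count_chance_hits(N_OBS, N_VARS, CRITICAL_R)
print("chance 'significant' correlations:", false_hits, "of", N_VARS)
print("Bonferroni per-test alpha:", bonferroni_alpha(0.05, N_VARS))
```

A correlation that was predicted before the ranking was run counts as a single test and does not need this adjustment; that is the distinction at issue.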


Comments:


1

Posted by gnxp stinks on Mon, 19 Mar 2007 16:00 | #

James, speaking of the “brain-dead” at GNXP, here is an old GC comment addressed to Majority Rights, where “secessionist” (i.e., separatist) ideas are labeled as “bizarre, treasonous…fantasies.”

Says it all, doesn’t it?  We just MUST live with them.

Comment:

“To all Majority Not-too-bright trolls:

Don’t try to post here. Remember: the first step to carving out your ivory state is to stop talking to the “darkies”. The politics of exclusion is a bitch, ain’t it?

...larger observation: I find it alternately amusing and annoying that so many of these guys try posting here…I suppose it’s because they’d rather attempt to argue with the center-right h-bd realist “darkies” here than the predominantly white leftists at, say, Crooked Timber.

Which is, ironically, yet another point in favor of the idea that ideological proximity often trumps ethnic proximity.

I’m constantly amazed by it. It’s too much of a pain for these guys to even try to engage in internet flamewars and/or rational discussion with the lefty whites they want to proselytize. So instead they come here, thinking that the fact that we accept IQ implies that we will cheer on—or at least treat with respect—their bizarre, treasonous, secessionist fantasies.

Ironically, we may have inadvertently encouraged this…by being more tolerant of this sort of thing than their nominal coracialists at (say) Crooked Timber, where this behavior would have earned a banning in short order. But no longer.”


2

Posted by Guessedworker on Tue, 20 Mar 2007 17:19 | #

Well it’s always good to hit GC’s nerve.

I enjoyed Hans Rosling’s presentation.  I was struck, however, by how little he offered in the way of context for his generally positive data sets.  Why is advancement in health and wealth possible at all?  Where does it come from, actually?  Free trade?  Small families?  These seem more correlates than causes.  What underlying force drives these processes?  What, if it was removed, would bring it all to a halt?


3

Posted by gnxp stinks on Tue, 20 Mar 2007 17:47 | #

“Well it’s always good to hit GC’s nerve.”

Let’s be fair, GW.  GC has given us a phrase which can well sum up our overall realistic strategy:

“politics of exclusion”

We must give credit where credit is due.  However, the types of “exclusion” some may have in mind will differ from gnxp’s own posting/censoring policies.


4

Posted by A. Windaus on Fri, 23 Mar 2007 02:40 | #

Missing data is always a problem.  Whenever I come across this I just take the average of the two data samples on either side of the missing data and assume that the trend continues in the same manner, i.e. if the year 2005 data is missing, I would use (2004+2006)/2.

There is no other real way to get around it unless you can find that data.
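The neighbour-averaging rule described in this comment can be sketched in Python as below.  The function name and the sample series are made up for illustration, and the sketch assumes yearly data where a gap is marked with None; it fills only single-year gaps that have a value on both sides.

```python
def fill_single_gaps(series):
    """Return a copy of a {year: value} series in which each missing year
    with both neighbouring years present is set to their average."""
    filled = dict(series)
    for year, value in series.items():
        if value is None:
            before = series.get(year - 1)
            after = series.get(year + 1)
            if before is not None and after is not None:
                filled[year] = (before + after) / 2
    return filled


# Invented yearly values; 2005 is missing.
yearly = {2003: 8.0, 2004: 10.0, 2005: None, 2006: 14.0}
filled = fill_single_gaps(yearly)
print(filled[2005])
```

This only makes sense for a time series with a reasonably smooth trend; it does not help when a whole State or Nation has no value for a variable.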




