May 12, 2005

The law of small numbers

Filed under: Uncategorized, social study — @ 4:44 pm

Anyone with some knowledge of statistics knows the “law of large numbers”. It states that as the number of observations increases, the sample mean approaches the population mean. It is arguably the foundation of modern statistical inference. So what the heck is the “law of small numbers”?

Let’s take an example. Consider a random sequence of tosses of a fair coin. Many people think that the sequence “H-T-T-H-T-H” is more likely to occur than “H-H-H-H-H-H” because the former looks more random than the latter (in fact, both have the same joint probability, (1/2)^6 = 1/64). By the same intuition, even some erudite students believe that after a run of heads, the next toss is more likely to be a tail. That is, several heads in a row make the next toss almost bound to be a tail. Unfortunately, this is the famous “gambler’s fallacy”.
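To make the point concrete, here is a minimal sketch in Python that checks both claims by exact counting rather than by intuition. It assumes a fair coin and independent tosses; the function names are my own, made up for illustration.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(seq, p_heads=Fraction(1, 2)):
    """Probability of one exact sequence of independent tosses."""
    prob = Fraction(1)
    for toss in seq:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

def prob_tail_after_streak(streak_len):
    """P(next toss is T | the first streak_len tosses were all H).

    Enumerates every equally likely sequence of streak_len + 1 fair tosses,
    keeps those that start with the streak, and counts how many end in T.
    """
    outcomes = product("HT", repeat=streak_len + 1)
    streak = [o for o in outcomes if all(t == "H" for t in o[:streak_len])]
    tails_next = [o for o in streak if o[-1] == "T"]
    return Fraction(len(tails_next), len(streak))

print(sequence_probability("HTTHTH"))  # 1/64
print(sequence_probability("HHHHHH"))  # 1/64
print(prob_tail_after_streak(5))       # 1/2
```

The “random-looking” sequence and the all-heads sequence come out equally likely, and a streak of five heads leaves the chance of a tail on the next toss at one half.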

People tend to think that “randomness” is close in meaning to “fairness”. In other words, chance is seen as a “self-correcting” mechanism, with a kind of “equilibrium” over time. In the long run, randomness does amount to a sort of “fairness” at the probability level, but that is an overall assessment. For a random process with no inherent dependency, such as coin tossing, there is no such thing as “fairness” or “equilibrium” from one toss to the next. An abnormal pattern, once observed, is not corrected (and often cannot be replicated); it is simply diluted over time.

Some may resort to Bayesian inference in the coin-tossing case. The root of Bayesian inference is conditional probability. However, for independent tosses of a fair coin, the probability of a tail given all previous tosses is the same as the probability of a tail without any previous tosses. The independence of the tosses renders this type of Bayesian inference useless in this case (the correct way of applying the Bayesian method is in the postscript).

Furthermore, if you treat a sequence of Bernoulli trials such as coin tosses as a binomial experiment (as many people have indeed argued), you unintentionally impose a condition on the coin-tossing experiment: the experiment has an end. That is, there are only n tosses. Then the sample size problem kicks in (further flaws in treating a random process as a binomial distribution are detailed in the postscript).

People have the misconception that a small random sample should be representative of the whole population, that is, that the two should share the same characteristics. Or, in the coin-tossing example, that the local behavior of a random process should closely resemble its overall behavior. This is the fallacy of the “law of small numbers”.

Ignoring the role of sample size yields the “gambler’s fallacy”, which may seem trivial. In research, however, misunderstanding the assumptions behind these basic statistical concepts is serious. Unfortunately, even trained researchers are subject to this bias. For example, in meta-analyses it is very common to see that studies with small samples have the most varied results, while studies with large samples are more consistent (and are therefore given more weight in the pooled analysis). Yet all of them get published in pretty good journals.
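The meta-analysis pattern can be reproduced with a toy simulation. The sketch below is only an illustration under my own assumptions: each “study” estimates a true proportion of 0.5 from a sample of a given size, and the study sizes (20 and 2000), the number of studies, and the seed are arbitrary choices, not data from any real meta-analysis.

```python
import math
import random
import statistics

def spread_of_estimates(sample_size, n_studies=300, p_true=0.5, seed=1):
    """Standard deviation of the estimated proportion across simulated studies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(n_studies):
        successes = sum(rng.random() < p_true for _ in range(sample_size))
        estimates.append(successes / sample_size)
    return statistics.stdev(estimates)

for n in (20, 2000):
    simulated = spread_of_estimates(n)
    theoretical = math.sqrt(0.5 * 0.5 / n)  # standard error of a proportion
    print(f"n={n}: simulated spread {simulated:.4f}, theoretical {theoretical:.4f}")
```

The small studies scatter far more widely around the true value than the large ones, even though every study samples the same population in the same way.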

The conclusion is that we should always remain skeptical of studies with small sample sizes. Well then, are studies with large sample sizes always good?

=============

Postscript:

=============

Some readers made many insightful comments. Wasguru pointed out that the flaw in treating a random process such as coin tossing as a binomial distribution is that the binomial considers only the number of heads in N trials, not the order of the sequence. A great deal of information is lost.

Wasguru also pointed out that the whole argument rests on a basic assumption: the coin is fair (p = 0.5). “If that assumption is subject to test, Bayesian School comes in. In fact, if see six heads in a row, I’ll bet the next one be a head again, not the opposite”. I agree that six heads in a row may indeed suggest that the assumed probability of 0.5 is violated, or that the independence between tosses is violated.
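One way to sketch Wasguru’s point is a Beta-Binomial update, in which the fairness of the coin is itself uncertain. This model is my assumption, not something the commenters spelled out: the prior on the probability of heads is Beta(a, b), and the function name and the two priors below are chosen only for illustration.

```python
from fractions import Fraction

def predictive_prob_heads(heads, tails, prior_a=1, prior_b=1):
    """P(next toss is heads) after the observed tosses.

    With a Beta(prior_a, prior_b) prior on the heads probability, the
    posterior is Beta(prior_a + heads, prior_b + tails), and the chance of
    heads on the next toss is the posterior mean.
    """
    return Fraction(prior_a + heads, prior_a + prior_b + heads + tails)

# Uniform prior: no strong belief that the coin is fair.
print(predictive_prob_heads(6, 0))                          # 7/8
# Strong prior belief in fairness, worth 100 imaginary earlier tosses.
print(predictive_prob_heads(6, 0, prior_a=50, prior_b=50))  # 28/53
```

Under either prior, six heads in a row push the bet toward heads, not tails. How far depends on how firmly one believed in the fair coin to begin with.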

Given six heads in a row, when you view it retrospectively as a binomial outcome, it has a small probability. “If you observed something with small probability, it won’t be corrected later,…you go back and modify your assumption, that’s what should happen” (008). This is similar to Wasguru’s argument.

Enlighten offered another interesting observation: “Although the difference between the observed probability and the expected probability will get closer to 0 if you have more trials, the difference between the NUMBERS of heads and tails actually tends to increase”. The absolute number of heads (or tails) is the numerator of the observed proportion. Once the denominator, the total N, is taken into account, the growing difference in counts has little effect on the percentage of heads, that is, on the probability from the frequentist point of view, because the gap in counts typically grows only on the order of the square root of N, more slowly than N itself.
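Enlighten’s observation is easy to check numerically. The following is a rough sketch, assuming a fair coin; the run lengths, the number of repetitions, and the seed are arbitrary choices of mine.

```python
import random

def mean_abs_gaps(n_tosses, n_runs=200, seed=2):
    """Average |heads - tails| and average |heads/n - 0.5| over repeated runs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    count_gap = 0.0
    prop_gap = 0.0
    for _ in range(n_runs):
        # n_tosses random bits stand in for n_tosses fair tosses (1 = heads).
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        tails = n_tosses - heads
        count_gap += abs(heads - tails)
        prop_gap += abs(heads / n_tosses - 0.5)
    return count_gap / n_runs, prop_gap / n_runs

for n in (100, 10_000, 1_000_000):
    counts, proportion = mean_abs_gaps(n)
    print(f"n={n}: mean |H-T| = {counts:.1f}, mean |H/n - 0.5| = {proportion:.5f}")
```

As the number of tosses grows, the average gap between the counts of heads and tails gets larger, while the gap between the observed proportion and 0.5 shrinks.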

On the other hand, “a gambler will more likely to lose all his money the more he plays, even if it’s a fair coin” (Enlighten). Or “in gambling, the random walk has an absorbing barrier” (Wasguru). This is certainly true because the gambler’s resources (time and money) are finite. There is a stopping point for every player. If the gambler loses all of his money, or stops at an unlucky point (for example, dies unexpectedly), he ends up losing rather than winning anything.
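The absorbing barrier can be illustrated with a small random-walk simulation. This is a sketch under assumptions of my own: the gambler bets one unit per toss on a fair coin, starts with 10 units, plays against a house that never runs out of money, and stops only when broke or when the allowed number of plays is used up.

```python
import random

def ruin_fraction(bankroll, max_plays, n_gamblers=300, seed=3):
    """Fraction of simulated gamblers who hit zero within max_plays fair bets."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ruined = 0
    for _ in range(n_gamblers):
        money = bankroll
        for _ in range(max_plays):
            money += 1 if rng.random() < 0.5 else -1
            if money == 0:  # absorbing barrier: no money left to bet
                ruined += 1
                break
    return ruined / n_gamblers

for plays in (100, 1_000, 10_000):
    print(f"up to {plays} plays: {ruin_fraction(10, plays):.2f} of gamblers ruined")
```

Even though every single bet is fair, the share of gamblers who go broke rises with the number of plays allowed, which is what Enlighten and Wasguru describe.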

For an ideal coin-tossing game (fair and independent), any previous loss is a sunk cost. Once it’s gone, it is lost forever. Or maybe, as 008 asserted, “there is no “fair” game”. In that case, one should always apply Bayes’ rule to test the fairness of the history and to predict the future.

Unfortunately, human beings tend to resort to intuitive thinking instead of careful reasoning, which is a psychological bias. They expect the process to “correct” itself, but are unwilling to change their assumptions.

Tracing smartness in genes

Filed under: Uncategorized — @ 1:26 am

It is fashionable to attribute everything to genes or, more vaguely, to biological factors. For example, your insatiable desire to surf the internet is said to be determined by your genes, through the same circuit as substance addiction. When it comes to human intelligence, everybody can testify to the genetic roots of intelligence based on his or her own family history.

Yes, as much as 50% of human intelligence may be genetically determined. However, the fact that intelligence has a large genetic component within groups may have nothing to do with racial or ethnic differences in human intelligence, as the group differences can be due to the other 50% of influence: the non-genetic, or cultural, factors.

Because of the scarcity of family studies among Blacks, it is uncertain whether heritability in Blacks is the same as in Whites. Twin studies, or the sometimes less favored sibling studies, track the life histories of monozygotic or dizygotic twins, or of siblings, reared together or apart. They can directly examine gene-environment interaction in relation to human intelligence (hereafter referred to as IQ). Some have suggested that heritability in Blacks might indeed be smaller than in Whites: the proportion of shared components between twins and siblings differed between Blacks and Whites. However, the shared components also include family influence, and mothers are more influential in a child’s development than other family members. Another interesting finding was that the nonshared proportion decreased to almost zero after age 30. Some declared that the racial difference in adult IQ could be attributed to genetics. If this racial difference in heritability is real, it may also imply that environmental factors such as education level are more heterogeneous among Blacks, while Whites have more similar educational attainment, so that the variation in intelligence among Whites is more likely due to genetic factors.

A transracial adoption study is a natural experiment with a cross-fostering design, in which people adopt children from a different racial background. The biological children have no genetic relationship with the adopted children, and the environmental influences on them can be assumed to be the same. The Minnesota Transracial Adoption Study included 265 children from White upper-middle-class families. At age 7, the average IQ of the nonadopted White children (the biological children) was 117; adopted children with two White biological parents had an average IQ of 112; it was 109 for adopted children with one White and one Black biological parent, while adopted children with two Black biological parents had an average IQ of only 97. The differences became larger when the children were reexamined at age 17. In particular, those with two Black biological parents were essentially no different from Black children in general. It seemed that, given the same environment (upper-middle-class White parents), those who had Black biological parents had significantly lower IQ than the others, which pointed in the direction of genetic determinism. Furthermore, children adopted from Asian countries unequivocally had much higher IQ than both Whites and Blacks. On this reading, the differences among transracially adopted children were not due to early childhood experience.

However, interpreting the above data literally is dangerous. Recent studies have shown that adopted children do not necessarily grow up the same way as their adoptive siblings. For example, Black children may feel subtle pressure within White families, and these subtleties may affect their learning and daily activities. In fact, the gradient in the above study, from biological children to adopted White children to adopted Black children, suggests that biological and adopted children did follow different growth tracks: if the environment were truly the same, there should be no difference between biological White children and adopted White children. Furthermore, even biological siblings have different social circles at school. The friends with whom Black children interact are also different, as de facto racial separation in schools is not uncommon. Data are gradually becoming available to explore this issue.

No matter how convincing the evidence from twin, sibling, adoption and family studies is, it cannot tell us which genes are responsible for human intelligence. If the genes affecting intelligence are somehow linked to the characteristics of racial or geographically defined groups, another natural experiment may be more interesting. Racial admixture is pervasive in the US and South America. If we could compare the IQ of racially mixed people with that of non-mixed people, we might be able to solve the IQ mystery once and for all. However, one can’t rely on characteristics such as skin color to determine racial admixture, because it is skin color and other aspects of physical appearance that cause all sorts of discrimination and separation in our society. Therefore, studies based on skin color are of little value in verifying the genetic hypothesis, however exhaustive the statistical adjustments. A DNA-based definition of race, or more precisely polymorphism clusters, may be a better indicator for examining the genetic hypothesis of IQ. Unfortunately, data of this sort are not available.

Then, what are the implications of all the above discussion? Is the racial difference in IQ real? What causes the difference? Unfortunately, the difference is real, but we don’t know why. Another equally distressing fact is that Blacks suffer from almost every disadvantage. For example, they are more likely to die young and more likely to engage in risky behaviors such as a poor diet. They are also more likely to have less education and to be unemployed. Are these all genetically based? The stereotype that equates Blacks with violence is also taught in most families and on TV, either deliberately or unintentionally. What are the effects of that?

One should always remember history. Even among people of European descent, those from Ireland, the Balkans, and old England were once considered uncivilized and unintelligent. Medieval aristocrats commonly held that peasants were dumb and intellectually underdeveloped. Now all white-skinned people are thought to be equally intelligent. It is inconceivable that within a few hundred years all of their genes became the same.

The search for a genetic basis of the racial difference in IQ is one of the many facets of the new racism. We may have to wait until all races are biologically mixed, and then the need to define race, and all the related debates, will disappear.

