Tuesday, 12 February 2008




Names (or Parents?) Make a Difference

I just read an article about a study showing that a girl's name

can be used to predict whether she will study math or physics after

the age of 16. The study, done by David Figlio, professor of economics

at the University of Florida, indicated that girls with "very feminine"

names, such as Isabella, Anna, and Elizabeth, are less likely to study

hard sciences than girls with names like Grace or Alex.

Personally, I find it hard to understand how someone can estimate the

"femininity" of a name, but it might be just me. Even if there is such

a scale, though, I do not see any causality in the finding, as implied

in the article. (I see predictive power, but no causality.) In my own

interpretation, parents who choose "very feminine" names also try to

steer their daughters towards more "feminine" careers. I cannot

believe that names by themselves set a prior probability on the career

path of a child. (The Freakonomics book had a similar discussion about

names and success.)

Oh well, how you can lie with statistics...

Posted by Panos Ipeirotis at 11:40 AM


Tuesday, May 8, 2007

Replacing Survey Articles with Wikis?

Earlier this year, together with Ahmed Elmagarmid and Vassilios

Verykios, I published a survey article in IEEE TKDE on duplicate

record detection (also known as record linkage, deduplication, and

by many other names).

Although I see this paper as a good effort in organizing the

literature in the field, I will be the first to recognize that the

paper is incomplete. We tried our best to include every research

effort that we identified, and the reviewers helped a lot in this

respect. However, I am confident that there are still many nice papers

that we missed.

Furthermore, since the paper was accepted for

publication, many more papers have been published and many more will

be published in the future. So, this means that the useful half-life

of (any?) such survey is necessarily short.

How can we make such papers more relevant and more resistant to

obsolescence? One solution that I am experimenting with is to make the

survey article a wiki, and then post it to Wikipedia, allowing other

researchers to add their own papers to the survey.

I am not sure that Wikipedia is the best option, though, due to licensing

issues. A personal wiki may be a better option, but I do not

have a good grasp of the pros and cons of each approach. One of the

benefits of Wikipedia is the existence of nice templates for handling

citations. One of the disadvantages is the copyright license of

Wikipedia, which may discourage (or prevent) people from posting

material there.

Furthermore, it is not clear that a wikified document is the best way

to organize a survey. A few days back, I got a (forwarded) email from

Foster Provost, who was seeking my opinion on the best way to

organize an annotated bibliography. (Dragomir Radev had a similar

question.) Is a wiki the best option? Or is it by construction too

flat? Should we use some other type of software that allows people to

generate explicit, annotated connections between the different papers?

(Any public tool?)

