Tuesday, 12 February 2008


New Class: Search and the New Economy

Next semester, I will be teaching an MBA class with the title "Search

and the New Economy," and I will also be participating in the

undergraduate version of the class, taught by Norm White. The intended

audience for the class is MBA students who have an interest in

technology but are not necessarily programmers.

I have been thinking a lot about how to organize such a class, so that it

has some internal structure and flow. My current list of topics:

1. Search Engine Marketing: Introduction, Search Basics: Crawling,

Indexing, Ranking, Pagerank, Spam, TrustRank

2. Search Engine Marketing: Analyzing and Understanding Users'

Behavior, Web Analytics

3. Search Engine Marketing: Search Engine Optimization

4. Search Engine Marketing: AdWords, AdSense, Click Fraud

5. Social Search and Collective Intelligence: Blog Analysis and

Aggregation, Network Analysis, Opinion Mining

6. Social Search and Collective Intelligence: Recommender Systems,

Reputation Systems

7. Social Search and Collective Intelligence: Prediction Markets

8. Social Search and Collective Intelligence: Wikis and Collaborative

Production

9. Ownership of Electronic Data: Privacy on the Web

10. Ownership of Electronic Data: Intellectual Property issues on the

Web

11. Ownership of Electronic Data: The Future of Privacy and

Intellectual Property

12. Future Directions and Wrapping-up

Some rough sketches of the assignments for this course:

* Run and optimize an online advertising campaign, using Google

AdWords or Microsoft adCenter.

* Analyze the visitor data of a website to assess the

effectiveness of different pages. You can use Google Analytics, or

tools like CrazyEgg.

* Optimize the keyword campaign of a company by choosing the

appropriate keywords and bid amounts, depending on the competition

and the rank of the organic pages.

* Analyze (or build) a recommender system for movies, books, and TV

Shows using Facebook data.

* Build a dating recommendation system using Facebook data.

* Build prediction markets at Inkling Markets, for an event of

interest, examine the accuracy of the predictions, and analyze the

behavior of the participants. Alternatively, analyze real-money

prediction markets at InTrade and BetFair and examine the effect

of real-life events in political campaigns.

* Use Google Trends to build a predictor of unemployment measures.

Any more topics that would be worth covering? Alternative exercises?

Posted by Panos Ipeirotis at 12:20 AM

Friday, November 9, 2007

Only for Database Geeks, the SeQueL

http://www.qwantz.com/archive/000153.html

Thanks to my students, Cissy and Shelley, for the pointer :-)

Posted by Panos Ipeirotis at 1:50 PM

Wednesday, November 7, 2007

What is Wrong with the ACM Typesetting Process?

Recently, I had to go through the process of preparing the

camera-ready version for two ACM TODS papers. I am not sure what

exactly the problem is, but the whole typesetting process at ACM seems

to be highly problematic.

My own pet peeves:

Pet peeve A: The copyeditors do not know how to typeset math and they

do not even check the paper to see whether they have correctly incorporated

their own edits.

I detected problems repeatedly, and the copyeditor consistently did

not check the proofs after making the edits. Here are a few examples.

Example #1

I submitted the LaTeX sources and the PDF, with the following equation:

The copyeditor does not like the superscripted e^{\beta x_a}, so

decides to convert it into the inline form exp(\beta x_a). Not a bad

idea! Look, though, what I get back instead:

To make things worse, such errors were pervasive and appeared in many

equations in the paper. I asked the copyeditor to fix these errors and

send me back the paper after the mistakes are fixed, so that I can

check it again. I was reassured that I would be able to inspect the

galley proofs again before they went to print. Well, why would I expect

that someone who makes such mistakes would be diligent enough to let me

inspect the paper again...

A couple of weeks later, and despite all the promises, I got an email

indicating that my paper was published and was available online. I

checked the ACM Digital Library, and I saw my paper online, with the

following formula:

OK, so we managed to get an interesting hybrid :-). Seriously, do the

ACM copyeditors even LOOK at what they are doing? If they do check and

they do not understand that this is an error, why do we even have

copyeditors?
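
As a point of reference, the change the copyeditor was attempting in Example #1 is purely notational. Here is a minimal LaTeX sketch, using only the exponent quoted above (the full equation from the paper is not reproduced here):

```latex
% Superscripted form, as in the submitted sources:
$e^{\beta x_a}$

% Inline form; \exp is a standard LaTeX math operator,
% and the whole argument stays inside the parentheses:
$\exp(\beta x_a)$

% Both forms denote the same quantity:
\[ e^{\beta x_a} = \exp(\beta x_a) \]
```

A correct conversion swaps one form for the other everywhere it appears and leaves the rest of the equation untouched.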

Example #2

I assumed that the previous snafu was just an exception. Well, never

say never. A couple of days back, I got the galleys for another TODS

paper, due to be published in the next few days. Again, the copyeditor

decided to make (minor) changes in the equations. In my originally

submitted paper, I had the following equations:

In the galleys, the same equations look like:

I will repeat myself: do the ACM copyeditors even LOOK at what they

are doing? If they do check and they do not understand that this is an

error, why do we even have copyeditors?

Pet peeve B: Converting vectorized figures into bitmaps

If you have submitted a paper to a conference, you know how crazy the

copyeditors get about getting PDFs with only Type 1 fonts, vectorized,

not-bitmapped, and so on. This is a good thing, as the resulting PDFs

contain only scalable, vector-based fonts that look nice both on

screen and on paper.

For the same reason, I also prepare nice, vectorized figures for my

papers, so that they look nice both on screen and on paper. However,

for some reason, the copyeditors at ACM seem to like to convert

the vectorized images into horrible, ugly bitmaps that do not scale

and look awful. Here is an example of a figure in the original PDF:

Here is how the same figure looks in the PDF that I received as a

galley:

Am I too picky? Is it bad that I want my papers to look good?

End of pet peeves

(Note: The same copyediting process at IEEE seems to

work perfectly fine.)

I am starting to believe that the whole idea of publishing is a horribly

outdated process. I assumed that copyeditors were a part of a chain

that adds value to the paper, not a part that subtracts value.

If I need to check my paper carefully, being afraid that the

copyeditor will introduce bugs, that the copyeditor will make

everything look horrible, then why do we even have copyeditors? Just

get rid of them; they are simply parasites in the whole process! Can

you imagine having a professor that teaches a class and at the end the

students know less about a topic? Would you keep this professor

teaching?

Make everything open access. Let every author be responsible for the

way that the paper looks. Let the authors revise papers that have

problems once they are in digital libraries. Why do we consider it perfectly acceptable to

have bug fixes and new versions for applications and operating

systems, but we want the papers that we produce to be frozen in time

and completely static?

Furthermore, the whole motivation for having journals is to have the

peer-reviewing process that guarantees that the "published" paper is

better than the submitted one. Everything else is secondary. Why keep

in the chain processes that only cause problems?

When are we going to realize that the publication system should be

completely revamped? Why not have an ongoing reviewing process,

improving the paper continuously? Should we keep the system as-is so

that we can be "objectively evaluated" by counting static papers that

are produced once and never visited again?

