We're guests of the Department of Philosophy, so allow me to introduce the ideas of my paper with a Socratic dialog…


One day, Socrates walks into our office. He says:

I have a problem: one of my students wants to create his perfect republic, and for this he wants to separate the people into three groups: workers, warriors, and thinkers. But he doesn't know how to automate this process and has come to me for help.

So I responded:

You need a classifier, something that maps a person's attributes (such as intelligence, strength, and courage) to one of these classes.

After which our conversation continued:

Mmm, ok. Can you suggest one?

Well, the Naive Bayesian Classifier is nice. It uses a probabilistic model of attributes and classes to calculate the expected utility of choosing one class over another, given the attributes of the person you want to classify.
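
(As an aside for the reader: a minimal sketch, in Python, of what such a classifier does, assuming discrete attribute values, made-up attribute names, and the usual 0-1 utility, under which maximizing expected utility reduces to picking the most probable class.)

    from collections import defaultdict

    def train(samples):
        # samples: list of (attributes, cls) pairs, attributes being a dict
        class_counts = defaultdict(int)
        attr_counts = defaultdict(int)   # keyed by (cls, attribute, value)
        for attrs, cls in samples:
            class_counts[cls] += 1
            for a, v in attrs.items():
                attr_counts[(cls, a, v)] += 1
        return class_counts, attr_counts

    def classify(attrs, class_counts, attr_counts):
        n = sum(class_counts.values())
        best, best_score = None, -1.0
        for cls, nc in class_counts.items():
            # "naive" assumption: attributes are independent given the class,
            # so the joint probability factorizes into a product
            score = nc / n
            for a, v in attrs.items():
                score *= attr_counts[(cls, a, v)] / nc
            if score > best_score:
                best, best_score = cls, score
        return best   # most probable class = maximal expected 0-1 utility

    # hypothetical training data and query
    data = [({"strong": True,  "smart": False}, "warrior"),
            ({"strong": False, "smart": True},  "thinker"),
            ({"strong": False, "smart": False}, "worker")]
    print(classify({"strong": True, "smart": False}, *train(data)))   # -> "warrior"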

Mmm, sounds nice. And how do you get this classifier?

Well, you need to make some initial assumptions about the attributes, and you need enough training samples, i.e., enough pre-classified people whose attributes you know.

Mmm, too bad: he hasn't got the time to classify more than a small group of people.

Well, that doesn't have to be a problem: you can use a modification of the Naive Bayesian Classifier, the Naive Credal Classifier, which also works with a smaller number of training samples. You just have to accept that for some people this classifier won't be able to return a unique class.
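
(Again as an aside: one common way to make "won't return a unique class" precise. The credal classifier works with a whole set $\mathcal{M}$ of candidate distributions rather than a single one, and it only drops a class when some other class beats it under every distribution in $\mathcal{M}$; ratio-based dominance criteria also appear in the literature. In symbols,

  \[
    c' \succ c''
    \quad\Longleftrightarrow\quad
    \inf_{P \in \mathcal{M}} \bigl( P(c' \mid a) - P(c'' \mid a) \bigr) > 0 ,
    \qquad
    \text{output} = \{\, c : \text{there is no } c' \text{ with } c' \succ c \,\} ,
  \]

so the output is the set of undominated classes, which may contain more than one element when the training data are scarce.)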

That's very reasonable. So if we take measurements for stamina, testosterone levels, and IQ, we're set to go?

Well, no, you need to discretize these data first, because the Naive Credal Classifier works with the imprecise Dirichlet-multinomial model, or IDMM, a model for predictive inference for multinomial sampling and a cousin of the better-known imprecise Dirichlet model, which is a model for parametric inference for multinomial sampling. So the attributes' sample space has to be discrete.
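
(A minimal sketch, in Python, of such a discretization, with hypothetical cut points; in practice the bins would of course be chosen to suit the data, e.g., equal-frequency bins.)

    import numpy as np

    # hypothetical cut points turning a continuous IQ score into one of four categories
    iq_bins = [85, 100, 115]
    iq_labels = ["low", "below average", "above average", "high"]

    def discretize(value, bins, labels):
        # np.digitize returns the index of the bin that `value` falls into
        return labels[int(np.digitize(value, bins))]

    print(discretize(123, iq_bins, iq_labels))   # -> "high"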

My pupil will hardly find this approach ideal. What makes this IDMM thing so practical?

Well, first of all, updating this model is easy (which means training your classifier is easy), because the probability distributions used in this model belong to the same family before and after updating with a multinomial sample, a property called conjugacy.
Secondly, the parameters used in the model have a nice interpretation: one can be regarded, more or less, as the number of samples you've used to train the model, and the other as a summary of the relevant data in the training set (if you're familiar with the term: an average sufficient statistic).
Third and last, the optimization problems you have to solve to compare two classes are easy.
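
(To make the first two properties concrete in the multinomial case, here is the conjugate update in one common parameterization, with a scalar hyperparameter $s$ and a probability vector $t$; this notation is an assumption of mine, not necessarily the one used in the paper. Starting from a Dirichlet prior

  \[
    p(\theta) \propto \prod_{k} \theta_k^{\, s t_k - 1}
  \]

and observing counts $n_1, \dots, n_K$ with $N = \sum_k n_k$, the posterior is again Dirichlet, with hyperparameters

  \[
    s' = s + N ,
    \qquad
    t'_k = \frac{s\, t_k + n_k}{s + N} ,
  \]

so $s$ indeed grows like the number of training samples and $t$ is a weighted average of the prior guess and the observed relative frequencies.)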

And so, aren't there any conjugate probability distributions for, say, the IQ samples we would get?

Well, actually, yes. It would be reasonable to assume that IQ is normally distributed, and the conjugate for the normal distribution is well known.
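
(A sketch of the simplest case, assuming the variance $\sigma^2$ is known; with unknown mean and variance the conjugate is a normal-inverse-gamma, or equivalently normal-gamma, distribution. With the prior

  \[
    x \mid \mu \sim \mathcal{N}(\mu, \sigma^2),
    \qquad
    \mu \sim \mathcal{N}\!\bigl(\mu_0,\ \sigma^2 / n_0\bigr),
  \]

the posterior after observing $x_1, \dots, x_N$ with sample mean $\bar{x}$ is

  \[
    \mu \mid x_1, \dots, x_N
    \sim \mathcal{N}\!\Bigl( \frac{n_0 \mu_0 + N \bar{x}}{n_0 + N},\ \frac{\sigma^2}{n_0 + N} \Bigr),
  \]

which already hints at the interpretation asked about next: $n_0$ acts as a prior sample size and $\mu_0$ as a prior average.)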

And the second property, about the interpretation of the parameters, what happens to that, in this case?

Well, with the way in which the distributions are usually written down, this doesn't appear to be the case. But if you do a parameter transformation, so that the normal distribution is written in the standard exponential family form, then, yes, the conjugate will have parameters with this nice interpretation. Come to think of it, the multinomial distribution can also be written in the standard exponential family form. And when you look at this, it's clear that the conjugates of all exponential families will have parameters with the same property!
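
(In symbols, and in one common convention for the standard exponential family form; other conventions move the factors around. Writing the sampling distribution and its conjugate as

  \[
    p(x \mid \psi) = h(x)\, \exp\bigl( \psi \cdot T(x) - A(\psi) \bigr),
    \qquad
    p(\psi \mid n_0, \tau) \propto \exp\bigl( n_0\, \tau \cdot \psi - n_0 A(\psi) \bigr),
  \]

the posterior after observing $x_1, \dots, x_N$ has hyperparameters

  \[
    n_0' = n_0 + N ,
    \qquad
    \tau' = \frac{n_0 \tau + \sum_{i=1}^{N} T(x_i)}{n_0 + N} ,
  \]

so, just as in the multinomial case, $n_0$ counts samples and $\tau$ is an average of the sufficient statistic $T$.)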

I'm glad for you. So does this mean that my problem is solved?

Well, perhaps, but I'd still have to look at the optimization problems involved…

Then he looked at his watch, waved goodbye, muttered something about going to court, and left.
I started writing my ISIPTA paper and afterwards made the poster…