Discussion about this post

A Z Mackay:

While some might argue (myself included) that a guide to machine learning containing over 30 e-books sounds a wee bit excessive, it's still an intriguing resource for anyone looking to delve into the depths of this ever-expanding domain. After all, who can resist the temptation of hatching neural networks and unraveling the mysteries of data prediction? It's impossible to ignore the intriguing possibilities brought about by the power of machine learning. Cheers to knowledge enhancement.

suman suhag:

Hidden Markov Models can be used to generate a language, that is, to list elements from a family of strings. For example, if you have an HMM that models a set of sequences, you can generate members of this family by sampling sequences that fall into the group of sequences being modelled.
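
To make the "generate members of the family" point concrete, here is a minimal sketch of sampling strings from a toy two-state HMM with discrete emissions; the start, transition and emission tables are invented purely for illustration.

```python
# Minimal sketch of sampling ("generating") strings from a discrete HMM.
# The states, symbols, and probability tables are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

start = np.array([0.6, 0.4])                 # P(first hidden state)
trans = np.array([[0.7, 0.3],                # P(next state | current state)
                  [0.2, 0.8]])
emit = np.array([[0.9, 0.1],                 # P(symbol | state); symbols: 'a', 'b'
                 [0.2, 0.8]])
symbols = ['a', 'b']

def sample_sequence(length):
    """Generate one member of the family of strings the HMM models."""
    seq = []
    state = rng.choice(2, p=start)
    for _ in range(length):
        seq.append(symbols[rng.choice(2, p=emit[state])])
        state = rng.choice(2, p=trans[state])
    return ''.join(seq)

print([sample_sequence(5) for _ in range(3)])  # three sampled strings from the modelled family
```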

Neural Networks take an input from a high-dimensional space and simply map it to a lower-dimensional space (how a neural network maps this input depends on its training, its topology and other factors). For example, you might take a 64-pixel image of a digit and map it to a true/false value that describes whether the number is a 1 or a 0.
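
As a rough illustration of that mapping, the sketch below trains a small classifier on scikit-learn's 8x8 digit images (64 pixels per image) restricted to the classes 0 and 1; the network size and other settings are arbitrary choices made for the example, not a recommendation.

```python
# Minimal sketch of the "64-pixel image -> is it a 1 or a 0?" mapping,
# using scikit-learn's 8x8 digit images restricted to the classes 0 and 1.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
mask = digits.target < 2                       # keep only the 0s and 1s
X, y = digits.data[mask], digits.target[mask]  # X has 64 dimensions per image

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High-dimensional input (64 pixels) mapped to a binary decision.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))               # accuracy on held-out images
```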

Whilst both methods are able to (or can at least try to) discriminate whether an item is a member of a class or not, Neural Networks cannot generate a language as described above.

There are alternatives to Hidden Markov Models. For example, you might use a more general Bayesian network, a different topology, or a Stochastic Context-Free Grammar (SCFG) if you believe the problem lies in the HMM's lack of power to model your problem - that is, if you need an algorithm that can discriminate between more complex hypotheses and/or describe the behaviour of data that is much more complex.

What is hidden and what is observed: The thing that is hidden in a hidden Markov model is the same as the thing that is hidden in a discrete mixture model, so for clarity, forget about the hidden state's dynamics and stick with a finite mixture model as an example. The 'state' in this model is the identity of the component that caused each observation. In this class of model, such causes are never observed, so 'hidden cause' is translated statistically into the claim that the observed data have marginal dependencies which are removed when the source component is known. And the source components are estimated to be whatever makes this statistical relationship true. The thing that is hidden in a feedforward multilayer neural network with sigmoid middle units is the states of those units, not the outputs, which are the target of inference. When the output of the network is a classification, i.e., a probability distribution over possible output categories, the values of these hidden units define a space within which the categories are separable. The trick in learning such a model is to make a hidden space (by adjusting the mapping out of the input units) within which the problem is linear. Consequently, non-linear decision boundaries are possible from the system as a whole.
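
A small sketch of the mixture-model version of 'hidden cause': the component that generated each point is never observed, but once the model is fitted its posterior (the responsibilities) can be computed. The data below are synthetic and the two-component choice is purely for illustration.

```python
# Minimal sketch: in a finite mixture, the hidden 'state' is the identity of the
# component that generated each point; it is never observed, only inferred.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two sources; the source labels are then thrown away.
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)

# Responsibilities: P(hidden component | observation), i.e. the inferred hidden cause.
resp = gm.predict_proba(data[:5])
print(resp)
```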

Generative versus discriminative: The mixture model (and HMM) is a model of the data-generating process, sometimes called a likelihood or 'forward model'. When coupled with some assumptions about the prior probabilities of each state, you can infer a distribution over possible values of the hidden state using Bayes' theorem (a generative approach). Note that, while called a 'prior', both the prior and the parameters in the likelihood are usually learned from data. In contrast to the mixture model (and HMM), the neural network learns a posterior distribution over the output categories directly (a discriminative approach). This is possible because the output values were observed during estimation. And since they were observed, it is not necessary to construct a posterior distribution from a prior and a specific model for the likelihood such as a mixture. The posterior is learnt directly from data, which is more efficient and less model-dependent.
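
The sketch below spells out the generative route on a toy two-state problem: the posterior over the hidden state is assembled from a prior and a likelihood via Bayes' theorem. All numbers are made up; a discriminative model would skip this construction and fit the posterior directly from labelled data.

```python
# Minimal sketch of the generative route: the posterior over the hidden state is
# built from a prior and a likelihood via Bayes' theorem. Numbers are made up.
import numpy as np
from scipy.stats import norm

prior = np.array([0.7, 0.3])                  # P(state), learned or assumed
means, sds = np.array([-2.0, 3.0]), np.array([1.0, 1.0])

def posterior(x):
    """P(state | x) = P(x | state) * P(state), normalised over states."""
    lik = norm.pdf(x, loc=means, scale=sds)   # forward model / likelihood
    unnorm = lik * prior
    return unnorm / unnorm.sum()

print(posterior(0.5))
# A discriminative model (e.g. logistic regression or a neural network) would
# instead fit this posterior directly from (x, state) pairs, with no explicit
# prior or likelihood.
```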

Mix and match: To make things more confusing, these approaches can be mixed together, e.g. when the mixture model (or HMM) state is sometimes actually observed. When that is true, and in some other circumstances not relevant here, it is possible to train discriminatively in an otherwise generative model. Similarly, it is possible to replace the mixture-model emission mapping of an HMM with a more flexible forward model, e.g. a neural network.
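
As a sketch of that last 'mix and match' idea, the classic hybrid recipe divides a network's posteriors P(state | observation) by the state priors to get quantities proportional to the emission likelihoods P(observation | state), which can then plug into the usual HMM forward/Viterbi recursions. The arrays below are illustrative only.

```python
# Minimal sketch of the 'mix and match' idea: a neural network supplies the
# emission scores for an HMM by converting its posteriors into scaled likelihoods.
import numpy as np

def scaled_likelihoods(nn_posteriors, state_priors):
    """Convert P(state | x) from a discriminative network into scores
    proportional to P(x | state), usable as HMM emission probabilities."""
    return nn_posteriors / state_priors

# e.g. network posteriors for 3 observations over 2 states, and the state priors
nn_posteriors = np.array([[0.9, 0.1],
                          [0.4, 0.6],
                          [0.2, 0.8]])
state_priors = np.array([0.5, 0.5])
print(scaled_likelihoods(nn_posteriors, state_priors))
```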
