A Sparse Bayesian Compression Scheme — The Informative Vector Machine

Neil D. Lawrence and Ralf Herbrich, 2001.

Abstract

Kernel based learning algorithms allow the mapping of a data-set into an infinite dimensional feature space in which a classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains m rows and columns, where m is the number of data-points. Many operations are therefore precluded (e.g. the matrix inverse, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner we must estimate the amount of information each data-point contains. In the framework presented here we make use of the Bayesian methodology to determine how much information is gained from each data-point.
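To make the computational point concrete, below is a minimal sketch of the kind of greedy, information-driven selection the abstract describes. It is an illustration, not the paper's algorithm: it assumes a Gaussian process regression setting with an RBF kernel and observation noise, and scores each candidate point by the reduction in posterior entropy its inclusion would bring. For a Gaussian, differential entropy is monotone in variance, so the most informative point is the one with the largest current posterior variance. All function names and parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between two point sets.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def greedy_informative_subset(X, d, noise=0.1, lengthscale=1.0):
    """Pick d of m points by greatest posterior-entropy reduction under a GP prior."""
    m = X.shape[0]
    K = rbf_kernel(X, X, lengthscale)
    var = np.diag(K).copy()      # posterior variance of the latent function
    V = np.zeros((d, m))         # scaled covariance columns of the chosen points
    selected, rest = [], list(range(m))
    for j in range(d):
        # Gaussian entropy is monotone in variance, so the most informative
        # point is the one whose latent value is currently most uncertain.
        i = max(rest, key=lambda k: var[k])
        selected.append(i)
        rest.remove(i)
        # Exact rank-one conditioning on a noisy observation at point i:
        #   Sigma' = Sigma - Sigma[:, i] Sigma[i, :] / (Sigma[i, i] + noise^2)
        cov_i = K[:, i] - V[:j].T @ V[:j, i]   # current posterior covariance with i
        v = cov_i / np.sqrt(var[i] + noise ** 2)
        V[j] = v
        var = var - v ** 2
    return selected

# Example: compress 2000 points down to the 50 most informative ones.
X = np.random.default_rng(0).normal(size=(2000, 2))
print(greedy_informative_subset(X, d=50)[:10])
```

The rank-one updates keep the cost of selecting $d$ of $m$ points at $O(md^2)$, in contrast to the $O(m^3)$ matrix inverse the abstract mentions, which is what makes such sparse schemes attractive for large data-sets.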

Cite this Paper


BibTeX
@InProceedings{lawrence-informative01,
  title  = {A Sparse Bayesian Compression Scheme — The Informative Vector Machine},
  author = {Neil D. Lawrence and Ralf Herbrich},
  year   = {2001},
  url    = {http://inverseprobability.com/publications/lawrence-informative01.html}
}
APA
Lawrence, N.D. & Herbrich, R. (2001). A Sparse Bayesian Compression Scheme — The Informative Vector Machine.
