A Sparse Bayesian Compression Scheme — The Informative Vector Machine

Neil D. Lawrence; Ralf Herbrich

2001.

Abstract

Kernel-based learning algorithms allow a data-set to be mapped into an infinite-dimensional feature space in which classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inversion, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we are interested in estimating the amount of information each data-point contains. In the framework presented here, we make use of the Bayesian methodology to determine how much information is gained from each data-point.
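
To make the idea of "information gained from each data-point" concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a Gaussian process prior with an RBF kernel and Gaussian observation noise (a regression setting chosen for simplicity, whereas the abstract discusses classification). Under those assumptions, including point $i$ reduces the posterior entropy by $\frac{1}{2}\log(1 + \sigma_i^2/\sigma_n^2)$, where $\sigma_i^2$ is the current posterior variance at that point. The function names, the kernel, and the parameters `noise` and `lengthscale` are illustrative choices. The sketch stores only a $d \times m$ factor, so the full $m \times m$ Gram matrix is never formed.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of X and rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def greedy_informative_subset(X, d, noise=0.1, lengthscale=1.0):
    """Greedily pick d points by entropy reduction (illustrative sketch).

    Assumes a zero-mean GP with Gaussian noise variance `noise`; each step
    selects the point with the largest gain 0.5 * log(1 + var_i / noise).
    """
    m = X.shape[0]
    var = np.ones(m)                  # prior variances: k(x, x) = 1 for this kernel
    M = np.zeros((d, m))              # posterior covariance is K - M.T @ M
    chosen = np.zeros(m, dtype=bool)  # marks points already in the active set
    active, gains = [], []
    for j in range(d):
        gain = 0.5 * np.log1p(var / noise)
        gain[chosen] = -np.inf        # never pick a point twice
        i = int(np.argmax(gain))
        # Column i of the current posterior covariance; needs one kernel column only.
        k_i = rbf_kernel(X, X[i:i + 1], lengthscale)[:, 0]
        s_i = k_i - M[:j].T @ M[:j, i]
        M[j] = s_i / np.sqrt(var[i] + noise)
        var = var - M[j] ** 2         # rank-one update of all posterior variances
        chosen[i] = True
        active.append(i)
        gains.append(float(gain[i]))
    return active, gains

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))     # synthetic inputs, for illustration only
    active, gains = greedy_informative_subset(X, d=10)
    print("selected indices:", active)
    print("total information gain (nats):", sum(gains))
```

Each inclusion costs $O(jm)$ time for the $j$-th point, so selecting $d$ points costs $O(d^2 m)$ overall with $O(dm)$ memory, compared with the $O(m^3)$ cost of inverting the full Gram matrix.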

Cite this Paper

BibTeX


@Misc{Lawrence:informative01,
  title = 	 {A Sparse Bayesian Compression Scheme — The Informative Vector Machine},
  author = 	 {Lawrence, Neil D. and Herbrich, Ralf},
  year = 	 {2001},
  url = 	 {/publications/lawrence-informative01.html},
}

Endnote

%0 Generic
%T A Sparse Bayesian Compression Scheme — The Informative Vector Machine
%A Neil D. Lawrence
%A Ralf Herbrich
%D 2001
%F Lawrence:informative01
%U /publications/lawrence-informative01.html

RIS


TY  - GEN
TI  - A Sparse Bayesian Compression Scheme — The Informative Vector Machine
AU  - Neil D. Lawrence
AU  - Ralf Herbrich
DA  - 2001/01/01
ID  - Lawrence:informative01
UR  - /publications/lawrence-informative01.html
ER  -

APA


Lawrence, N. D. & Herbrich, R. (2001). A Sparse Bayesian Compression Scheme — The Informative Vector Machine. Available from /publications/lawrence-informative01.html.