# A Sparse Bayesian Compression Scheme — The Informative Vector Machine

Neil D. Lawrence, Ralf Herbrich, 2001.

#### Abstract

Kernel-based learning algorithms allow a data-set to be mapped into an infinite-dimensional feature space in which classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inversion, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we estimate the amount of information each data-point contains. In the framework presented here, we use the Bayesian methodology to determine how much information is gained from each data-point.
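The selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a Gaussian noise model (the paper addresses classification), an RBF kernel, and hypothetical names (`rbf_gram`, `greedy_select`, `noise`, `lengthscale`). Under those assumptions, the information gained from including point $i$ is the entropy reduction $\tfrac{1}{2}\log(1 + \sigma_i^2 / \text{noise})$, where $\sigma_i^2$ is the current posterior variance at that point, and the point with the largest gain is added greedily.

```python
import numpy as np


def rbf_gram(X, lengthscale=1.0):
    """Full m x m Gram matrix for an RBF kernel (illustrative kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def greedy_select(K, d, noise=0.1):
    """Greedily pick d points, each time taking the largest entropy reduction.

    Assumption: Gaussian observation noise with variance `noise`, so the
    gain for point i is 0.5 * log(1 + var_i / noise).
    """
    m = K.shape[0]
    var = np.diag(K).copy()          # posterior variance at every point
    M = np.zeros((d, m))             # posterior covariance = K - M.T @ M
    chosen = np.zeros(m, dtype=bool)
    active = []
    for k in range(d):
        gain = 0.5 * np.log1p(var / noise)
        gain[chosen] = -np.inf       # never pick a point twice
        i = int(np.argmax(gain))
        # Column i of the current posterior covariance.
        s = K[i] - M[:k].T @ M[:k, i]
        # Rank-one update after conditioning on a noisy observation at i.
        M[k] = s / np.sqrt(var[i] + noise)
        var = var - M[k] ** 2
        chosen[i] = True
        active.append(i)
    return active, var


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))        # synthetic inputs, for illustration only
K = rbf_gram(X)
active, var = greedy_select(K, d=10)
print("selected indices:", active)
```

Selecting $d$ points this way costs $O(d^2 m)$ for the updates rather than the $O(m^3)$ of inverting the full Gram matrix. For simplicity the sketch still builds the full $m \times m$ matrix up front; a memory-conscious version would compute only the kernel rows it needs.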

#### Cite this Paper

BibTeX

```
@InProceedings{pmlr-v-lawrence-informative01,
title = {A Sparse Bayesian Compression Scheme — The Informative Vector Machine},
author = {Neil D. Lawrence and Ralf Herbrich},
year = {2001},
editor = {},
url = {http://inverseprobability.com/publications/lawrence-informative01.html},
abstract = {Kernel-based learning algorithms allow a data-set to be mapped into an infinite-dimensional feature space in which classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inversion, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we estimate the amount of information each data-point contains. In the framework presented here, we use the Bayesian methodology to determine how much information is gained from each data-point.}
}
```

Endnote

```
%0 Conference Paper
%T A Sparse Bayesian Compression Scheme — The Informative Vector Machine
%A Neil D. Lawrence
%A Ralf Herbrich
%B
%C Proceedings of Machine Learning Research
%D 2001
%E
%F pmlr-v-lawrence-informative01
%I PMLR
%J Proceedings of Machine Learning Research
%P --
%U http://inverseprobability.com
%V
%W PMLR
%X Kernel-based learning algorithms allow a data-set to be mapped into an infinite-dimensional feature space in which classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inversion, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we estimate the amount of information each data-point contains. In the framework presented here, we use the Bayesian methodology to determine how much information is gained from each data-point.
```

RIS

```
TY - CPAPER
TI - A Sparse Bayesian Compression Scheme — The Informative Vector Machine
AU - Neil D. Lawrence
AU - Ralf Herbrich
BT -
PY - 2001
DA -
ED -
ID - pmlr-v-lawrence-informative01
PB - PMLR
SP -
DP - PMLR
EP -
L1 -
UR - http://inverseprobability.com/publications/lawrence-informative01.html
AB - Kernel-based learning algorithms allow a data-set to be mapped into an infinite-dimensional feature space in which classification may be performed. As such, kernel methods represent a powerful approach to the solution of many non-linear problems. However, kernel methods suffer from one unfortunate drawback: the Gram matrix contains $m$ rows and columns, where $m$ is the number of data-points. Many operations are therefore precluded (e.g. matrix inversion, which is $O(m^3)$) when data-sets containing more than about $10^4$ points are encountered. One approach to resolving these issues is to look for sparse representations of the data-set. A sparse representation contains a reduced number of examples. Loosely speaking, we are interested in extracting the maximum amount of information from the minimum number of data-points. To achieve this in a principled manner, we estimate the amount of information each data-point contains. In the framework presented here, we use the Bayesian methodology to determine how much information is gained from each data-point.
ER -
```

APA

`Lawrence, N.D. & Herbrich, R. (2001). A Sparse Bayesian Compression Scheme — The Informative Vector Machine.` *In PMLR*.
