__WORKING PAPER__

## Efficient Online Non-Parametric Density Estimation

**Christophe G. Lambert**

*Department of Computer Science, Duke Unversity, Durham, NC 27708*

**Scott E. Harrington**

*Transworld Numerics Ltd., \#4 Nantucket Lane, Smiths Parish, FL05 Bermuda*

**Nathan D. Bronson**

*Transworld Numerics Ltd., \#4 Nantucket Lane, Smiths Parish, FL05 Bermuda*

**Campbell R. Harvey**

*Duke University, Durham, NC 27708*

National Bureau of Economic Research, Cambridge, MA 02138

**Arman Glodjo**

*Duke University, Durham, NC 27708*

Transworld Numerics Ltd., \#4 Nantucket Lane, Smiths Parish, FL05 Bermuda

Abstract

Non-parametric density estimation has broad applications in computational
finance especially in cases where high frequency data is available.
However, the technique is often intractible given the run times
necessary to evaluate a density. We present a new and efficient algorithm
based on multipole techniques. Given the * n* kernels that estimate the density,
current methods take * O(n)* time to directly sum the kernels to perform a
single density query. The cumulative * O(n*^{2<\sup>)} running time for * n* queries
makes it very costly, if not impractical, to compute the density for large * n*.
Our new Multipole-accelerated Online Density Estimation (MODE) algorithm
is general in that it can be applied to any kernel (in arbitrary dimensions)
that admits Taylor series expansion. The running time
reduces to * O(log(n))* or even constant time, depending on the kernel chosen,
and hence, the cumulative running time is reduced to * O(nlog(n))* or * O(n)*,
respectively.
Our results show that the MODE algorithm provides dramatic advantages over
the direct approach to density evaluation. For example, we show using
a modest computing platform
that on-line density updates and queries for one million points and
two dimensions take 8 days to compute using the direct approach versus
40 seconds with the MODE approach.