Sometimes it is useful to have a function that smooths as well as interpolates data. When the domain of the data is well defined and it is not necessary to preserve the actual data points, a Chebyshev polynomial approximation can be useful and quite accurate.
The algorithm below was adapted from Numerical Recipes and overloaded to take a variety of input patterns including functions. The output is a polynomial as a Function.
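As a rough illustration of the underlying idea (a sketch, not the ChebyPoly code itself), Chebyshev coefficients for a function on [a, b] can be computed by sampling it at the Chebyshev nodes, in the spirit of the Numerical Recipes approach. The names chebCoeffs, c and poly are hypothetical, and the sketch assumes the convention f(x) ≈ Sum of c_k T_k(y), with y = (2 x - a - b)/(b - a).

```wolfram
(* Sketch only: hypothetical helper, not the author's ChebyPoly routine. *)
(* Returns {c_0, ..., c_(n-1)} such that f(x) ~ Sum[c_k ChebyshevT[k, y]], *)
(* where y = (2 x - a - b)/(b - a) maps [a, b] onto [-1, 1].              *)
chebCoeffs[f_, {a_, b_}, n_Integer?Positive] := Module[{theta, fx, c},
  theta = Pi (Range[0, n - 1] + 1/2)/n;                  (* node angles *)
  fx = f /@ ((b - a)/2 Cos[theta] + (a + b)/2);           (* f sampled at the nodes in [a, b] *)
  c = Table[(2/n) (fx . Cos[j theta]), {j, 0, n - 1}];    (* discrete cosine sums *)
  c[[1]] = c[[1]]/2;                                      (* fold the customary c_0/2 into c_0 *)
  c]

(* Example: x^2 on [-1, 1] is exactly (T_0 + T_2)/2 *)
c = Simplify[chebCoeffs[#^2 &, {-1, 1}, 3]]

(* Convert the Chebyshev series to an ordinary polynomial wrapped in a Function *)
poly = Function[x,
  Evaluate[Expand[Sum[c[[k + 1]] ChebyshevT[k, x], {k, 0, Length[c] - 1}]]]]
```

For data rather than a function, one option is to interpolate the data first and pass the interpolating function as f; whether the original routine does this is an assumption.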
Data
volumeData1 = Uncompress[…];
The data for the examples below is minute-by-minute volume data for the ten lowest-volatility trading days of the SPY ETF between 31 March and 3 September 2021. The idea is to find a polynomial fit that smooths each day's data and gives an idea of the shape of the underlying function.
In[]:=
GraphicsGrid[Partition[DateListPlot[#, …] & /@ volumeData1, 5], ImageSize -> 1000]
Out[]=
The data was converted from volume per minute to accumulated volume divided by total volume, which standardizes each day's volume to a distribution function of relative volume over the trading minutes of the day. Each raw value is the volume as of the end of its minute, and there are 390 minutes per trading day. A zero-volume point was added at minute 0 so that the function starts at zero before trading begins. Note that if you change the Method to “Hermite”, you should apply N to the data first, because the data consists of integers and the ChebyPoly algorithms are set up to support arbitrary precision. Note also that numerically integrating the data with Accumulate[] causes some smoothing.
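The transformation described above can be sketched as follows. The per-minute volumes here are random placeholders rather than the SPY data, and the names minuteVolume, cumulative, relative and points are chosen for illustration only.

```wolfram
(* Hypothetical stand-in for one day's 390 per-minute volumes (integers). *)
minuteVolume = RandomInteger[{1000, 50000}, 390];

(* Running total, with a zero point prepended for minute 0. *)
cumulative = Prepend[Accumulate[minuteVolume], 0];

(* Divide by the day's total so the curve rises from 0 to 1. *)
(* N converts the exact rationals to machine numbers.        *)
relative = N[cumulative/Last[cumulative]];

(* Pair each value with its minute index 0..390. *)
points = Transpose[{Range[0, 390], relative}];
```

The result is a list of 391 {minute, relative volume} pairs that is nondecreasing in its second component, which is the shape a distribution function needs.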
The thick black line is that of an ArcSinDistribution, which was suggested by the shape of the polynomial function.
Here is the ArcSinDistribution compared to the average of the ten polynomial functions. The minute volume distribution accumulates more slowly than the ArcSinDistribution but catches up rapidly in the final minutes of trading.
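For reference, the comparison curve can be drawn with the built-in distribution. This sketch assumes the distribution is taken over the trading-minute domain {0, 390}; arcsinCDF is a name chosen here for illustration.

```wolfram
(* Assumes the domain {0, 390}; arcsinCDF is a hypothetical name. *)
arcsinCDF[x_] := CDF[ArcSinDistribution[{0, 390}], x]

(* By symmetry, half of the mass lies before the midpoint of the day. *)
arcsinCDF[195]

(* Thick black reference curve over the trading day. *)
Plot[arcsinCDF[x], {x, 0, 390}, PlotStyle -> {Black, Thick},
  AxesLabel -> {"minute", "relative volume"}]
```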
What I found interesting was that the relative volume by minute could be nicely modeled by the first derivative of the polynomial approximations.
This compares the Chebyshev polynomial fit to the actual distribution function, using the same n = 16, and then compares their first derivatives. Note that the actual ArcSinDistribution density is infinite at the endpoints 0 and 390.
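The endpoint behavior is easy to see from the density itself. The sketch below again assumes the domain {0, 390}; arcsinPDF is a hypothetical name.

```wolfram
(* Assumes the domain {0, 390}; arcsinPDF is a hypothetical name. *)
arcsinPDF[x_] := PDF[ArcSinDistribution[{0, 390}], x]

(* Inside the domain the density equals 1/(Pi Sqrt[x (390 - x)]). *)
arcsinPDF[195]    (* 1/(195 Pi), the minimum of the density *)

(* The values grow without bound as x approaches an endpoint. *)
N[arcsinPDF /@ {10, 1, 1/10, 1/1000}]
```

A finite-degree polynomial derivative cannot reproduce this singularity, so the two curves necessarily part ways close to minutes 0 and 390.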
Higher-degree polynomials can run into precision problems because converting the Chebyshev coefficients to polynomial coefficients involves a large number of multiplications. The routine ChebyEval evaluates the approximation with far fewer multiplications, which greatly reduces the loss of precision. The second plot takes a while to render because of the oscillations caused by loss of precision.
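A standard way to evaluate a Chebyshev series without ever forming the polynomial coefficients is Clenshaw's recurrence, which Numerical Recipes also uses. The sketch below is an illustration under that assumption and is not the ChebyEval code; clenshawEval is a hypothetical name, and the coefficients follow the convention f(x) ≈ Sum of c_k T_k(y) with c[[1]] holding c_0.

```wolfram
(* Sketch only: hypothetical helper, not the author's ChebyEval routine. *)
clenshawEval[c_List, {a_, b_}, x_] := Module[{y, d = 0, dd = 0, sv},
  y = (2 x - a - b)/(b - a);                  (* map [a, b] onto [-1, 1] *)
  Do[
    sv = d;
    d = 2 y d - dd + c[[j]];                  (* one multiplication by y per coefficient *)
    dd = sv,
    {j, Length[c], 2, -1}];                   (* highest coefficient down to c_1 *)
  y d - dd + c[[1]]]                          (* final step adds c_0 *)

(* Check against the built-in polynomials: both lines give the same value. *)
clenshawEval[{1, 2, 3}, {0, 390}, 100]
With[{y = (2*100 - 0 - 390)/(390 - 0)},
  Sum[{1, 2, 3}[[k + 1]] ChebyshevT[k, y], {k, 0, 2}]]
```

Because the recurrence works directly on the Chebyshev coefficients, it avoids the large, alternating-sign monomial coefficients that cause cancellation at high degree.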
The first derivative can also be calculated more efficiently directly from the Chebyshev coefficients than by differentiating the polynomial.
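One way to do this is the standard recurrence for the coefficients of the derivative series, c'_(k-1) = c'_(k+1) + 2 k c_k, followed by a chain-rule factor of 2/(b - a). The sketch below assumes the convention f(x) ≈ Sum of c_k T_k(y) with c[[1]] holding c_0; chebDerivCoeffs is a hypothetical name, not a routine from the original code.

```wolfram
(* Sketch only: hypothetical helper. Input {c_0, ..., c_n}, output the *)
(* Chebyshev coefficients {c'_0, ..., c'_(n-1)} of the derivative.     *)
chebDerivCoeffs[c_List, {a_, b_}] := Module[{n = Length[c] - 1, d},
  d = ConstantArray[0, n + 2];                  (* d[[k]] holds c'_(k-1); two trailing zeros *)
  Do[d[[k]] = d[[k + 2]] + 2 k c[[k + 1]], {k, n, 1, -1}];
  d[[1]] = d[[1]]/2;                            (* c'_0 carries a factor of 1/2 in this convention *)
  (2/(b - a)) Take[d, Max[n, 1]]]               (* chain rule for y = (2 x - a - b)/(b - a) *)

(* Example: T_3'(x) = 12 x^2 - 3 = 3 T_0 + 6 T_2 *)
dc = chebDerivCoeffs[{0, 0, 0, 1}, {-1, 1}]

(* Compare with direct differentiation of the built-in polynomial. *)
Expand[Sum[dc[[k + 1]] ChebyshevT[k, x], {k, 0, 2}]] === Expand[D[ChebyshevT[3, x], x]]
```

The derivative series can then be evaluated in the same way as the original series, without going through the polynomial form.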