CUBGCV TEST DRIVER RESULTS: IER = 0 VAR = 0.0279 GENERALIZED CROSS VALIDATION = 0.0318 MEAN SQUARE RESIDUAL = 0.0246 RESIDUAL DEGREES OF FREEDOM = 43.97 INPUT DATA OUTPUT RESULTS I X(I) F(I) Y(I) SE(I) C(I,1) C(I,2) C(I,3) 1 0.0046 0.2222 0.0342 0.1004 0.3630E+01 0.0000E+00 0.2542E+02 2 0.0360 -0.1098 0.1488 0.0750 0.3705E+01 0.2391E+01 -0.9537E+01 3 0.0435 -0.0658 0.1767 0.0707 0.3740E+01 0.2175E+01 -0.4233E+02 4 0.0735 0.3906 0.2900 0.0594 0.3756E+01 -0.1642E+01 -0.2872E+02 5 0.0955 0.6054 0.3714 0.0558 0.3642E+01 -0.3535E+01 0.2911E+01 6 0.1078 0.3034 0.4155 0.0549 0.3557E+01 -0.3428E+01 -0.1225E+02 7 0.1269 0.7386 0.4822 0.0544 0.3412E+01 -0.4131E+01 0.2242E+02 8 0.1565 0.4616 0.5800 0.0543 0.3227E+01 -0.2143E+01 0.6415E+01 9 0.1679 0.4315 0.6165 0.0543 0.3180E+01 -0.1923E+01 -0.1860E+02 10 0.1869 0.5716 0.6762 0.0544 0.3087E+01 -0.2985E+01 -0.3274E+02 11 0.2149 0.6736 0.7595 0.0542 0.2843E+01 -0.5733E+01 -0.4435E+02 12 0.2356 0.7388 0.8155 0.0539 0.2549E+01 -0.8486E+01 -0.5472E+02 13 0.2557 1.1953 0.8630 0.0537 0.2139E+01 -0.1180E+02 -0.9784E+01 14 0.2674 1.0299 0.8864 0.0536 0.1860E+01 -0.1214E+02 0.9619E+01 15 0.2902 0.7981 0.9225 0.0534 0.1322E+01 -0.1149E+02 -0.7202E+01 16 0.3155 0.8973 0.9485 0.0532 0.7269E+00 -0.1203E+02 -0.1412E+02 17 0.3364 1.2695 0.9583 0.0530 0.2040E+00 -0.1292E+02 0.2796E+02 18 0.3557 0.7253 0.9577 0.0527 -0.2638E+00 -0.1130E+02 -0.3453E+01 19 0.3756 1.2127 0.9479 0.0526 -0.7176E+00 -0.1151E+02 0.3235E+02 20 0.3881 0.7304 0.9373 0.0525 -0.9889E+00 -0.1030E+02 0.4381E+01 21 0.4126 0.9810 0.9069 0.0525 -0.1486E+01 -0.9977E+01 0.1440E+02 22 0.4266 0.7117 0.8842 0.0525 -0.1756E+01 -0.9373E+01 -0.8925E+01 23 0.4566 0.7203 0.8227 0.0524 -0.2344E+01 -0.1018E+02 -0.2278E+02 24 0.4704 0.9242 0.7884 0.0524 -0.2637E+01 -0.1112E+02 -0.4419E+01 25 0.4914 0.7345 0.7281 0.0523 -0.3110E+01 -0.1140E+02 -0.3562E+01 26 0.5084 0.7378 0.6720 0.0524 -0.3500E+01 -0.1158E+02 0.5336E+01 27 0.5277 0.7441 0.6002 0.0525 -0.3941E+01 -0.1127E+02 0.2479E+02 28 0.5450 0.5612 0.5286 0.0527 -0.4310E+01 -0.9980E+01 0.2920E+02 29 0.5641 0.5049 0.4429 0.0529 -0.4659E+01 -0.8309E+01 0.3758E+02 30 0.5857 0.4725 0.3390 0.0531 -0.4964E+01 -0.5878E+01 0.5563E+02 31 0.6159 0.1380 0.1850 0.0531 -0.5167E+01 -0.8307E+00 0.4928E+02 32 0.6317 0.1412 0.1036 0.0531 -0.5157E+01 0.1499E+01 0.5437E+02 33 0.6446 -0.1110 0.0371 0.0531 -0.5091E+01 0.3614E+01 0.3434E+02 34 0.6707 -0.2605 -0.0927 0.0532 -0.4832E+01 0.6302E+01 0.1164E+02 35 0.6853 -0.1284 -0.1619 0.0533 -0.4640E+01 0.6812E+01 0.1617E+02 36 0.7064 -0.3452 -0.2564 0.0536 -0.4332E+01 0.7834E+01 0.4164E+01 37 0.7310 -0.5527 -0.3582 0.0538 -0.3939E+01 0.8141E+01 -0.2214E+02 38 0.7531 -0.3459 -0.4415 0.0540 -0.3611E+01 0.6674E+01 -0.9205E+01 39 0.7686 -0.5902 -0.4961 0.0541 -0.3410E+01 0.6245E+01 -0.2193E+02 40 0.7952 -0.7644 -0.5828 0.0541 -0.3125E+01 0.4494E+01 -0.4649E+02 41 0.8087 -0.5392 -0.6242 0.0541 -0.3029E+01 0.2614E+01 -0.3499E+02 42 0.8352 -0.4247 -0.7031 0.0539 -0.2964E+01 -0.1603E+00 0.2646E+01 43 0.8501 -0.6327 -0.7476 0.0538 -0.2967E+01 -0.4132E-01 0.1817E+02 44 0.8726 -0.9983 -0.8139 0.0538 -0.2942E+01 0.1180E+01 -0.6774E+01 45 0.8874 -0.9082 -0.8574 0.0542 -0.2911E+01 0.8778E+00 -0.1364E+02 46 0.9139 -0.8930 -0.9340 0.0566 -0.2893E+01 -0.2044E+00 -0.8094E+01 47 0.9271 -1.0233 -0.9723 0.0593 -0.2903E+01 -0.5258E+00 -0.1498E+02 48 0.9473 -0.8839 -1.0313 0.0665 -0.2942E+01 -0.1433E+01 0.4945E+01 49 0.9652 -1.0172 -1.0843 0.0766 -0.2989E+01 -0.1168E+01 0.1401E+02 50 0.9930 -1.2715 -1.1679 0.0998 Documentation: C COMPUTER - VAX/DOUBLE C C AUTHOR - M.F.HUTCHINSON C CSIRO DIVISION OF MATHEMATICS AND STATISTICS C P.O. BOX 1965 C CANBERRA, ACT 2601 C AUSTRALIA C C LATEST REVISION - 15 AUGUST 1985 C C PURPOSE - CUBIC SPLINE DATA SMOOTHER C C USAGE - CALL CUBGCV (X,F,DF,N,Y,C,IC,VAR,JOB,SE,WK,IER) C C ARGUMENTS X - VECTOR OF LENGTH N CONTAINING THE C ABSCISSAE OF THE N DATA POINTS C (X(I),F(I)) I=1..N. (INPUT) X C MUST BE ORDERED SO THAT C X(I) .LT. X(I+1). C F - VECTOR OF LENGTH N CONTAINING THE C ORDINATES (OR FUNCTION VALUES) C OF THE N DATA POINTS (INPUT). C DF - VECTOR OF LENGTH N. (INPUT/OUTPUT) C DF(I) IS THE RELATIVE STANDARD DEVIATION C OF THE ERROR ASSOCIATED WITH DATA POINT I. C EACH DF(I) MUST BE POSITIVE. THE VALUES IN C DF ARE SCALED BY THE SUBROUTINE SO THAT C THEIR MEAN SQUARE VALUE IS 1, AND UNSCALED C AGAIN ON NORMAL EXIT. C THE MEAN SQUARE VALUE OF THE DF(I) IS RETURNED C IN WK(7) ON NORMAL EXIT. C IF THE ABSOLUTE STANDARD DEVIATIONS ARE KNOWN, C THESE SHOULD BE PROVIDED IN DF AND THE ERROR C VARIANCE PARAMETER VAR (SEE BELOW) SHOULD THEN C BE SET TO 1. C IF THE RELATIVE STANDARD DEVIATIONS ARE UNKNOWN, C SET EACH DF(I)=1. C N - NUMBER OF DATA POINTS (INPUT). C N MUST BE .GE. 3. C Y,C - SPLINE COEFFICIENTS. (OUTPUT) Y C IS A VECTOR OF LENGTH N. C IS C AN N-1 BY 3 MATRIX. THE VALUE C OF THE SPLINE APPROXIMATION AT T IS C S(T)=((C(I,3)*D+C(I,2))*D+C(I,1))*D+Y(I) C WHERE X(I).LE.T.LT.X(I+1) AND C D = T-X(I). C IC - ROW DIMENSION OF MATRIX C EXACTLY C AS SPECIFIED IN THE DIMENSION C STATEMENT IN THE CALLING PROGRAM. (INPUT) C VAR - ERROR VARIANCE. (INPUT/OUTPUT) C IF VAR IS NEGATIVE (I.E. UNKNOWN) THEN C THE SMOOTHING PARAMETER IS DETERMINED C BY MINIMIZING THE GENERALIZED CROSS VALIDATION C AND AN ESTIMATE OF THE ERROR VARIANCE IS C RETURNED IN VAR. C IF VAR IS NON-NEGATIVE (I.E. KNOWN) THEN THE C SMOOTHING PARAMETER IS DETERMINED TO MINIMIZE C AN ESTIMATE, WHICH DEPENDS ON VAR, OF THE TRUE C MEAN SQUARE ERROR, AND VAR IS UNCHANGED. C IN PARTICULAR, IF VAR IS ZERO, THEN AN C INTERPOLATING NATURAL CUBIC SPLINE IS CALCULATED. C VAR SHOULD BE SET TO 1 IF ABSOLUTE STANDARD C DEVIATIONS HAVE BEEN PROVIDED IN DF (SEE ABOVE). C JOB - JOB SELECTION PARAMETER. (INPUT) C JOB = 0 SHOULD BE SELECTED IF POINT STANDARD ERROR C ESTIMATES ARE NOT REQUIRED IN SE. C JOB = 1 SHOULD BE SELECTED IF POINT STANDARD ERROR C ESTIMATES ARE REQUIRED IN SE. C SE - VECTOR OF LENGTH N CONTAINING BAYESIAN STANDARD C ERROR ESTIMATES OF THE FITTED SPLINE VALUES IN Y. C SE IS NOT REFERENCED IF JOB=0. (OUTPUT) C WK - WORK VECTOR OF LENGTH 7*(N + 2). ON NORMAL EXIT THE C FIRST 7 VALUES OF WK ARE ASSIGNED AS FOLLOWS:- C C WK(1) = SMOOTHING PARAMETER (= RHO/(RHO + 1)) C WK(2) = ESTIMATE OF THE NUMBER OF DEGREES OF C FREEDOM OF THE RESIDUAL SUM OF SQUARES C WK(3) = GENERALIZED CROSS VALIDATION C WK(4) = MEAN SQUARE RESIDUAL C WK(5) = ESTIMATE OF THE TRUE MEAN SQUARE ERROR C AT THE DATA POINTS C WK(6) = ESTIMATE OF THE ERROR VARIANCE C WK(7) = MEAN SQUARE VALUE OF THE DF(I) C C IF WK(1)=0 (RHO=0) AN INTERPOLATING NATURAL CUBIC C SPLINE HAS BEEN CALCULATED. C IF WK(1)=1 (RHO=INFINITE) A LEAST SQUARES C REGRESSION LINE HAS BEEN CALCULATED. C WK(2) IS AN ESTIMATE OF THE NUMBER OF DEGREES OF C FREEDOM OF THE RESIDUAL WHICH REDUCES TO THE C USUAL VALUE OF N-2 WHEN A LEAST SQUARES REGRESSION C LINE IS CALCULATED. C WK(3),WK(4),WK(5) ARE CALCULATED WITH THE DF(I) C SCALED TO HAVE MEAN SQUARE VALUE 1. THE C UNSCALED VALUES OF WK(3),WK(4),WK(5) MAY BE C CALCULATED BY DIVIDING BY WK(7). C WK(6) COINCIDES WITH THE OUTPUT VALUE OF VAR IF C VAR IS NEGATIVE ON INPUT. IT IS CALCULATED WITH C THE UNSCALED VALUES OF THE DF(I) TO FACILITATE C COMPARISONS WITH A PRIORI VARIANCE ESTIMATES. C C IER - ERROR PARAMETER. (OUTPUT) C TERMINAL ERROR C IER = 129, IC IS LESS THAN N-1. C IER = 130, N IS LESS THAN 3. C IER = 131, INPUT ABSCISSAE ARE NOT C ORDERED SO THAT X(I).LT.X(I+1). C IER = 132, DF(I) IS NOT POSITIVE FOR SOME I. C IER = 133, JOB IS NOT 0 OR 1. C C PRECISION/HARDWARE - DOUBLE C C REQUIRED ROUTINES - SPINT1,SPFIT1,SPCOF1,SPERR1 C C REMARKS THE NUMBER OF ARITHMETIC OPERATIONS REQUIRED BY THE C SUBROUTINE IS PROPORTIONAL TO N. THE SUBROUTINE C USES AN ALGORITHM DEVELOPED BY M.F. HUTCHINSON AND C F.R. DE HOOG, 'SMOOTHING NOISY DATA WITH SPLINE C FUNCTIONS', NUMER. MATH. (IN PRESS) C C----------------------------------------------------------------------- C ALGORITHM CUBGCV calculates a natural cubic spline curve which smoothes a given set of data points, using statistical considerations to determine the amount of smoothing required, as described in reference 2. If the error variance is known, it should be supplied to the routine in VAR. The degree of smoothing is then determined by minimizing an unbiased estimate of the true mean square error. On the other hand, if the error variance is not known, VAR should be set to -1.0. The routine then determines the degree of smoothing by minimizing the generalized cross validation. This is asymptotically the same as minimizing the true mean square error (see reference 1). In this case, an estimate of the error variance is returned in VAR which may be compared with any a priori approximate estimates. In either case, an estimate of the true mean square error is returned in WK(5). This estimate, however, depends on the error variance estimate, and should only be accepted if the error variance estimate is reckoned to be correct. If job is set to 1, bayesian estimates of the standard error of each smoothed data value are returned in the array SE. These also depend on the error variance estimate and should only be accepted if the error variance estimate is reckoned to be correct. See reference 4. The number of arithmetic operations and the amount of storage required by the routine are both proportional to N, so that very large data sets may be analysed. The data points do not have to be equally spaced or uniformly weighted. The residual and the spline coefficients are calculated in the manner described in reference 3, while the trace and various statistics, including the generalized cross validation, are calculated in the manner described in reference 2. When VAR is known, any value of N greater than 2 is acceptable. It is advisable, however, for N to be greater than about 20 when VAR is unknown. If the degree of smoothing done by CUBGCV when VAR is unknown is not satisfactory, the user should try specifying the degree of smoothing by setting VAR to a reasonable value. References: 1. Craven, Peter and Wahba, Grace, "Smoothing noisy data with spline functions", Numer. Math. 31, 377-403 (1979). 2. Hutchinson, M.F. and de Hoog, F.R., "Smoothing noisy data with spline functions", Numer. Math. (in press). 3. Reinsch, C.H., "Smoothing by spline functions", Numer. Math. 10, 177-183 (1967). 4. Wahba, Grace, "Bayesian 'confidence intervals' for the cross-validated smoothing spline", J.R.Statist. Soc. B 45, 133-150 (1983). Example A sequence of 50 data points are generated by adding a random variable with uniform density in the interval [-0.3,0.3] to the curve y=sin(3*pi*x/2). The abscissae are unequally spaced in [0,1]. Point standard error estimates are returned in SE by setting JOB to 1. The error variance estimate is returned in VAR. It compares favourably with the true value of 0.03. The IMSL function GGUBFS is used to generate sample values of a uniform variable on [0,1]. INPUT: INTEGER N,IC,JOB,IER DOUBLE PRECISION X(50),F(50),Y(50),DF(50),C(49,3),WK(400), * VAR,SE(50) DOUBLE PRECISION GGUBFS,DSEED,DN DATA DSEED/1.2345D4/ C N=50 IC=49 JOB=1 VAR=-1.0D0 DN=N C DO 10 I=1,N X(I)=(I - 0.5)/DN + (2.0*GGUBFS(DSEED) - 1.0)/(3.0*DN) F(I)=DSIN(4.71238*X(I)) + (2.0*GGUBFS(DSEED) - 1.0)*0.3 DF(I)=1.0D0 10 CONTINUE CALL CUBGCV(X,F,DF,N,Y,C,IC,VAR,JOB,SE,WK,IER) . . . END OUTPUT: IER = 0 VAR = 0.0279 GENERALIZED CROSS VALIDATION = 0.0318 MEAN SQUARE RESIDUAL = 0.0246 RESIDUAL DEGREES OF FREEDOM = 43.97 FOR CHECKING PURPOSES THE FOLLOWING OUTPUT IS GIVEN: X(1) = 0.0046 F(1) = 0.2222 Y(1) = 0.0342 SE(1) = 0.1004 X(21) = 0.4126 F(21) = 0.9810 Y(21) = 0.9069 SE(21) = 0.0525 X(41) = 0.8087 F(41) = -0.5392 Y(41) = -0.6242 SE(41) = 0.0541