Make a comment to the author.

#
A Test of the Effects of Rounding Precision Upon a Matrix Problem.

##
The Purpose.

This purpose of this test was to determine how easy it was
to find a realistic problem using 'double' variables which would
produce better results under the Linux default 64 bit precision
mode than when run under a 53 bit precision mode.

##
The Problem.

Using recognised matrix code solve the following for x:
A x = b

where A is a 10 by 10 matrix with all elements equal to 1, except
for the diagonal elements which are 1+1.0e-10, i.e. it is 'almost'
singular.
b is a column vector with elements 0, 1, 2, ..., 9.

The (arbitrarily) chosen code is from the meschach code of the 'C' package
of netlib.
Here is the program:

main() {
MAT *A, *QR;
VEC *b, *x, *diag;
u_int m=10, n=10, i, j;
A = m_get(m, n);
for ( i = 0; i < m; i++ )
for ( j = 0; j < n; j++ ) {
A->me[i][j] = 1.0;
if ( i == j )
A->me[i][j] += 1.0e-12;
}
diag = v_get(A->m);
QR = m_copy(A, MNULL);
QRfactor(QR, diag);
b = v_get(A->m);
for ( i = 0; i < b->dim; i++ )
b->ve[i] = i;
x = QRsolve(QR, diag, b, VNULL);
for ( i = 0; i < m; i++ )
printf("%.20Lg (err=%.20g)\n", (long double)x->ve[i]);
printf("\n||A*x-b|| = %Lg\n",
(long double)(v_norm2(v_sub(mv_mlt(A, x, VNULL), b, VNULL))) );
}

##
The Results.

The correct answer (computations performed at 128 bit precision) is:

-4499599982940.764920071048
-3499688875620.494937833037
-2499777768300.224955595027
-1499866660979.954973357016
-499955553659.6849911190054
499955553660.5849911190053
1499866660980.854973357016
2499777768301.124955595027
3499688875621.394937833037
4499599982941.664920071048

Three methods of compiling and running the program and the meschach
library were tried:

###
Default Linux with -O2

The answer differed from the exact answer by the following vector:
-86739704.29
9637744.92
9637744.92
9637744.92
9637744.92
9637744.92
9637744.92
9637744.92
9637744.92
9637744.92

The error estimate:

E = ||A*x-b||

was 0.00444047.
The correct value is 0.00444017.

###
Default Linux with --float-store

This method gives a executable which "almost"
has IEEE-style-rounding behaviour.
The answer vector is almost identical to that produced in the
-O2 case.
However the error estimate E produced by the
program is 9.53674e-06, which is several
orders of magnitude too small.
###
53 bit precision

This gives IEEE-style-rounding operation.
The FPU control bits are set to 53 bit precision
via the fesetprecision() function of my
**wmexcep package**.
All arithmetic results are correctly rounded to 53 bit precision.
The answer differed from the exact answer by the following vector:
2557935794.09
-284215088.23
-284215088.23
-284215088.23
-284215088.23
-284215088.23
-284215088.23
-284215088.23
-284215088.23
-284215088.23

and an error estimate E of 0.0067658.
The correct value of E is 0.0038174.
As should be expected,
the result does not depend upon the level of optimisation or
whether --float-store is used.

Note that the error vector is much larger than in the other two cases.

##
Conclusions

This program has demonstrated the following:
- --float-store is not always sufficient to get
IEEE-style-rounding
behaviour,
- the Linux default behaviour can give significantly better
results than IEEE-style-rounding
behaviour.