As klu library is now linked to ode15i, I could test the performance of the code both in the dense and sparse case. I tested my oct-file against Matlab ode15i m-file, the C implementation of sundials examples and the m-file relying on the mex interface of sundials. Although the code still contains unefficient use of memory, I think that the results are pretty interesting.
Before going into details, I have to specify that C and mex implementations are written for solving these specific problems, so their code is highly optimized (specific Jacobian matrices are constructed directly inside functions required by IDA, while in my oct-file I receive a function that when evaluated it returns the Jacobian matrices I have to pass to IDA). In other words, solving a problem requires more time than solving the problem.
The first test regards the solution of Robertson chemical kinetics problem, in which the Jacobian is a 3x3 dense matrix. I found a solution for RelTol = 1e-4, AbsTol = [1e-8, 1e-14, 1e-6], tspan = [0, 0.4] and I got (in mean):
0.015s for the oct-file
0.13s for the mex-file
0.002s for the C-implementation
0.19s for Matlab ode15i.
The second test regards the solution of a 2D heat equation. In this problem the Jacobian was a 100x100 sparse matrix with 330 non zero elements. I tested with RelTol = 1e-5, AbsTol = 1e-8 and tspan =[0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24] and got:
0.42s for the oct-file
0.005s for the C-implementation
0.45s for Matlab ode15i.
Unfortunately Sundials doesn't provide a mex interface for klu, so I couldn't test a sparse mex file version for this problem.
We can now make some considerations:
-Oct-file is 10 times faster than mex-file and this confirms that mex files cannot reach the performance of an equivalent (in this case, of a more general) otc-file
-Our solver is already faster than Matlab m-file
-The C implementation is the fastest, but code is designed to solve only a specific problem. We can try to get closer to this speed.
In order to improve the performance of the solver, the next step is to try to reduce looping and the use of memory and to write a DAE class, using functors and lambda functions to avoid the use of global pointers.