We used it in a project that performed document recognition and analysis. It was useful for aligning the fields to be extracted from the source image, since we formulated the alignment as an optimization problem. It was straightforward to use, and it includes some neat robust loss functions that reduce the effect of outliers (e.g. CauchyLoss).
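For illustration, here's a minimal sketch (not the actual project code) of how a robust loss and automatic differentiation fit together in Ceres; the residual and the data are made up:

    #include "ceres/ceres.h"

    // Illustrative residual: distance between a predicted field position
    // (p[0], p[1]) and an observed corner. A real alignment model would
    // encode its own parametrization here.
    struct AlignmentResidual {
      AlignmentResidual(double ox, double oy) : ox_(ox), oy_(oy) {}
      template <typename T>
      bool operator()(const T* const p, T* residual) const {
        residual[0] = p[0] - T(ox_);
        residual[1] = p[1] - T(oy_);
        return true;
      }
      double ox_, oy_;
    };

    int main() {
      double p[2] = {0.0, 0.0};  // parameters to solve for
      ceres::Problem problem;
      // CauchyLoss down-weights residuals much larger than the scale
      // (1.0), so outlier observations pull less on the solution.
      problem.AddResidualBlock(
          new ceres::AutoDiffCostFunction<AlignmentResidual, 2, 2>(
              new AlignmentResidual(3.0, 4.0)),
          new ceres::CauchyLoss(1.0), p);
      ceres::Solver::Options options;
      ceres::Solver::Summary summary;
      ceres::Solve(options, &problem, &summary);
    }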
I believe Ceres was originally developed for nonlinear least squares optimization, particularly in the context of structure from motion (as bundle adjustment) and related problems in computer vision and robotics (esp. SLAM). As such it has some features that are useful in that context, such as automatic differentiation and covariance estimation. I see that it also has a more general nonlinear optimization interface, but even there it seems to assume that at least first-order gradients are available (and again, for this the auto differentiation is handy).
On the other hand, NLOPT seems oriented towards implementing a variety of more general "black box" optimization methods, only some of which need or support gradient information, and it has no auto differentiation.
So if I were working on some kind of SfM/SLAM problem, I'd probably use Ceres, but if I had a less structured optimization problem - and especially if I didn't have gradients - I'd try NLOPT.
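For contrast, a minimal sketch of the gradient-free path with NLOPT's C++ API, using the derivative-free COBYLA algorithm on a made-up quadratic objective:

    #include <nlopt.hpp>
    #include <vector>

    // Gradient-free objective: NLOPT still passes a gradient vector,
    // but it stays empty for derivative-free algorithms like COBYLA.
    double objective(const std::vector<double>& x,
                     std::vector<double>& /*grad*/, void* /*data*/) {
      return (x[0] - 1.0) * (x[0] - 1.0) + (x[1] + 2.0) * (x[1] + 2.0);
    }

    int main() {
      nlopt::opt opt(nlopt::LN_COBYLA, 2);  // LN_* = local, no derivatives
      opt.set_min_objective(objective, nullptr);
      opt.set_xtol_rel(1e-6);
      std::vector<double> x = {0.0, 0.0};   // initial guess
      double minf;
      opt.optimize(x, minf);
    }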
For all of the success applying GPUs to optimization problems in ML, why don't any of the common optimization packages seem to support GPU acceleration?
There are two "directions" along which you can parallelize:
- explore different parts of the parameter hyperspace in parallel.
- for a given parametrization, split the model and/or objective function so that its parts can be computed in parallel.
The second approach is model-specific and gives you nice speedups (make your model N times faster, and you will converge N times faster), but it is often not particularly well suited to accelerators (including GPUs) due to the latency of moving data back and forth; with model-specific tuning you can maybe make it work. For most traditional problems, SIMD on the CPU is the best fit here.
The first approach, which in practice requires the second so that the model and the optimizer can run on the same computing unit, isn't particularly great either, since you're doing computations that are suboptimal and/or redundant to begin with. Any speedup isn't obvious; it depends on the optimization algorithm and the convergence characteristics of your problem. Also, as you follow several paths in parallel, you'll eventually need to sync up, and since the paths have divergent control flow, you can't make the most of the computing resources, which will be stalling quite often. Often, with enough tuning for your particular problem and method, you can make it work.
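A toy sketch of that first approach on the CPU (the local optimizer is a hypothetical stand-in): launch several starting points in parallel and keep the best result; the join at the end is exactly the sync point mentioned above.

    #include <algorithm>
    #include <future>
    #include <limits>
    #include <vector>

    // Stand-in local optimizer (hypothetical): here it just evaluates a
    // toy objective at the starting point; a real one would iterate to
    // convergence from there.
    double optimize_from(std::vector<double> s) {
      return (s[0] - 1.0) * (s[0] - 1.0) + s[1] * s[1];
    }

    // Explore several starting points in parallel and keep the best.
    // Paths converge at different rates, so threads that finish early
    // sit idle until the final join.
    double multi_start(const std::vector<std::vector<double>>& starts) {
      std::vector<std::future<double>> runs;
      for (const auto& s : starts)
        runs.push_back(std::async(std::launch::async, optimize_from, s));
      double best = std::numeric_limits<double>::infinity();
      for (auto& r : runs) best = std::min(best, r.get());
      return best;
    }

    int main() {
      multi_start({{0.0, 0.0}, {5.0, -3.0}, {-2.0, 2.0}});
    }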
So why don't generic libraries do it on GPU? Because unless you tune everything for your particular problem, it's just not going to perform as well as on CPU.
Many already do, but not necessarily explicitly. If an optimizer accepts user-defined functions for its evaluation and derivatives, these computations can be done on a GPU even though the optimizer itself knows nothing about the GPU. For example, GPUs are extensively used in parameter estimation problems associated with PDE-constrained optimization. Essentially, the PDE solves use GPUs to solve the differential equation, and the results are fed back into the optimizer. Many of these packages use common open source optimizers.
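A sketch of that pattern (the GPU solve here is a hypothetical stand-in, and NLOPT is used only as an example of a CPU-side optimizer):

    #include <nlopt.hpp>
    #include <vector>

    // CPU stand-in so the sketch compiles; a real implementation would
    // launch the forward PDE solve on the GPU and return a data misfit.
    double solve_pde_on_gpu(const std::vector<double>& params) {
      double misfit = 0.0;
      for (double p : params) misfit += p * p;
      return misfit;
    }

    // The optimizer only sees an ordinary callback; all GPU work
    // happens inside the user-supplied function.
    double objective(const std::vector<double>& x,
                     std::vector<double>& /*grad*/, void* /*data*/) {
      return solve_pde_on_gpu(x);
    }

    int main() {
      nlopt::opt opt(nlopt::LN_BOBYQA, 8);  // derivative-free local method
      opt.set_min_objective(objective, nullptr);
      opt.set_xtol_rel(1e-6);
      std::vector<double> x(8, 1.0);
      double minf;
      opt.optimize(x, minf);
    }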
More generally, there's a question of whether the algorithms themselves benefit from GPUs, or parallelism in general. For large-scale nonlinear, continuous optimization problems using second-order, Newton-like methods, the big costs are in the function evaluations, their derivatives, and the linear system preconditioners/solves. Generally speaking, how the function evaluations and derivatives are computed is up to the user. For the preconditioning/linear system solves, there's value in parallelism. However, here the GPUs have traditionally lagged. Basically, we need a factorization, be it sparse or dense, and only recently has good library support for that been extended to GPUs. For the longest time, the entire matrix factorization needed to fit onto the GPU, and many of these matrices were large. That said, for optimizers that accept a user-defined preconditioner, the use of GPUs is already possible.
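To make the cost concrete: each iteration of a Newton-like method solves a linear system in the Hessian (or an approximation to it),

    \nabla^2 f(x_k) \, p_k = -\nabla f(x_k),

and factoring or preconditioning that matrix is the step where GPU parallelism would have to pay off.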
I believe that when the number of parameters isn't in the millions, as it is in deep networks, there's less of an advantage to using GPUs. But there is definitely research on using GPUs to solve nonlinear least squares optimization problems. Here's one paper I saw a few years ago: https://dl.acm.org/doi/10.1145/3132188
There are a lot of large-scale optimization problems in industry that are still compute-bound. Currently available solvers are either single-threaded on the CPU, or offer "parallelism" by running copies of the same problem on multiple threads, but with different initial conditions, in hopes that one happens to converge faster.
I'll contend that this depends. For example, in mixed-integer linear programs, the function evaluations are trivial, but the combinatorial search is expensive. Alternatively, for nonlinear, continuous optimization problems with constraints, the primary cost is often the preconditioning or factorization of the systems associated with either an augmented system or a KKT system. That said, for unconstrained or bound-constrained, nonlinear, continuous optimization problems, I largely agree. And even with general constraints, it's sometimes the case that the function evaluations dominate. Mostly, I wanted to contend that the linear system solves can often be the limiting factor, and better large-scale factorization codes are needed.
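For reference, by KKT system I mean the saddle-point system solved at each iteration of an equality-constrained problem, minimize f(x) subject to c(x) = 0:

    \begin{bmatrix} \nabla_{xx}^2 L(x,\lambda) & \nabla c(x)^T \\ \nabla c(x) & 0 \end{bmatrix}
    \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix}
    = -\begin{bmatrix} \nabla f(x) + \nabla c(x)^T \lambda \\ c(x) \end{bmatrix}

Factoring that large, sparse, indefinite matrix is exactly the step where better GPU factorization codes would help.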
I'd be interested to see how this compares to Google's also recently released Vizier (https://oss-vizier.readthedocs.io/en/latest/), other than that one is black-box and the other isn't.
Can anyone comment on what this project considers "large optimization problems"? More than 2 variables, a few hundred variables, or many thousands of variables?
- Estimate the pose of Street View cars, aircraft, and satellites.
- Build 3D models for PhotoTours.
- Estimate satellite image sensor characteristics.
- Stitch panoramas on Android and iOS.
- Apply Lens Blur on Android.
- Solve bundle adjustment and SLAM problems in Project Tango.
Microsoft Research uses Ceres for nonlinear optimization of objectives involving subdivision surfaces under skinned control meshes.
http://ceres-solver.org/users.html