VaLiPro: Linear Programming Validator for Cluster Computing Systems

The article presents and evaluates a scalable algorithm for validating solutions of linear programming (LP) problems on cluster computing systems. The main idea of the method is to generate a regular set of points (the validation set) on a small-radius hypersphere centered at the point of the solution under validation. The objective function is calculated at each point of the validation set that belongs to the feasible region. If all these values are less than or equal to the value of the objective function at the point under validation, then the solution is considered correct. The parallel implementation of the VaLiPro algorithm is performed in C++ through the parallel BSF-skeleton, which encapsulates all aspects related to the MPI-based parallelization of the program. We provide the results of large-scale computational experiments on a cluster computing system that study the scalability of the VaLiPro algorithm.


Introduction
The era of big data [1, 2] has generated large-scale linear programming (LP) problems [3]. Such problems arise in economics, industry, logistics, statistics, quantum physics, and other fields. To solve them, high-performance computing systems and parallel algorithms are required. Thus, the development of new parallel algorithms for solving LP problems and the revision of current algorithms have become imperative. As examples, we can cite the works [4-9]. The development of new parallel algorithms for solving large-scale linear programming problems involves testing them on various benchmarks. One of the most well-known benchmark repositories of linear programming problems is the Netlib-Lp benchmark suite [10]. The solutions to all the problems from this repository are known. At the same time, in practice, it is often necessary to test a new algorithm on certain problems with unknown solutions. When testing an LP solver on such classes of problems, there is a need for validation (certification) and refinement of the obtained solution.
Several works have been devoted to the problem of certification and refinement of LP solutions. The paper [11] presents the LPlex system, which verifies and repairs a given solution to an LP problem for feasibility and optimality using exact arithmetic to guarantee the correctness of the results. The LPlex system can solve medium to large LP problems to optimality. Based on exact arithmetic (integer, rational, or modular), LPlex implements a module to detect block structures in matrices [12] and supports LU-factorizations of sparse matrices, the Bareiss method [13, 14], and the Wiedemann method [15]. The main drawback of the approach is that LPlex fails if the certified solution is not close enough to the optimal one. Koch [16] modified this approach to compute optimal solutions for the full set of Netlib-Lp instances. Rather than attempting to repair a nonoptimal basis with rational pivots, Koch recomputes a floating-point solution using greater precision in the floating-point representations. He employed the long double type, which specifies 128-bit values. In [17], Applegate and co-authors extend Koch's methodology with an implementation that dynamically increases the precision of floating-point computations until a rational solution satisfying the optimality condition is obtained. They modify the conventional simplex algorithm by changing every floating-point type into the rational type provided by the GNU multiple precision arithmetic library (GMP) [18] and by replacing every arithmetic operation in the original code with the corresponding GMP operation. The program starts with the best native floating-point precision and then increases it by about 50 % at each iteration (keeping the precision value a multiple of 32 bits to align with the typical word size). The main drawback of this approach is that the use of multiple-precision arithmetic in the case of large and complex LP problems incurs high overheads. In [19], Panyukov and Gorbik try to overcome this disadvantage by using parallel computing on distributed memory. They utilize rational arithmetic and propose two approaches to parallelizing the simplex method. The first method is based on the decomposition of the simplex tableau by columns. The second method is based on the modified simplex method using the inverse matrix and exploits the decomposition of the original matrix by columns and that of the inverse matrix by rows. However, the results of the computational experiments are not sufficiently convincing for the following reasons: there is no comparison with the best sequential solutions; the computations were performed using only three sparse LP problems (the number of nonzero elements did not exceed 5 %); and, in addition, the scalability bound of the proposed parallel algorithm was only 16 processors. Another original approach is suggested by Gleixner and co-authors in [20]. This paper describes an iterative refinement procedure for computing extended-precision or exact solutions to LP problems. Arbitrarily precise solutions can be computed by solving a sequence of closely related LPs with limited-precision arithmetic. These LPs share the same constraint matrix as the original problem instance and are transformed only by modifying the objective function, right-hand sides, and variable bounds. This implementation is publicly available as an extension of the academic LP solver SoPlex.
All the methods discussed above concentrate on refining an approximate solution that has already been found. If the found solution is too far from the correct one, which means that there is an error in the algorithm, then the use of these methods becomes impractical. In addition, all of these algorithms have high computational complexity and do not allow efficient parallelization on large cluster computing systems. The method proposed in this article focuses on debugging and validating new LP algorithms on cluster computing systems. It is implemented as a parallel program, VaLiPro (Validator of Linear Program), which shows good scalability on multiprocessor computing systems with distributed memory. The rest of the paper is organized as follows. Section 1 provides a formal description of the proposed method for validating solutions to LP problems and presents a sequential version of the VaLiPro algorithm. The parallel version of the VaLiPro algorithm is discussed in Section 2. Section 3 describes the implementation of the VaLiPro parallel algorithm in C++ using the BSF-skeleton and presents the results of computational experiments on a cluster computing system, which confirm the efficiency of the proposed approach. In conclusion, we summarize the obtained results and outline plans for using the VaLiPro validator in the development of an artificial neural network capable of solving large-scale LP problems.

Method for Validating Solutions to LP Problems
Let the following linear programming problem be given in the Euclidean space R^n:

max{⟨c, x⟩ | Ax ≤ b},   (1)

where c is the vector of the objective function coefficients. Here and below, ⟨·, ·⟩ stands for the dot product of vectors. Let us define M = {x ∈ R^n | Ax ≤ b} as the feasible region of problem (1). By definition, the set M is convex and closed. From now on, we assume that M is a nonempty bounded set, i.e., problem (1) has at least one solution. Let x ∈ R^n be an approximate solution of problem (1) obtained using some LP solver that must be certified.
The main idea of the VaLiPro validation method is to construct a finite set of points V covering a hypersphere S of small (compared to the size of the polytope M) radius ρ centered at the certified solution point x:

S = {v ∈ R^n | ‖v − x‖ = ρ}.

Here and below, ‖·‖ denotes the Euclidean norm. Let us compute the maximum of the objective function on the set V ∩ M:

v̂ = argmax{⟨c, v⟩ | v ∈ V ∩ M}.

If ⟨c, v̂⟩ − ⟨c, x⟩ < ε, then the approximation x is considered correct. Otherwise, x is considered an incorrect solution. Here, ε ∈ R_{>0} is a small positive constant that is a parameter of the validation algorithm.
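For illustration, the validation criterion can be sketched in C++ as follows. This is an illustrative fragment, not the VaLiPro source code: all identifiers are hypothetical, and the validation set V is taken as given.

```cpp
#include <vector>

// Illustrative sketch of the validation criterion (not the VaLiPro source code).

// Dot product <a, b>.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Checks the system Av <= b row by row.
bool is_feasible(const std::vector<std::vector<double>>& A,
                 const std::vector<double>& b,
                 const std::vector<double>& v) {
    for (std::size_t i = 0; i < A.size(); ++i)
        if (dot(A[i], v) > b[i]) return false;
    return true;
}

// Returns true if no feasible validation point improves the objective
// by more than eps over the certified point x.
bool validate(const std::vector<std::vector<double>>& A,
              const std::vector<double>& b,
              const std::vector<double>& c,
              const std::vector<double>& x,
              const std::vector<std::vector<double>>& V,
              double eps = 1e-6) {
    double fx = dot(c, x);
    for (const auto& v : V)
        if (is_feasible(A, b, v) && dot(c, v) > fx + eps) return false;
    return true;
}
```

Note that infeasible points of V are simply skipped: only feasible validation points are allowed to refute the certified solution.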
Let us describe the method for constructing the validation set V. It is known [21] that the coordinates of any point v = (v_1, ..., v_n) lying on the surface of the hypersphere S defined by the equation ‖v − x‖ = ρ can be represented as follows:

v_1 = x_1 + ρ cos(φ_1),
v_2 = x_2 + ρ sin(φ_1) cos(φ_2),
. . .
v_{n−1} = x_{n−1} + ρ sin(φ_1) ··· sin(φ_{n−2}) cos(θ),
v_n = x_n + ρ sin(φ_1) ··· sin(φ_{n−2}) sin(θ),   (2)

where 0 ≤ φ_j ≤ π (j = 1, ..., n − 2) and 0 ≤ θ < 2π. Let us explain the method for generating the validation set V using a three-dimensional sphere (see Fig. 1). Fix an odd number d ≥ 3 of parallels (poles are excluded) and set the angle step

ϕ = π/d.   (3)
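The hypersphere parametrization above can be sketched in C++ as follows. This is a minimal illustration; the function name and calling convention are assumptions, not the paper's code. The angles argument holds (φ_1, ..., φ_{n−2}, θ).

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch: convert the spherical coordinates of a point on a
// hypersphere of radius rho centered at x into Cartesian coordinates.
// ang = (phi_1, ..., phi_{n-2}, theta), so ang.size() == x.size() - 1.
std::vector<double> spherical_to_cartesian(const std::vector<double>& x,
                                           double rho,
                                           const std::vector<double>& ang) {
    std::size_t n = x.size();
    std::vector<double> v(n);
    double s = rho;                      // running product rho*sin(phi_1)*...*sin(phi_{j-1})
    for (std::size_t j = 0; j + 1 < n; ++j) {
        v[j] = x[j] + s * std::cos(ang[j]);
        s *= std::sin(ang[j]);
    }
    v[n - 1] = x[n - 1] + s;             // last coordinate: full product of sines
    return v;
}
```

The running product s accumulates the common sine factors, so each coordinate costs only O(1) extra work; by construction the resulting point lies at Euclidean distance ρ from x.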
The described method for generating the points of the validation set for n ≥ 3 in the general case is given in Algorithm 1a. The nested loops with the headers in steps 2, 4, ..., 7, and 9 generate the following spherical coordinates of a validation point:

φ_1 = j_1 ϕ, ..., φ_{n−2} = j_{n−2} ϕ, θ = j_{n−1} ϕ,   (4)

where j_1, ..., j_{n−1} are the loop control variables. In steps 11-18, the spherical coordinates are converted to Cartesian coordinates by Equations (2). Multiplying the numbers of iterations of the for loops with headers 2, 4, ..., 7, and 9, we conclude that Algorithm 1a outputs 2d(d + 1)^(n−2) validation points. However, there will be duplicates among the output points. The computational experiments showed that if one sets the dimension n = 4 and the number of parallels d = 5, then Algorithm 1a generates 189 duplicates with a total number of points equal to 360, which is more than 50 %. The duplicates are generated at iterations in which φ_i = 0 or φ_i = π. The reason is that one of the factors sin(φ_i) in (2) is equal to zero in this case, and therefore variations of the other factors cannot change the values of the corresponding coordinates. This issue can be solved without a major revision of Algorithm 1a, by changing the start values and end values of the control variables in loop headers 4, ..., 7, and 9, as is done in Algorithm 1b. This algorithm generates a validation set without duplicates but, at the same time, loses a certain number of unique points. With n = 4 and d = 5, this quantity is 11, which is less than 7 % of the whole set after removing duplicates. Experiments have shown that such a loss does not significantly affect the accuracy of the validation algorithm. The number of points of the validation set V generated by Algorithm 1b is determined by the following equation:

|V| = 2d(d − 1)^(n−2).   (5)

The main drawback of Algorithm 1b is that the number of nested loops depends on the problem dimension, which does not allow using the dimension as a program parameter. To overcome this drawback, we use a vector-valued function g that calculates the coordinates of a point by its sequential number k in the point sequence generated by Algorithm 1b (counting starts from zero). The definition of this function is given in Algorithm 2: steps 2-6 recover from the number k the values of the loop control variables of Algorithm 1b, and steps 11-18 convert the resulting spherical coordinates, which are multiples of the angle ϕ, into Cartesian coordinates by Equations (2).
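The key trick of the function g, replacing a dimension-dependent nest of loops by a single flat loop over the point number k, amounts to a mixed-radix decomposition of k. The following sketch illustrates this general technique; the specific loop trip counts of Algorithm 1b are not reproduced here, so the bases are placeholders.

```cpp
#include <vector>

// Illustrative sketch: recover the control variables (u_0, ..., u_{m-1}) of
// m nested loops from the flat iteration number k, given each loop's trip
// count. The innermost loop (last element of counts) varies fastest.
std::vector<long> unflatten(long k, const std::vector<long>& counts) {
    std::vector<long> u(counts.size());
    for (std::size_t j = counts.size(); j-- > 0; ) {  // from innermost outward
        u[j] = k % counts[j];
        k /= counts[j];
    }
    return u;
}
```

For example, with trip counts {2, 3, 4}, iteration number 7 corresponds to the control-variable triple (0, 1, 3).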
The final implementation of the VaLiPro method, using the vector-valued function g, is given in Algorithm 3. An additional parameter of this algorithm is the small positive constant ε (by default, ε = 10^(−6)), which compensates for possible numerical errors when comparing the values of the objective function in Step 5. Let us make several brief comments on the steps of Algorithm 3. Step 1 reads the source data of LP problem (1), the algorithm parameters, and the solution x that is to be certified. Step 2 calculates the angle ϕ according to Equation (3). Step 3 begins the loop that varies the point number k from 0 to 2d(d − 1)^(n−2) − 1 as per Equation (5). Using the vector-valued function g (see Algorithm 2), Step 4 computes the next validation point v.
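The loop bound in Step 3, K = 2d(d − 1)^(n−2) from Equation (5), can be computed as follows. This is an illustrative helper with a hypothetical name; overflow handling is omitted.

```cpp
#include <cstdint>

// Cardinality of the validation set generated by Algorithm 1b:
// K = 2d(d-1)^(n-2), see Equation (5). Illustrative helper, no overflow checks.
std::int64_t validation_set_size(std::int64_t d, std::int64_t n) {
    std::int64_t K = 2 * d;
    for (std::int64_t i = 0; i < n - 2; ++i)
        K *= (d - 1);
    return K;
}
```

With n = 4 and d = 5 this yields 160 points, matching the count discussed above (171 unique points minus the 11 lost by Algorithm 1b).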
Step 5 checks whether v belongs to the feasible region of problem (1) and compares the objective function values at the points v and x. If the point v is feasible and the objective function takes a larger value at it, then control is passed to Step 9, which prints a message stating that the certified solution is not correct. Otherwise, the next iteration of the loop proceeds. If the loop ends naturally, control is passed to Step 7, which outputs a message saying that the solution is correct. After that, control is passed to Step 10, which completes the execution of the algorithm.

Algorithm 3 Validating an LP solution
1: input A, b, c, x, d, ρ, ε
2: calculate ϕ according to Equation (3)
3: for k = 0 . . . 2d(d − 1)^(n−2) − 1 do
4:   v := g(k)
5:   if Av ≤ b & ⟨c, v⟩ > ⟨c, x⟩ + ε goto 9
6: end for
7: output "Solution is correct"
8: goto 10
9: output "Solution is incorrect"
10: stop

Parallel Algorithm for Validating LP Solutions
According to Equation (5), the cardinality of the validation set generated by Algorithm 3 depends exponentially on the space dimension. Therefore, Algorithm 3 has high computational complexity for large dimensions. To reduce computational overheads, we developed a parallel version of Algorithm 3, given as Algorithm 4 below. The parallel algorithm is based on the BSF parallel computation model [22, 23], which exploits the master-slave paradigm [24]. According to the BSF model, the master node serves as a control and communication center. All slave nodes execute the same code but on different data. The BSF model assumes the representation of the algorithm in the form of operations on lists using the higher-order functions Map and Reduce defined by the Bird-Meertens formalism [25]. The higher-order function Map transforms the original list W = [w_0, ..., w_{K−1}] into the list Z = [z_0, ..., z_{K−1}] by applying the function f_x to each element:

Z = Map(f_x, W) = [f_x(w_0), ..., f_x(w_{K−1})].

In the case considered here, the elements of the list W are the sequential numbers of the validation set points, that is,

W = [0, 1, ..., K − 1],

where K = 2d(d − 1)^(n−2). The Boolean function f_x : {0, ..., K − 1} → {true, false} is defined as follows:

f_x(w) = ¬(A·g(w) ≤ b ∧ ⟨c, g(w)⟩ > ⟨c, x⟩ + ε),

where the vector-valued function g computes the coordinates of the validation point by its number w. In other words, f_x returns false if the point g(w) belongs to the feasible region and the value of the objective function at this point exceeds ⟨c, x⟩ + ε; in all other cases (in particular, for infeasible points), f_x returns true. Thus, the list Z = [z_0, ..., z_{K−1}] contains Boolean indicators for all points of the validation set. If at least one element in this list has the value false, then the point x is an incorrect solution of problem (1).
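In sequential form, the Map/Reduce pair described above can be sketched as follows. This is illustrative only; in VaLiPro the same work is distributed among the slave nodes by the BSF-skeleton, and the indicator function f_x is passed in here as the parameter f.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Illustrative sketch: Map builds the list Z of Boolean indicators for the
// point numbers 0..K-1; Reduce folds Z with logical AND, so the result is
// true iff every indicator is true.
bool map_reduce_all(long K, const std::function<bool(long)>& f) {
    std::vector<char> Z(K);
    for (long w = 0; w < K; ++w)
        Z[w] = f(w) ? 1 : 0;                                   // Map(f, W)
    return std::accumulate(Z.begin(), Z.end(), true,
                           [](bool acc, char z) { return acc && z != 0; });  // Reduce(AND, Z)
}
```

A single false indicator makes the fold false, which corresponds to declaring the certified solution incorrect.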

Algorithm 4 Parallel algorithm for validating an LP solution
Master:
7: receive s_0, ..., s_{L−1} from the slaves
8: s := Reduce(∧, [s_0, ..., s_{L−1}])
9: if s then
10:   output "Solution is correct"
11: else
12:   output "Solution is incorrect"
13: end if
14: stop

Slave (l = 0, ..., L − 1):
4: W_l := [lK/L, ..., (l + 1)K/L − 1]
5: Z_l := Map(f_x, W_l)
6: s_l := Reduce(∧, Z_l)
7: send s_l to the master
14: stop

The higher-order function Reduce transforms the list Z = [z_0, ..., z_{K−1}] into a single Boolean value s by iteratively applying the conjunction operation to all the elements of the list:

s = Reduce(∧, Z) = z_0 ∧ z_1 ∧ ... ∧ z_{K−1}.

In Step 4 of Algorithm 4, the l-th slave sets its own part W_l of the list W:

W_l = [lK/L, lK/L + 1, ..., (l + 1)K/L − 1].

Here, L denotes the number of slaves. For simplicity, we assume that K is a multiple of L. In Step 5, the slave applies the Map function to its sublist W_l. In Step 6, the resulting sublist of Boolean values is folded into a single Boolean value s_l by applying the Reduce function, taking the conjunction operation as the first parameter. In Step 7, the l-th slave sends the value s_l to the master. In the same Step 7, the master receives all the calculated values from the slaves. In Step 8, the master folds the list of received values into a single Boolean value s using the Reduce function. In Steps 9-12, the master examines the calculated Boolean value s and outputs the corresponding conclusion.
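The sublist boundaries used in Step 4 can be sketched as follows. This is an illustrative helper with a hypothetical name; as in the text, it assumes that K is a multiple of L.

```cpp
#include <utility>

// Illustrative sketch: half-open range [first, last) of point numbers
// processed by slave l, for K points split evenly among L slaves
// (K is assumed to be divisible by L).
std::pair<long, long> slave_range(long K, long L, long l) {
    long chunk = K / L;
    return {l * chunk, (l + 1) * chunk};
}
```

The ranges of the L slaves are pairwise disjoint and together cover [0, K), so every validation point is examined exactly once.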

Software Implementation and Computational Experiments
We implemented the parallel Algorithm 4 in C++ through the parallel BSF-skeleton [26], which is based on the BSF parallel computation model [22, 23] and encapsulates all aspects related to the parallelization of the program using the MPI library [27]. The source code of the VaLiPro parallel program is freely available at https://github.com/leonid-sokolinsky/BSF-LPP-Validator. Using this program, we conducted large-scale computational experiments on the cluster computing system "Tornado SUSU" [28]. The specifications of this system are given in Tab. 1. For the experiments, we used random LP problems generated by the program FRaGenLP [29] with the following parameters: α = 200 (the length of the bounding-hypercube edge), θ = 100 (the radius of the large hypersphere), ρ = 50 (the radius of the small hypersphere), L_max = 0.35 (the upper bound of near parallelism for hyperplanes), S_min = 100 (the minimum acceptable closeness for hyperplanes), a_max = 1000 (the upper absolute bound for the coefficients), and b_max = 10 000 (the upper absolute bound for the constant terms). The experiments were conducted for the following dimensions: n = 15, n = 17, and n = 19. The numbers of inequalities were 46, 52, and 58, respectively. The solutions of the LP problems were obtained using the apex method [8]. Throughout the experiments, we used the following VaLiPro parameters: d = 5, ρ = 1, and ε = 10^(−6). The results of the experiments are shown in Fig. 2.
The verification of a solution for a problem of dimension n = 19 with a configuration consisting of the master node and one slave node took 17 minutes. The verification of a solution for the same problem with a configuration consisting of the master node and 310 slave nodes took 4 seconds. The analysis of the results showed that the scalability bound (the maximum of the speedup curve) of the algorithm significantly depends on the dimension of the problem. For n = 19, the parallel version of the VaLiPro algorithm demonstrated near-linear scalability up to 310 processor nodes. For n = 17, the scalability bound was approximately 260 nodes, and for n = 15, this bound decreased to 60 processor nodes. The reason is that a problem of such a small dimension cannot load so large a number of processor nodes: the time spent on data transfer over the network begins to dominate the time spent on computations, and the processors begin to stand idle.

Conclusions
The article presents the parallel algorithm VaLiPro for validating linear programming solutions on cluster computing systems. The main idea of the validation algorithm is to generate a regular set of points on a small-radius hypersphere centered at the solution point that is to be certified. The solution is considered correct if all points of the validation set belonging to the feasible region have lower values of the objective function than does the solution point being certified. The parallel algorithm VaLiPro was implemented in C++ using the parallel BSF-skeleton, which encapsulates in the problem-independent part of its code all aspects related to the parallelization of a program using the MPI library. The source code of the developed parallel program is freely available at https://github.com/leonid-sokolinsky/BSF-LPP-Validator. The proposed validation method is generic and suitable for linear programming problems of any kind. The advantage of the parallel VaLiPro algorithm is its near-linear speedup, demonstrated by the computational experiments on a cluster computing system. The main drawback that limits the practical use of the suggested method is the exponential growth of the number of points in the validation set as the dimension of the space increases, which results in exponential growth of the computational complexity. In practice, the proposed algorithm can be effectively used for LP problems with a space dimension of no more than 20. The described algorithm was used together with the FRaGenLP generator and the apex method to prepare a training dataset of 70 000 examples, which will be used to develop an artificial neural network capable of solving multidimensional linear programming problems.

Figure 1. Plotting the points of the validation set V on a three-dimensional sphere with d = 5

Figure 2. Speedup curves of the VaLiPro parallel algorithm for various dimensions

Table 1. Specifications of the "Tornado SUSU" computing cluster