Advanced Example - Wine Manufacturer ##################################### Explanation and Description =========================== In this example we take the point of view of a wine manufacturer. Thanks to available open source data :cite:t:`wine`, which was accessed `here `_, we can build a ML predictor for wine quality given a set of attributes of the wine. This can either be modelled as a regression model, with the quality being in the range [0,1], or some quality threshold can be set, and the task made into a classification task. For this exercise we model it is a regression task. The goal of the MIP in the following example is to create a diverse bouquet of wines with the highest average quality. To do this, we make an assumption that the features of the grapes and resulting wine are identical. In this example we can purchase grapes from a set of suppliers, all of whom have grapes that by themselves would result in low quality wine. However, we can blend the grapes such that the predicted quality of the wine increases. For additional constraints, each supplier has a limit on the amount of grapes that can be sold, and an associated cost per unit, where we have a total budget. Finally, we assume that each wine of the created bouquet will have the same volume. We describe the MIP formulation mathematically: - Let :math:`I` be the index set of the wine bouquet that will be created. - Let :math:`J` be the index set of features for each grape / wine. - Let :math:`K` be the index set of vineyards where the grapes can be purchased - Let :math:`x_{i,j}` be the value of feature j for wine i - Let :math:`y_i` be the quality of the wine i - Let :math:`m_{i,k}` be the amount of grapes from vineyard k used in wine i - let :math:`f` be the ML predictor that determines the quality of a wine - Let v[j][k] be the constant values of feature j from vineyard k - Let c[k] be the constant cost of one unit of grapes from vineyard k - Let a[k] be the amount of grapes available from vineyard k - Let B be the total budget available .. math:: \begin{align*} &\text{max } & \sum_i y_i & \\ &s.t. & x_{i,j} = \sum_k m_{i,k} * v[j][k] & \quad \forall i, j \\ & & \sum_k m_{i,k} = 1 & \quad \forall i \\ & & \sum_i m_{i,k} \leq a[k] & \quad \forall k \\ & & y_i = f(x_{i,1}, ..., x_{i, |J|}) & \quad \forall i \\ & & \sum_{i,k} m_{i,k} * c[k] \leq B \end{align*} Code Walkthrough ================= Below we will introduce a function for creating this example with a variety of ML frameworks that can train gradient boosting decision trees and random forests. This example is taken directly from one of the tests in the GitHub repository. See more examples by searching there. .. code-block:: python def build_and_optimise_wine_manufacturer( seed=42, n_vineyards=35, n_wines_to_produce=5, min_wine_quality=4.25, sklearn_xgboost_lightgbm="sklearn", gbdt_or_rf="rf", n_estimators=3, max_depth=3, epsilon=0.0001, ): assert sklearn_xgboost_lightgbm in ("sklearn", "xgboost", "lightgbm") assert gbdt_or_rf in ("gbdt", "rf") # Path to red wine data data_dict = read_csv_to_dict("./tests/data/wineQualityReds.csv") features = [ "fixed.acidity", "volatile.acidity", "citric.acid", "residual.sugar", "chlorides", "free.sulfur.dioxide", "total.sulfur.dioxide", "density", "pH", "sulphates", "alcohol", ] n_features = len(features) budget = 1.7 * n_wines_to_produce # Generate the actual input data arrays for the ML predictors X = [] quality = np.array([float(x) for x in data_dict["quality"]]).reshape( -1, ) for feature in features: X.append(np.array([float(x) for x in data_dict[feature]])) X = np.swapaxes(np.array(X), 0, 1) # Train the ML predictor if sklearn_xgboost_lightgbm == "sklearn": if gbdt_or_rf == "rf": reg = RandomForestRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth ).fit(X, quality) else: reg = GradientBoostingRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth ).fit(X, quality) elif sklearn_xgboost_lightgbm == "xgboost": if gbdt_or_rf == "gbdt": reg = XGBRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth ).fit(X, quality) else: reg = XGBRFRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth ).fit(X, quality) else: if gbdt_or_rf == "gbdt": reg = LGBMRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth ).fit(X, quality) else: reg = LGBMRegressor( random_state=seed, n_estimators=n_estimators, max_depth=max_depth, boosting_type="rf", bagging_freq=1, bagging_fraction=0.5, ).fit(X, quality) # Create artificial data from some vineyards np.random.seed(seed) vineyard_order = np.arange(X.shape[0]) np.random.shuffle(vineyard_order) vineyard_litre_limits = np.random.uniform(0.25, 0.35, n_vineyards) vineyard_costs = np.random.uniform(1, 2, n_vineyards) vineyard_features = [] low_quality_vineyards_i = 0 for i in vineyard_order: if low_quality_vineyards_i >= n_vineyards: break if quality[i] <= 5: low_quality_vineyards_i += 1 vineyard_features.append(X[i]) vineyard_features = np.array(vineyard_features) # Create the SCIP Model scip = Model() # Create variables deciding the features of each wine feature_vars = np.zeros((n_wines_to_produce, n_features), dtype=object) quality_vars = np.zeros((n_wines_to_produce, 1), dtype=object) wine_mixture_vars = np.zeros((n_wines_to_produce, n_vineyards), dtype=object) for i in range(n_wines_to_produce): quality_vars[i][0] = scip.addVar(vtype="C", lb=0, ub=10, name=f"quality_{i}") for j in range(n_features): max_val = np.max(X[:, j]) min_val = np.min(X[:, j]) lb = max(0, min_val - 0.1 * max_val) ub = 1.1 * max_val feature_vars[i][j] = scip.addVar(vtype="C", lb=lb, ub=ub, name=f"feature_{i}_{j}") for k in range(n_vineyards): wine_mixture_vars[i][k] = scip.addVar( vtype="C", lb=0, ub=vineyard_litre_limits[k], name=f"mixture_{i}_{k}" ) # Now create constraints on the wine blending for i in range(n_wines_to_produce): for j in range(n_features): scip.addCons( feature_vars[i][j] == quicksum( wine_mixture_vars[i][k] * vineyard_features[k][j] for k in range(n_vineyards) ), name=f"mixture_cons_{i}_{j}", ) for i in range(n_wines_to_produce): scip.addCons( quicksum(wine_mixture_vars[i][k] for k in range(n_vineyards)) == 1, name=f"wine_mix_{i}", ) for k in range(n_vineyards): scip.addCons( quicksum(wine_mixture_vars[i][k] for i in range(n_wines_to_produce)) <= vineyard_litre_limits[k], name=f"vineyard_limit_{k}", ) # Add the budget constraint scip.addCons( quicksum( quicksum(wine_mixture_vars[i][k] * vineyard_costs[k] for k in range(n_vineyards)) for i in range(n_wines_to_produce) ) <= budget, name=f"budget_cons", ) # Add the ML constraint. Add in a single batch! pred_cons = add_predictor_constr( scip, reg, feature_vars, quality_vars, unique_naming_prefix="wine_", epsilon=epsilon ) # Add a constraint ensuring minimum wine quality on those produced for i in range(n_wines_to_produce): scip.addCons(quality_vars[i][0] >= min_wine_quality, name=f"min_quality_{i}") # Set the SCIP objective scip.setObjective( quicksum(quality_vars[i][0] for i in range(n_wines_to_produce)) / n_wines_to_produce, sense="maximize" ) return scip Two important things to note in the insertion of the ML predictors. - While it was a single function call to insert the ML predictor, the ML predictor was actually added n many times. Specifically, the same ML predictor was inserted for each of the generated wines - When using decision trees, or any ML predictors that are based on decision trees, it is important to be aware of the epsilon value. In the above example we set a default value of 0.0001. This ensures that the output of the ML predictor matches the ML framework, but it removes a small portion of the feasible region, risking getting an infeasible model for a feasible problem. This is by default 0, because while the error can be arbitrarily large for decision trees, the error stems only from a numerically insignificant perturbation of the input Back to the example: We can then create various models with different characteristics and different ML frameworks by changing the parameters of the input. For example, to insert ML predictors from XGBoost using Random Forests, we would do the following: .. code-block:: python # Build the SCIP model with embedded ML predictors scip = build_and_optimise_wine_manufacturer( seed=42, n_vineyards=35, n_wines_to_produce=5, min_wine_quality=4.25, sklearn_xgboost_lightgbm="xgboost", gbdt_or_rf="rf", n_estimators=3, max_depth=3, epsilon=0.0001 ) # Optimise the SCIP model scip.optimize() # We can check the "error" of the MIP embedding via the difference between SKLearn and SCIP output if np.max(pred_cons.get_error()) > 10**-3: error = np.max(pred_cons.get_error()) raise AssertionError(f"Max error {error} exceeds threshold of {10 ** -3}") return scip