Animal Breeding Projects

Across the majority of livestock species, ultrasound imaging on live animals is a useful selection tool for carcass-related traits of economic importance. For animals that have the potential to become sires or dams within an elite nucleus population, carcass measurements aren't possible, and the only way to get carcass data is through siblings or their progeny. As a result, genetic selection companies collect ultrasound imaging data to understand carcass leanneas and marbeling of important primals. Historically, annotation of the images to identify what certain parts of the image represented was done using trained ultrasound technicians. Below would be an example of an image with annotations on what the different parts represent.

On a rainy weekend, I started to play around with deep-learning frameworks such as YOLO and UNet to identify and segment certain parts of the image. Once I got the hang of using deep-learning models, using a few Kaggle datasets, I tested it to see if it would work on the ultrasound images. Long-story short it worked very well (e.g. > 98 % accuracy) and eliminated the need to manually annotate images to generate carcass phenotypes for routine genetic evaluations. The final pipeline is outlined below for each image:

A YOLO model to identify the ribs and backfat areas, and if enough are found, move on; otherwise, stop.
A UNet model to segment the image into a backfat section and loin section.
A VGGNet model to predict intramuscular fat percent based on the segmented loin section.

Across all major livestock species, the relationship between an animal's growth rate or production level and its feed consumption during this time is a key trait in the economic equation that determines which traits to select for. In general, it is much easier to measure the growth rate component using start and end weights, but average feed intake during the time period is much more difficult and time-consuming. Individual feed intake data has been collectible in pigs since the 1990s, but the amount of work and cost required to obtain this information has limited its widespread adoption on a large scale. In most cases, the data has been collected on a small subset of selected animals within a generation. Given, the importance of these two traits, I used Geno-Diver to understand what the impact on selection for growth-rate and feed intake would be if you could increase the percentage of animals that had phenotypes prior to selection. The results were you could double genetic progress on feed intake along with a slight increase in growth rate even if heritability was reduced by 10 percent.

Given these results, I began working with a volumetric-based feeder design that adjusts the allowed time between feed drops based on the average feeding rate of pigs at different ages. This approach greatly reduces the number of weight sensors that need to be calibrated and checked, while still enabling feed intake tracking for individual pigs. Multiple iterations of the feeders were tested, and improvements were made to the timing of feed drops based on age and sex. Ultimately, a low-maintenance feed intake recording system was developed, and the company we collaborated with incorporated the modified feeder into their product portfolio.

Optimization Projects

Within any business process, optimization models play a pivotal role in quickly and efficiently generating outcomes that satisfy a variety of constraints. With a clear understanding of the costs, decision variables, and constraints involved, it becomes relatively straightforward to build an optimization model to address real-world problems. One of the first models I developed—using simulated data—was a feed mill transportation problem, solved using a free optimization library in Python. From there, I progressed to more complex models using real data across a range of use cases, including:

Determining which plant a load of pigs should be sent to in order to minimize transportation costs while meeting plant capacity requirements.
Assigning load crews to pig shipments to minimize crew travel time and reduce the risk of disease transmission.
Scheduling the marketing of pigs based on predicted growth rates to maximize sort loss efficiency and minimize trucking costs.
Calculating the optimal sale weight by factoring in current market prices, feed costs, growth rates, and mortality risks.

Below is an example of how, in Python, I simulated four different feed mills—each with unique diet manufacturing costs, feed orders requiring different diet types based on the animal’s stage, and varying distances to each feed mill.

import pandas as pd
import numpy as np
from pulp import LpProblem, LpVariable, LpMaximize, LpMinimize, LpStatus, lpSum, value
import random
## Feed Mills
feed_mill = ["Mill1","Mill2","Mill3","Mill4"]
## Diet Types
diet_types = ["NUR1","NUR2","FIN1","FIN2","FIN3","LAC","GES"]
## Baseline costs / lb by Diet Type
baseline_costs = {"NUR1": 1.60,"NUR2": 1.30,"FIN1": 1.10,"FIN2": 1.00,
"FIN3": 0.85,"LAC": 1.35,"GES": 0.95}
# Difference in costs by mill
feedmill_costs = {"Mill1": -0.20,"Mill2": -0.01,"Mill3":0.15,"Mill4":0.10}
## Cost_lb by diet type and mill
costs_long = []
for diet in diet_types:
for feed in feed_mill:
cost=round(baseline_costs[diet]+feedmill_costs[feed]+random.uniform(-0.05,0.05),2)
costs_long.append({"feed_mill": feed, "diet_type": diet, "cost_lb": cost})
costs_df = pd.DataFrame(costs_long)
## Number of Orders (each order is a farm site)
loads = 25
## Simulate 'n' Feed Orders
orders = []
for i in range(loads):
if random.random() < 0.75:
feeds = [random.choice(diet_types)] # Full order of a single feed type
else:
group = random.choice([["NUR1","NUR2"],["FIN1","FIN2","FIN3"],["LAC","GES"]])
feeds = random.sample(group, random.randint(2, min(3, len(group)))) #split order
orders.append({'order': i+1, 'diet_type': feeds})
## turn it into pandas dataframe and add feed lbs
order_df = (pd.DataFrame(orders)
.explode('diet_type')
.reset_index(drop=True)
.assign(count = lambda x: x.groupby('order').transform('count'))
.assign(feedlbs = lambda x: round(26000 / x['count'],0))
.filter(['order','diet_type','feedlbs'])
)
## randomly create mileage data
mileage_data = {'order': [],'feed_mill': [],'miles': []}
for i in range(loads):
for mill in feed_mill:
mileage_data['order'].append(i+1)
mileage_data['feed_mill'].append(mill)
mileage_data['miles'].append(random.randint(5, 100)) # Random miles between 5 and 100
mileage_df = pd.DataFrame(mileage_data)

Using the simulated dataset I created the following constraints:

Maximum total pounds of feed available at each feed mill.
Maximum pounds of feed available for each specific diet type at a feed mill.
If an order includes multiple diet types, all must be supplied by the same feed mill.

Below is the optimization model that meets the above contraints and minimizes manufacturing costs and miles:

## build cost matrix
cost_matrix = (order_df
## take every order to every mill
.merge(pd.DataFrame(costs_df['feed_mill'].drop_duplicates()), how='cross')
## bring in miles from feed mill to grower
.merge(mileage_df,on=['order','feed_mill'], how='left')
## bring diet cost for each mill
.merge(costs_df, on=['feed_mill','diet_type'], how='left')
## get costs
.assign(cost_miles = lambda x: (x['miles'] * 4.0))
.assign(cost_feed = lambda x: (x['feedlbs'] * x['cost_lb']))
.assign(cost = lambda x: (x['cost_feed'] + x['cost_miles']))
.assign(cost_lb = lambda x: x['cost'] / x['feedlbs'])
.filter(['order','diet_type','feed_mill','cost_lb'])
)
mill_max = (costs_df[['feed_mill']]
.drop_duplicates()
.assign(feedlbs = order_df['feedlbs'].sum()*0.40)
)
dietmill_max = (costs_df[['feed_mill','diet_type']]
.drop_duplicates()
.assign(feedlbs = order_df['feedlbs'].sum()*0.40)
## mill 4 is only sow farm feed
.assign(feedlbs=lambda x: np.where((x['feed_mill'].isin(['Mill1','Mill2','Mill3']))
& (x['diet_type'] == 'LAC'), 0, x['feedlbs']))
.assign(feedlbs=lambda x: np.where((x['feed_mill'].isin(['Mill4'])) &
(x['diet_type'] == 'LAC'), order_df.query('diet_type == "LAC"').sum()['feedlbs'],
x['feedlbs']))
.assign(feedlbs=lambda x: np.where((x['feed_mill'].isin(['Mill1','Mill2','Mill3']))
& (x['diet_type'] == 'GES'), 0, x['feedlbs']))
.assign(feedlbs=lambda x: np.where((x['feed_mill'].isin(['Mill4'])) &
(x['diet_type'] == 'GES'), order_df.query('diet_type == "LAC"').sum()['feedlbs'],
x['feedlbs']))
)
## set input
op_input_cost=cost_matrix.set_index(['order','diet_type','feed_mill']).to_dict()['cost_lb']
op_orders=set(zip(cost_matrix['order'],cost_matrix['diet_type'],cost_matrix['feed_mill']))
op_ordermill=set(zip(cost_matrix['order'],cost_matrix['feed_mill']))
op_orderdiet=set(zip(cost_matrix['order'],cost_matrix['diet_type']))
op_dietmill=set(zip(dietmill_max['feed_mill'],dietmill_max['diet_type']))
op_demand=order_df.set_index(['order','diet_type']).to_dict()['feedlbs']
op_overall_supply=mill_max.set_index(['feed_mill']).to_dict()['feedlbs']
op_dietmill_supply=dietmill_max.set_index(['feed_mill','diet_type']).to_dict()['feedlbs']
op_mill=(cost_matrix.drop_duplicates('feed_mill')).feed_mill
op_order=(cost_matrix.drop_duplicates('order')).order
# 1. Initialise model
model = LpProblem("MinimizeFeedCosts", LpMinimize)
# 2. Define Decision Variables
deliveries=LpVariable.dicts('order_mill',[(o,d,m) for o, d, m in op_orders],lowBound=0,cat='Continuous')
mill_assignment=LpVariable.dicts('assign_mill', [(o,d,m) for o, d, m in op_orders],cat='Binary')
# used to force orders to a single mill
mill_used_load = LpVariable.dicts('millused',[(o,m) for o, m in op_ordermill],cat='Binary')
total_mill = LpVariable.dicts('millcount',[(o) for o in op_order], cat='Continuous')
# 3. Define objective function: costs to ship feed for a load for a given feed mill
model += lpSum([op_input_cost[(o,d,m)] * deliveries[(o, d, m)] for o, d, m in op_orders])
# 4. Define constraints:
### Constraint 1: For each order diet demand met
for o, d in op_orderdiet:
model += lpSum([deliveries[(o, d, m)] for m in op_mill]) == op_demand[(o, d)]
# Add mill assignment constraint
model += lpSum([mill_assignment[(o, d, m)] for m in op_mill]) == 1
### Constraint 1.1: Deliveries for a diet type and order can only occur if corresponding mill is assigned
for o, d in op_orderdiet:
for m in op_mill:
model += deliveries[(o, d, m)] <= op_demand[(o, d)] * mill_assignment[(o, d, m)]
### Constraint 2: total feed from a mill has to be less than supply of a given mill
for m in op_mill:
model += lpSum([deliveries[(o, d, m)] for o, d in op_orderdiet]) <= op_overall_supply[m]
### Constraint 3: total feed from a mill/diet has to be less than supply of a given mill/diet
for m, d in op_dietmill:
tmp_order_comb = [(o) for o, d_inner, m_inner in op_orders if d == d_inner and m == m_inner]
model += lpSum([deliveries[(o, d, m)] for o in tmp_order_comb]) <= op_dietmill_supply[m, d]
### Constraint 4: Ensure all diet types for an order come from the same mill
#Link processor load-suf assignments to processor assignments
for o, m in op_ordermill:
tmp_diet_comb = [(d) for o_inner, d in op_orderdiet if o == o_inner]
model +=(mill_used_load[(o, m)] * 5 >= (lpSum(mill_assignment[(o, d, m)] for d in tmp_diet_comb)))
model +=(mill_used_load[(o, m)] <= lpSum(mill_assignment[(o, d, m)] for d in tmp_diet_comb))
# total processors for a load
for o in op_order:
model += (total_mill[o] == lpSum([mill_used_load[(o, m)] for m in op_mill]))
## An order can only come from one mill
for o in op_order:
model += (total_mill[o] == 1)
## optimize model
solver = PULP_CBC_CMD(threads=4, cuts=1, gapRel=0.01)
model.solve(solver)
## Generate Solution
variable_values = [(var.name, var.value()) for var in model.variables()]
solution = pd.DataFrame(variable_values, columns=["Variable", "Value"])
solution = solution[solution['Variable'].str.contains('assign_mill')]
solution = (solution
.assign(order=(solution['Variable'].str.split(",").str[0]).str.replace("assign_mill_(","").astype(int),
diet_type=(solution['Variable'].str.split(",_'").str[1]).str.replace("'",""),
feed_mill=(solution['Variable'].str.split(",_").str[2]).str.replace("'","").str.replace(")",""))
.filter(['order','diet_type','feed_mill','Value'])
.query('Value == 1.0')
.merge(order_df,on=['order','diet_type'],how='left')
.sort_values(by=['order'])
.filter(['order','diet_type','feed_mill','feedlbs'])
)

The 'solution' dataframe has for each order which mill the order should come from. The full python code can be downloaded here .

Predictive Modeling Projects