Econometrics uses statistical and probability theory to obtain empirical models that represent a real-life economic process. In most cases, econometric modeling entails specifying a functional form whose parameters are estimated from real-life data. In doing so, it is essential to ensure that: 1) the functional form provides a meaningful representation of the phenomenon being modeled, both in the equation used and in the choice of dependent and independent variables; 2) the data contain sufficient variability to capture the wide range of behaviors exhibited in real life; and 3) appropriate estimation procedures are used to calibrate the model parameters. These considerations strongly influence the selection of the econometric techniques used to estimate freight mode choice.

Freight mode choice can be studied at the market level or at the shipment level, also referred to as “aggregate” and “disaggregate,” respectively. Aggregate models seek to explain the collective decisions of the hundreds, or thousands, of individual decision-makers that are implicitly captured in the market share of a given freight mode; hence the name “market share” models. In contrast, disaggregate (shipment-level) freight mode choice models seek to represent the decision-making process a shipper follows when choosing the freight mode used to transport a specific shipment.

Shipment-level models are widely regarded as the better methodological alternative because they: 1) establish a more direct connection between the choice being modeled and the independent variables; 2) use the data in the most efficient way; and 3) enable a seamless incorporation of policy variables.

The branches of econometrics used to estimate these models (market-share and shipment-level) are: 1) continuous dependent variable models; and 2) discrete outcome models. The former, as the name suggests, allow the dependent variable to take continuous values, e.g., to estimate the percentage of all shipments that will use rail. The latter estimate the choice among discrete outcomes, e.g., selecting rail or truck. These models, together with discrete-continuous models that combine the two, are described next. Table 13 shows the main types of econometric methods used.

**Continuous dependent variable models**

These models estimate mode choice as a continuous dependent variable, typically the market share of a freight mode. Frequently used independent variables are: (1) average values of modal attributes, such as costs, travel times, and distances; and (2) average values of shipment attributes, such as shipment size, commodity type, origin, or destination. Different functional forms are available. One of the most appealing is the logistic form shown in Equation (1). Its output, like a market share, is bounded between zero and one.
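Equation (1) does not appear in this version of the text. The block below gives a standard logistic market-share form that matches the description. It assumes that S_i denotes the market share of mode i and that U_i is its systematic utility, a linear function of the modal and shipment attributes, among N modes. The original may differ in notation.

```latex
S_i = \frac{e^{U_i}}{\sum_{k=1}^{N} e^{U_k}}, \qquad 0 < S_i < 1, \qquad \sum_{i=1}^{N} S_i = 1 \tag{1}
```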

Equation (1) expresses the market share of a given mode i as a function of its own attributes and those of the competing alternatives. The better a mode performs relative to the others, the higher the market share estimated by Equation (1). In the case of two modes, i and j, Equation (1) reduces to:
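Equation (2) does not appear in this version of the text. The block below gives the standard two-mode logistic form. It assumes that S_i and S_j denote the market shares and U_i and U_j the systematic utilities of modes i and j.

```latex
S_i = \frac{e^{U_i}}{e^{U_i} + e^{U_j}} = \frac{1}{1 + e^{-(U_i - U_j)}}, \qquad S_j = 1 - S_i \tag{2}
```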

Equation (2) can be linearized by inverting the logistic function, that is, by taking the natural logarithm of the ratio of the two market shares. This yields the utility difference Ui − Uj. Expressing this difference as a linear function of the corresponding explanatory variables gives Equation (3). The model parameters can then be obtained by Ordinary Least Squares (OLS) regression and similar techniques.
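The linearization can be checked with a minimal sketch. It assumes the linear form ln(S_i / S_j) = β0 + β'(x_i − x_j), where x_i − x_j holds attribute differences between the two modes. The function name `fit_binary_share_ols`, the two attributes and the synthetic market data are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_binary_share_ols(share_i, x_diff):
    """Estimate a two-mode logistic market-share model by OLS on the log-odds.

    share_i : market share of mode i in each market, strictly between 0 and 1
    x_diff  : (n_markets, k) attribute differences, mode i minus mode j
    Returns [intercept, beta_1, ..., beta_k].
    """
    share_i = np.asarray(share_i, dtype=float)
    # Invert the logistic form: ln(S_i / S_j) = U_i - U_j, with S_j = 1 - S_i
    log_odds = np.log(share_i / (1.0 - share_i))
    # Design matrix with a constant term
    X = np.column_stack([np.ones(len(share_i)), x_diff])
    coef, *_ = np.linalg.lstsq(X, log_odds, rcond=None)
    return coef

# Synthetic example: 200 hypothetical markets, cost and time differences (rail minus truck)
rng = np.random.default_rng(0)
x_diff = rng.normal(size=(200, 2))
true_coef = np.array([0.5, -1.2, -0.4])
utility_diff = true_coef[0] + x_diff @ true_coef[1:] + rng.normal(scale=0.1, size=200)
share_rail = 1.0 / (1.0 + np.exp(-utility_diff))

coef = fit_binary_share_ols(share_rail, x_diff)
print(np.round(coef, 2))
```

The log-odds transform is undefined for markets where one mode has a 0% or 100% share. Such markets would have to be dropped or adjusted before using this approach.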

With more than two modes, the model becomes a set of N−1 linear equations, as shown in Equation (4), where N is the number of modes available.
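Equation (4) does not appear in this version of the text. The block below gives a form consistent with the logistic share model, taking mode N as an assumed reference alternative.

```latex
\ln\!\left(\frac{S_i}{S_N}\right) = U_i - U_N, \qquad i = 1, \dots, N-1 \tag{4}
```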

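However, the N−1 equations share common parameters and have correlated error terms, so they cannot be properly estimated by applying OLS to each equation separately. Estimating market share models for multiple modes therefore requires more advanced approaches, such as maximum likelihood estimation.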
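One way to apply maximum likelihood to market shares is a grouped-data estimator. It treats each market's shares as the outcome of many multinomial-logit choices and maximizes the sum over markets of n_m · Σ_i S_mi · ln P_mi(β). The sketch below illustrates this under that assumption; it is not the specific estimator used in the studies reviewed. The function names, the three modes and the synthetic data are hypothetical.

```python
import numpy as np

def softmax_rows(v):
    """Row-wise logistic (softmax) transform; subtracting the row max avoids overflow."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_multimode_shares_mle(shares, X, weights, n_iter=100, tol=1e-10):
    """Grouped-data maximum likelihood for an N-mode logistic market-share model.

    shares  : (M, N) observed market shares; each row sums to 1
    X       : (M, N, K) attributes of each mode in each market
    weights : (M,) number of shipments behind each market's observed shares
    Maximizes sum_m weights[m] * sum_i shares[m, i] * ln P[m, i](beta) by Newton-Raphson.
    """
    beta = np.zeros(X.shape[2])
    for _ in range(n_iter):
        P = softmax_rows(X @ beta)                                  # predicted shares (M, N)
        grad = np.einsum("m,mi,mik->k", weights, shares - P, X)
        x_bar = np.einsum("mi,mik->mk", P, X)                       # share-weighted mean attributes
        dev = X - x_bar[:, None, :]
        neg_hess = np.einsum("m,mi,mik,mil->kl", weights, P, dev, dev)
        step = np.linalg.solve(neg_hess, grad)                      # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic data: 300 hypothetical markets, 3 modes, 500 shipments per market
rng = np.random.default_rng(42)
M, N = 300, 3
X = np.zeros((M, N, 3))
X[:, 0, 0] = 1.0                         # constant for mode 0 (mode 2 is the reference)
X[:, 1, 1] = 1.0                         # constant for mode 1
X[:, :, 2] = rng.normal(size=(M, N))     # generic cost attribute, one coefficient for all modes
true_beta = np.array([0.8, -0.3, -1.0])
P_true = softmax_rows(X @ true_beta)
counts = np.array([rng.multinomial(500, p) for p in P_true])
shares = counts / 500.0
weights = np.full(M, 500.0)

beta_hat = fit_multimode_shares_mle(shares, X, weights)
print(beta_hat.round(2))
```

This estimator imposes a single cost coefficient across all N−1 log-ratio equations. It also tolerates markets in which a mode has a zero share, which the log-ratio OLS approach cannot handle.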

**Discrete choice models**

These models estimate freight mode choice by computing the probability that the corresponding decision-maker selects mode i to transport shipment n. Hence, these models are disaggregate in nature, and the dependent variable is discrete. Discrete choice models, also known as discrete outcome models, are based on Random Utility Theory (RUT). RUT states that the decision-maker selects the alternative that maximizes the utility derived from it. However, the analyst cannot have perfect information about the alternatives considered or the nature of the decision process, so the choice problem is expressed in probabilistic terms. Thus, the utility of alternative i for shipment n (Uin) is expressed as (Ben-Akiva and Lerman 2010):
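The equation does not appear in this version of the text. The block below gives the standard random utility decomposition. It assumes a systematic utility V_in that is linear in the parameters β, with attribute vector x_in and random error term ε_in.

```latex
U_{in} = V_{in} + \varepsilon_{in}, \qquad V_{in} = \boldsymbol{\beta}^{\top}\mathbf{x}_{in}
```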

Different models can be derived by making different assumptions about the error terms. The most widely used are the binary logit, when there are only two alternatives, and the Multinomial Logit (MNL), when there are more than two modal alternatives. The logit model assumes that the error terms are Extreme Value Type I (Gumbel) distributed. This assumption yields the following probability that individual n selects alternative i:
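The equation does not appear here either. Under the Gumbel assumption and the notation assumed above, the standard logit probability is P_in = exp(V_in) / Σ_j exp(V_jn), where the sum runs over the alternatives available for shipment n. With only two alternatives, this reduces to the binary logit 1 / (1 + exp(−(V_in − V_jn))). The minimal sketch below computes these probabilities for a few shipments; the utility values are hypothetical.

```python
import numpy as np

def mnl_probabilities(V):
    """MNL choice probabilities from systematic utilities.

    V : (n_shipments, n_modes) array of V_in values.
    Returns an array of the same shape whose rows sum to 1.
    """
    V = np.asarray(V, dtype=float)
    # Subtracting each row's maximum avoids overflow without changing the probabilities
    expV = np.exp(V - V.max(axis=1, keepdims=True))
    return expV / expV.sum(axis=1, keepdims=True)

# Hypothetical utilities for two shipments choosing among truck, rail and intermodal
V = np.array([[1.0, 0.2, -0.5],
              [0.0, 0.0, 0.0]])
P = mnl_probabilities(V)
print(P.round(3))
```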

Like OLS, the MNL rests on a few assumptions:

- homogeneous preferences across decision-makers, i.e., the same parameters apply to all observations;
- Independence of Irrelevant Alternatives (IIA);
- Independently and Identically Distributed (IID) error terms;
- a common variance of the error terms (homoscedasticity).

Quite frequently, however, the alternatives being considered share common unobserved attributes. In these cases, the independence assumption embedded in the MNL breaks down, and the MNL produces biased results. To overcome this issue, more advanced forms of discrete choice models have been developed. For example, Jiang et al. (1999) adopted a Nested Logit (NL), and Kim (2002) introduced a random heterogeneity logit model. Norojono and Young (2003) adopted a heteroscedastic extreme value model. Arunotayanun and Polak (2007), Patterson et al. (2007), and Abate and de Jong (2014) used mixed MNL (MMNL) models, in which the βi in Equation (4) is assumed to follow a specified distribution. Other models include the MNL-MNL copula model of the Archimedean class used by Pourabdollahi et al. (2013).
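A small numeric sketch shows the IIA property and why it misbehaves when alternatives share attributes. This is a freight version of the classic red bus/blue bus example; the modes and utility values are hypothetical.

```python
import numpy as np

def mnl_probabilities(V):
    """MNL choice probabilities for one shipment, given systematic utilities V."""
    expV = np.exp(V - np.max(V))   # shift by the max for numerical stability
    return expV / expV.sum()

# Truck and rail with equal (hypothetical) systematic utilities
before = mnl_probabilities(np.array([0.0, 0.0]))        # [truck, rail]
# Add a second rail service identical to the first: it shares all of rail's attributes
after = mnl_probabilities(np.array([0.0, 0.0, 0.0]))    # [truck, rail A, rail B]

print("truck share before:", before[0])                 # 0.5
print("truck share after:", after[0])                   # about 0.333 under the MNL
print("truck/rail ratio:", before[0] / before[1], after[0] / after[1])  # both ratios equal 1.0
```

Under the MNL, the duplicate rail service draws share equally from truck and rail, because the truck/rail ratio must stay fixed. Intuitively, it should draw almost entirely from the existing rail service. A nested logit that places both rail services in one nest relaxes this by allowing their unobserved utilities to be correlated.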

**Discrete-continuous models**

An important feature of freight mode choice is that decision-makers make two choices jointly: the freight mode and the corresponding shipment size. Ideally, the combination minimizes total logistics cost (the sum of transportation and inventory costs). As a result, the interactions between the choice of mode and the choice of shipment size must be accounted for econometrically. The shipment-size decision has a strong influence on freight mode choice: the closer the shipment size is to the capacity of a freight vehicle or mode, the more likely that vehicle or mode will be selected to transport the shipment (Holguín-Veras 2002). In the words of Samuelson (1977), the decision of shipment size is “mode determining.” Capturing this combination of decisions requires discrete-continuous choice models. The mathematical form of discrete-continuous models is similar to Equation (2), but includes endogenous variable(s) Zn, as shown in Equation (7).
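Equation (7) does not appear in this version of the text. The block below gives a linear-in-parameters utility consistent with the description. It assumes that Zn is the endogenous shipment size of shipment n and that xin holds the remaining exogenous attributes. Because Zn does not vary across modes, its coefficient γi is written as mode-specific, with one mode normalized to zero, so that it can affect the choice probabilities.

```latex
U_{in} = \boldsymbol{\beta}^{\top}\mathbf{x}_{in} + \gamma_i Z_n + \varepsilon_{in} \tag{7}
```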

There are multiple approaches to address the issues created by the correlation between the discrete and continuous choices. A partial list of techniques includes:

1. the control function or instrumental variable approach (Heckman 1976);
2. the Berry, Levinsohn and Pakes (BLP) approach (Louviere et al. 2005);
3. the dual approach (Matzkin 2004);
4. the latent variable approach (Ben-Akiva and Boccara 1995);
5. the special regressor approach (Lewbel 1998).

The technique used in this research is the instrumental variable approach, selected for practical reasons. In this technique, the values of Zn are replaced with values estimated from instrumental variables, as shown in Equation (8) (Holguín-Veras 2002).
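The substitution step can be sketched in two stages, under stated assumptions. First, shipment size is regressed on all exogenous variables, including an instrument, to obtain fitted values Ẑn = δ'wn; this is a hypothetical form for Equation (8). Second, Ẑn replaces Zn in a binary logit for rail versus truck. The data are synthetic, and the variables `distance` and `instrument` are hypothetical; the instrument is built to shift shipment size without entering the mode utility directly. In nonlinear models such as the logit, this plain substitution generally recovers the structural coefficients only approximately (up to scale). Control-function approaches that include the first-stage residual are a common alternative.

```python
import numpy as np

def fit_binary_logit(X, y, n_iter=50, tol=1e-10):
    """Binary logit by Newton-Raphson maximum likelihood; X includes a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        neg_hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(neg_hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n = 5000
distance = rng.normal(size=n)      # exogenous regressor in the mode utility
instrument = rng.normal(size=n)    # hypothetical instrument: shifts size, not mode utility
common = rng.normal(size=n)        # unobserved factor driving both decisions
# Endogenous shipment size Z_n, correlated with the mode-choice error through `common`
size = 0.8 * instrument + 0.5 * distance + common + rng.normal(scale=0.5, size=n)
# Latent utility of rail relative to truck (logistic error), also affected by `common`
y_rail = (0.2 + 0.6 * distance + 1.0 * size - 1.0 * common
          + rng.logistic(size=n) > 0).astype(float)

# Stage 1: regress Z_n on all exogenous variables (constant, instrument, distance)
W = np.column_stack([np.ones(n), instrument, distance])
delta, *_ = np.linalg.lstsq(W, size, rcond=None)
size_hat = W @ delta

# Stage 2: replace Z_n with its fitted value in the mode-choice model
beta_iv = fit_binary_logit(np.column_stack([np.ones(n), distance, size_hat]), y_rail)

# Naive comparison: using the observed Z_n ignores the endogeneity
beta_naive = fit_binary_logit(np.column_stack([np.ones(n), distance, size]), y_rail)
print("IV substitution:", beta_iv.round(2), "naive:", beta_naive.round(2))
```

In this synthetic setup, the naive model understates the effect of shipment size on the choice of rail. The instrumented model recovers a coefficient close to the value used to generate the data.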

Most publications on freight mode choice modeling use the instrumental variable approach, with various functional forms and exogenous variables in Equation (8). For example, Abdelwahab and Sargious (1991, 1992) and Abdelwahab (1998) adopt a maximum likelihood binary probit model with continuous shipment sizes for various modes, estimated as functions of other exogenous variables. Holguín-Veras (2002) uses an instrumental variable MNL approach with continuous shipment size. Dewey et al. (2002) and Lloret-Batlle and Combes (2013) use OLS to estimate the demand function, where the shipment size is calculated using the inventory theory approach (discussed later in this section, in Supply Chain Methods).