Introduction

In a recent post titled Using Artificial Intelligence to Predict SCOTUS Judging, I discussed a machine learning model I used to make predictions regarding Justice Brennan's voting record on the Supreme Court of the United States (SCOTUS). This model is a neural network, coded in Python, that uses the Keras framework. In the present post, I review the code line by line and explain it.

This post focuses on the code and not the dataset. For more information regarding the latter, please see my earlier post (though it's worth repeating that I obtained the dataset from Prof. Alschner's great site, Data Science For Lawyers).

Further, as noted in the README for this blog, I'm assuming the reader has a basic level of familiarity with object-oriented programming and common Python libraries.

That said, I'm also aiming to write this in a way so that a general reader will still be able to follow along (more or less). If you are interested in this content but I'm assuming too much background knowledge, please let me know. I'd be happy to explain further.

For reference, I developed the model using Google Colab.

Workflow

Before delving into the details, this is the workflow underpinning the code:

  1. Import modules
  2. Load data
  3. Define the training set and testing set
  4. Preprocess the data
  5. Define the model
  6. Run the model
  7. Report on the results

Modules

We begin our script by loading the modules we require. These are our tools to preprocess the data and assemble the model. This model does not involve coding any functions or classes from scratch. We are not building any new tools. Instead, we import everything we need. As a result, this script is remarkably short.

import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer
from sklearn.metrics import classification_report
from tensorflow.keras.utils import to_categorical
import numpy as np

In brief, we import libraries or portions thereof from:

  • pandas - for loading the data regarding Justice Brennan's voting record
  • Scikit-learn - for preprocessing the data
  • NumPy - to help with the reporting; and, of course
  • Keras - the framework this model uses

Data

At this step, our objective is to load the data into a pandas DataFrame. Once in this format, we can begin preprocessing it for the model to analyze.

dataset = 'https://raw.githubusercontent.com/litkm/WJBrennan-Voting/main/WJBrennan_voting.csv'
dataset = pd.read_csv(dataset)

We obtain our data from a CSV file. For convenience, I uploaded it to GitHub (in raw format). We first create a variable and assign it the web address where the file is located.

Next, we pass this variable into the pandas read_csv() function to load the CSV file as a DataFrame, and reassign the variable so that it now refers to the DataFrame rather than the web address.
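As a minimal sketch of what read_csv() hands back, the snippet below loads a tiny made-up CSV from an in-memory string instead of the GitHub address, so it runs without a network connection. The three rows and their values are invented purely for illustration; only the column names are borrowed from this post. read_csv() accepts a web address in the same way.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file: three invented rows.
csv_text = """term,petitioner,respondent,vote
1956,100,3,majority
1957,27,369,minority
1958,198,28,majority
"""

# read_csv() accepts a URL, a file path, or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.head())  # the first rows (up to five) of the DataFrame
```

Calling head() on the real DataFrame is a quick way to check that the file loaded as expected before going any further.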

Training Set and Testing Set

To train this model, we must split the data into training and test sets. The model "learns" from the training data; during the learning phase, the test data is excluded from review. Once it completes a learning phase, the model switches to a test phase, where it evaluates its predictive capacity (i.e. how well it learned from the training data) using the test data.

This model uses supervised learning. Often, this is framed in terms of x and y variables:

  • x represents the data inputted into the model. These inputs are sometimes called features.
  • y represents an outcome the model is to predict. This is sometimes called the target variable or label.

During training, the model processes the features and uses them to make predictions. These predictions are compared against the corresponding label. The outcome of this comparison is a "supervisory signal", i.e. whether the prediction was correct or not; and, if not, by how much. The model then uses this "signal" to recalibrate with the objective of improving its predictive capacity.

With this in mind, we need to identify our x and y variables in the Justice Brennan dataset. The first five rows of the dataset are reproduced below:

Our target variable for this model is the outcome identified in the "vote" column, i.e. whether Justice Brennan voted with the majority or the minority. This is what we want to predict in respect of each row (where each row represents one SCOTUS case).

The data in the preceding columns comprises the information we intend to input into the model, and which the model will use to predict the target variable.

During training, the model will review each row of features, case by case, and make a prediction relating to that case. This prediction will then be compared to the label for that case, i.e. whether Brennan voted with the majority or not. This comparison provides the supervisory signal. Based on the result, the model recalibrates and proceeds to the next row in the dataset, i.e. the next case.

After the model cycles through the training dataset, it then evaluates its predictive capacity against the test dataset.

To provide the foregoing, we must split the original dataset into four subsets:

  1. X_train - the set of features for training
  2. Y_train - the corresponding set of labels for training
  3. X_test - the set of features for testing
  4. Y_test - the corresponding set of labels for testing

We accomplish this in a few steps.

First, we split the DataFrame into features and labels:

y = dataset['vote']
x = dataset[['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']]

We now have a pandas Series (a single column) assigned to a variable called y, and it contains all of the labels.

We now also have a DataFrame assigned to a variable called x, and it contains all of the features.

Next, we further split x and y into training sets (X_train and Y_train) and test sets (X_test and Y_test). To do this, we use the train_test_split() function from the scikit-learn library:

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)

As you see, train_test_split() takes several parameters:

  • x - the DataFrame of features to be split
  • y - the Series of labels to be split
  • test_size - this parameter specifies the size of the test set; in this instance, we allocate 30% of the dataset for testing
  • random_state - this parameter controls the shuffling (randomization) applied to the data before applying the split

When calling this function, we assign the data subsets to the variables X_train, Y_train, X_test, and Y_test. Now we are ready to proceed to the next stage.
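To make the split concrete, here is a small sketch on invented data (ten made-up "cases" with two feature columns), using the same test_size and random_state settings as above. The variable names toy_x, toy_y, X_tr and so on are mine and do not appear in the model script.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 "cases" with two invented feature columns.
toy_x = pd.DataFrame({"term": np.arange(1956, 1966), "issue": np.arange(10) * 10})
toy_y = pd.Series(["majority", "minority"] * 5, name="vote")

X_tr, X_te, Y_tr, Y_te = train_test_split(toy_x, toy_y, test_size=0.3, random_state=0)

print(X_tr.shape, X_te.shape)  # 7 training rows and 3 test rows
# Features and labels stay aligned: both halves share the same row index.
print(list(X_tr.index) == list(Y_tr.index))

# Re-running with the same random_state reproduces the same split.
X_tr2, _, _, _ = train_test_split(toy_x, toy_y, test_size=0.3, random_state=0)
print(X_tr.equals(X_tr2))
```

Fixing random_state is what makes the split repeatable from one run to the next; without it, each run would shuffle the rows differently.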

Preprocessing the Data

Before the data can be fed into the model, it must be preprocessed for optimal results. In this instance, we need to:

  1. Scale the features
  2. Convert the labels from categories to integers

As you can see from the printout of the dataset above, all of the features are represented using numbers. Based on the first five rows alone, we note dissimilarity; for example:

  • respondent - ranges from 3 to 369
  • jurisdiction - ranges from 1 to 2
  • issue - ranges from 40,070 to 120,020

If we were to dive into the rest of the dataset, we would see this dissimilarity is representative. Each of the columns has a different range, mean, etc.

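When we scale the features, the scaler recalculates the numbers in each feature column so that every column has a mean of zero and a variance of one. This puts all of the features on a comparable scale, so that a column with large values (such as issue) is not treated as more important than a column with small values (such as jurisdiction) merely because of its size.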

To accomplish this, we use another tool from the scikit-learn library, namely the ColumnTransformer() class.

columns_for_standard = ['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']

ct = ColumnTransformer([('numeric', StandardScaler(), columns_for_standard)])

X_train = ct.fit_transform(X_train) 
X_test = ct.transform(X_test)

For convenience, we first assign the features we wish to scale to a variable called columns_for_standard. This variable is used in the next line of code.

Then, we create a ColumnTransformer() object assigned to the variable ct. When initializing this object, we configure the form of scaling and specify the features to be scaled.

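Next, we use the related class methods, fit_transform() and transform(), to apply the scaler to the features. fit_transform() calculates the mean and variance of each feature from the training set and scales the training set accordingly; transform() then applies those same training-set statistics to the test set. Each of these methods produces a NumPy array comprising the now scaled features, and we assign these arrays, respectively, to our X_train and X_test variables. Done! We've scaled our features.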
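The sketch below illustrates the scaling step on a handful of invented numbers. It calls StandardScaler directly rather than wrapping it in a ColumnTransformer, and compares the result against the same calculation done by hand in NumPy. The arrays train and test are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature values on very different scales (invented numbers).
train = np.array([[3.0, 40070.0], [369.0, 120020.0], [28.0, 80050.0], [100.0, 90010.0]])
test = np.array([[50.0, 100000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learn mean/std from train, then scale
test_scaled = scaler.transform(test)        # reuse the training mean/std

# The same calculation by hand: subtract the column mean, divide by the column std.
manual = (train - train.mean(axis=0)) / train.std(axis=0)

print(np.allclose(train_scaled, manual))
print(train_scaled.mean(axis=0).round(6))  # approximately 0 for each column
print(train_scaled.std(axis=0).round(6))   # approximately 1 for each column
```

Note that the test row is scaled with the training set's mean and standard deviation, which is why the script calls fit_transform() on X_train but only transform() on X_test.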

Now to convert the labels from categories to integers (whole numbers). On review of the "vote" column above, you will note there are no numbers. Rather, there is only the word "majority". Throughout the dataset, the "vote" column only ever has the word "majority" or "minority" under it. To make these "categories" digestible for the model, we must convert them into integers. Unlike the features, the labels are not scaled.

Here again, we use a tool from the scikit-learn library, namely the LabelEncoder() class, along with the pandas method astype().

le = LabelEncoder()
Y_train = le.fit_transform(Y_train.astype(str))
Y_test = le.transform(Y_test.astype(str))

Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)

We first initialize a LabelEncoder() object and assign it to the variable le.

Then, for each of Y_train and Y_test, we call the astype() method to make sure every label is a string, and pass the result to the LabelEncoder() class methods fit_transform() and transform(), as applicable. It is these methods that convert the categories (i.e. words) into integers, with each distinct word mapped to its own integer.

We must then use a Keras function called to_categorical() to convert the integers in Y_train and Y_test into a form called one-hot encoding, which the model requires in order to calculate the "supervisory signals" discussed above.
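Here is a small sketch of the two label steps on four invented votes. To keep the example free of TensorFlow, I use a NumPy identity matrix as a stand-in for to_categorical(); for integer labels like these it yields the same rows of 0s and 1s, though I have not shown the Keras call itself here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

votes = np.array(["majority", "minority", "majority", "majority"])  # invented labels

encoder = LabelEncoder()
as_int = encoder.fit_transform(votes)
print(encoder.classes_)  # classes are sorted alphabetically: majority, then minority
print(as_int)            # majority -> 0, minority -> 1

# NumPy stand-in for one-hot encoding: row i of the identity matrix has a 1 in column i.
one_hot = np.eye(len(encoder.classes_))[as_int]
print(one_hot)
```

Each label ends up as a row with a single 1 marking its class, which lines up with the two output units of the network defined in the next section.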

At this point, we are finally ready to create the neural network itself.

Define the Neural Network

To create the neural network, we use Keras' Sequential() class. This is one of the most popular types of models and provides for adding layers to the neural network, one after the other in a straightforward way.

We need to specify:

  1. An input layer
  2. Any hidden layers
  3. The output layer

Below is a diagram of a simple neural network:

The model we are building is also very simple and is likewise comprised of only three layers: an input layer, one hidden layer, and then the output layer. The code to create this is below.

model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1],)))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))

We invoke the Sequential() class and assign the resulting object to the variable model. Then, we add the input layer. We configure it (the input_shape parameter) so that there is one input unit ("synthetic neuron") for each type of feature in X_train. The example model in the diagram above has two input units (the circles in the input layer). By contrast, this model has nine input units because the dataset has nine different types of features.

Then, we add the hidden layer and the output layer. The integers "10" and "2" indicate the number of neurons in each layer. The activation parameter is a setting that configures some of the math the model performs each time a row (i.e. case) from the dataset is passed through the neural network.
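For readers curious about "some of the math", below is a rough NumPy sketch of what a network of this shape (9 inputs, 10 hidden neurons with relu, 2 outputs with softmax) calculates for a single case. This is my own illustration of the standard formulas, not Keras' actual implementation, and the weights are random numbers standing in for values that training would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights and biases stand in for what training would learn.
W1, b1 = rng.normal(size=(9, 10)), np.zeros(10)   # 9 inputs -> 10 hidden units
W2, b2 = rng.normal(size=(10, 2)), np.zeros(2)    # 10 hidden units -> 2 outputs

def relu(z):
    return np.maximum(z, 0.0)  # negative values become 0

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)       # each row sums to 1

def forward(x):
    hidden = relu(x @ W1 + b1)
    return softmax(hidden @ W2 + b2)

case = rng.normal(size=(1, 9))  # one invented, already-scaled case with 9 features
probs = forward(case)
print(probs, probs.sum())       # two probabilities that sum to 1

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                 # 9*10 + 10 + 10*2 + 2 = 122
```

The softmax output can be read as the model's probability for each of the two vote categories, and the 122 weights and biases are the numbers the model adjusts when it "recalibrates" during training.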

Now we need to "compile" our model. Among other things, the below line of code further configures the math the model uses across all layers of the neural network when processing the dataset.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Running the Model

At this stage, we are finally ready to run our model. To do this, we use Keras' fit() method with the following parameters:

  • X_train - the training features
  • Y_train - the training labels
  • epochs - the number of times the model cycles through the entire training dataset
  • batch_size - the number of rows from the dataset (i.e. cases) the model will feed through the neural network before recalibrating in response to the supervisory signals
  • verbose - set to 1 tells the model to print its progress to the screen
  • validation_data - indicates the test features and test labels to use during the testing phase

model.fit(X_train, Y_train, epochs=5, batch_size=8, verbose=1, validation_data=(X_test, Y_test))
Epoch 1/5
416/416 [==============================] - 1s 2ms/step - loss: 0.4487 - accuracy: 0.7953 - val_loss: 0.4750 - val_accuracy: 0.7795
Epoch 2/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4468 - accuracy: 0.7929 - val_loss: 0.4782 - val_accuracy: 0.7767
Epoch 3/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4476 - accuracy: 0.7908 - val_loss: 0.4727 - val_accuracy: 0.7767
Epoch 4/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4476 - accuracy: 0.7932 - val_loss: 0.4746 - val_accuracy: 0.7704
Epoch 5/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4479 - accuracy: 0.7908 - val_loss: 0.4770 - val_accuracy: 0.7760
<tensorflow.python.keras.callbacks.History at 0x7fabab0d4e50>

The output above suggests the model learns from the data fairly quickly. After one round of training, training accuracy is ~80% and testing accuracy is ~78%. Sometimes when I've run this model, training accuracy after the first round has been ~62%, and testing accuracy has also been lower than in this example. However, by the second round the model seems consistently to max out at ~79%-80% training accuracy and ~77%-78% testing accuracy per round.
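A side note on reading the log: the 416 at the start of each progress line is the number of batches per epoch, not the number of cases. With batch_size=8, Keras rounds up so that a final, smaller batch is still processed. The short sketch below works backwards from that figure; the helper steps_per_epoch() is my own name for illustration, not a Keras function.

```python
import math

batch_size = 8

def steps_per_epoch(n_rows, batch_size):
    # The last batch may be smaller than batch_size, hence rounding up.
    return math.ceil(n_rows / batch_size)

# Training-set sizes consistent with the 416 steps shown in the log.
consistent = [n for n in range(1, 5000) if steps_per_epoch(n, batch_size) == 416]
print(consistent[0], consistent[-1])  # 3321 3328
```

In other words, the training set holds somewhere between 3,321 and 3,328 cases, and the model recalibrates 416 times in every epoch.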

I've played around a bit with the hyperparameters (i.e. the configuration of the model in terms of number of layers, number of neurons, epochs, etc.), and have yet to improve performance materially. So I've stuck with the simplest implementation of this model for this post.

Reporting the Results

To evaluate the performance of the trained model on the test data, we can use Keras' evaluate() method.

loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print("Loss:", loss, "Accuracy:", acc)
Loss: 0.4769740104675293 Accuracy: 0.7759831547737122

These lines of code evaluate the model using the testing features (X_test) and the testing labels (Y_test), and output (print) the results. Overall, the model predicted Justice Brennan's vote with ~78% accuracy.
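As a rough illustration of how an accuracy figure like this comes about, the sketch below compares invented model outputs against invented one-hot labels for four cases. The numbers are made up, and this is my own NumPy rendering of the idea rather than the code Keras runs.

```python
import numpy as np

# Invented softmax outputs for four cases: columns are the two vote classes.
predicted = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
# Invented one-hot labels for the same four cases.
actual = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

predicted_class = predicted.argmax(axis=1)  # index of the larger probability
actual_class = actual.argmax(axis=1)        # index of the 1 in each one-hot row

accuracy = (predicted_class == actual_class).mean()
print(accuracy)  # 0.75: three of the four invented cases are predicted correctly
```

Accuracy only counts whether the more probable class was the right one; it ignores how confident the model was, which is where the loss number comes in.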

The other number, Loss, is a calculation relating to how far off the mark the model's predictions were, overall. The closer this number is to 0, the better. This number can also be used to tune the model, but further details will have to wait for another post.
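In the meantime, here is a minimal sketch of the standard categorical cross-entropy formula, which is the loss named in the compile() call. The function below is my own NumPy version with an assumed small clipping value (eps) to avoid taking the log of zero; Keras' implementation may differ in such details. The predictions are invented.

```python
import numpy as np

def categorical_crossentropy(actual, predicted, eps=1e-7):
    predicted = np.clip(predicted, eps, 1.0)  # avoid log(0)
    per_case = -(actual * np.log(predicted)).sum(axis=1)
    return per_case.mean()

actual = np.array([[1, 0], [0, 1]])
confident_right = np.array([[0.9, 0.1], [0.1, 0.9]])
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])
confident_wrong = np.array([[0.1, 0.9], [0.9, 0.1]])

# Roughly 0.105, 0.693 and 2.303 respectively.
for name, p in [("confident and right", confident_right),
                ("unsure", unsure),
                ("confident and wrong", confident_wrong)]:
    print(name, round(float(categorical_crossentropy(actual, p)), 4))
```

The loss shrinks towards 0 as the model puts more probability on the correct class, and grows quickly when it is confidently wrong. For comparison, a model that always answered 50/50 would score about 0.69.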

Final Thoughts

My comments on the foregoing code gloss over a lot of detail, particularly in terms of the calculations the model performs. I am planning to address this in a future post.

Candidly, I still don't understand everything that is going on in the code discussed. But it seems to work.

Thanks for reading! Did I make a mistake? Does something not make sense? Hit me up.

Appendix

For ease of review, the entire script is set out below.

import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer
from sklearn.metrics import classification_report
from tensorflow.keras.utils import to_categorical
import numpy as np

dataset = 'https://raw.githubusercontent.com/litkm/WJBrennan-Voting/main/WJBrennan_voting.csv'
dataset = pd.read_csv(dataset)

y = dataset['vote']
x = dataset[['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']]

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)

columns_for_standard = ['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']

ct = ColumnTransformer([('numeric', StandardScaler(), columns_for_standard)])

X_train = ct.fit_transform(X_train) 
X_test = ct.transform(X_test)

le = LabelEncoder()
Y_train = le.fit_transform(Y_train.astype(str))
Y_test = le.transform(Y_test.astype(str))

Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)

model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1],)))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=5, batch_size=8, verbose=1, validation_data=(X_test, Y_test))

loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print("Loss:", loss, "Accuracy:", acc)
Epoch 1/5
416/416 [==============================] - 1s 2ms/step - loss: 0.5343 - accuracy: 0.7485 - val_loss: 0.4861 - val_accuracy: 0.7669
Epoch 2/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4716 - accuracy: 0.7776 - val_loss: 0.4800 - val_accuracy: 0.7788
Epoch 3/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4673 - accuracy: 0.7794 - val_loss: 0.4776 - val_accuracy: 0.7809
Epoch 4/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4692 - accuracy: 0.7838 - val_loss: 0.4758 - val_accuracy: 0.7802
Epoch 5/5
416/416 [==============================] - 1s 1ms/step - loss: 0.4497 - accuracy: 0.7912 - val_loss: 0.4755 - val_accuracy: 0.7809
Loss: 0.47549518942832947 Accuracy: 0.7808988690376282