- Logistic Regression
- The Dataset
- Importing Dependencies
- Loading and Splitting the Dataset
- Normalizing the Dataset
- The Logistic Regression Model
- Final Thoughts
In previous posts, I've written about using a neural network and k-nearest neighbor to predict SCOTUS judging.1 These posts were based on lesson 8 from Prof. Wolfgang Alschner's fantastic course, Data Science for Lawyers. In this lesson, Prof. Alschner reviews several machine learning algorithms and explains how to use them to predict Justice Brennan's voting record.
Logistic regression is not among the algorithms Prof. Alschner discusses. I was curious about how this algorithm would perform. In this post, I apply a logistic regression model to the Justice Brennan dataset. To do this, I use the LogisticRegression() class from scikit-learn.
In a nutshell, logistic regression is a supervised machine learning algorithm that can be used for classification. In its simpler forms, it predicts the probability of a datapoint belonging to one group or another.
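To make the "probability" part concrete: binary logistic regression passes a weighted sum of the features through the sigmoid function, which squashes any real number into the range 0 to 1. The sketch below uses made-up weights and a made-up bias purely to illustrate the calculation; they are not taken from the Brennan model, and the function names are my own.

```python
import math

def sigmoid(z):
    """Map a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one datapoint."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights and bias, chosen only to illustrate the calculation
weights = [0.8, -0.5]
bias = 0.1

p = predict_proba([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # classify using a 0.5 threshold
print(p, label)
```

A probability of 0.5 or more is treated as one group, anything lower as the other.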
In this post, I use a logistic regression model to predict whether Justice Brennan voted with the majority of the Court in each of 4,746 cases.
You can find an overview of the dataset in my earlier post, Using Artificial Intelligence to Predict SCOTUS Judging.
As usual, we begin by importing our dependencies.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.compose import ColumnTransformer
The next steps in building this model are very similar to those for the other models I've reviewed on LitKM to date.
So, we first load the dataset from a CSV and turn it into a DataFrame.
Next, we split this dataset into training features and labels, and testing features and labels.
In lesson 8, each model is trained on Justice Brennan's voting data prior to 1980 and tested on his voting data from 1980 and onwards. So we'll split our dataset likewise.
url = 'https://raw.githubusercontent.com/litkm/WJBrennan-Voting/main/WJBrennan_voting.csv'
dataset = pd.read_csv(url)

#Features
x = dataset[['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']]
#Labels
y = dataset['vote']

#Masks so that features and labels are split on exactly the same rows
train_mask = x['term'] < 1980
test_mask = x['term'] > 1979

#Features for training
X_train = x[train_mask]
#Features for testing
X_test = x[test_mask]
#Labels for training
Y_train = y[train_mask]
#Labels for testing
Y_test = y[test_mask]
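When splitting by term, one way to keep features and labels on the same rows is to apply a single boolean mask to both, rather than relying on row positions. Here is a minimal sketch on a toy DataFrame. All of the values are made up, and the names toy, is_train, and the _toy variables are illustrative only.

```python
import pandas as pd

# Toy data with made-up values, standing in for the real CSV
toy = pd.DataFrame({
    'term': [1978, 1979, 1980, 1981],
    'issue': [10, 20, 30, 40],
    'vote': ['majority', 'minority', 'majority', 'majority'],
})

features = toy[['term', 'issue']]
labels = toy['vote']

# One mask, applied to both features and labels
is_train = toy['term'] < 1980

X_train_toy, Y_train_toy = features[is_train], labels[is_train]
X_test_toy, Y_test_toy = features[~is_train], labels[~is_train]

print(len(X_train_toy), len(X_test_toy))
```

Because the same mask selects both features and labels, the two halves cannot drift out of alignment even if the row counts change.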
Now we must normalize the dataset. The features consist of numbers, all of varying ranges. So we need to scale them to use the same range.
The labels consist of one of two words, either "majority" or "minority". We must convert these categorical labels into numbers. LabelEncoder assigns its codes in sorted order, so with these two labels "majority" becomes 0 and "minority" becomes 1.
#Scale features
columns_for_standard = ['term', 'petitioner', 'respondent', 'jurisdiction', 'caseOrigin', 'caseSource', 'certReason', 'issue', 'issueArea']
ct = ColumnTransformer([('numeric', StandardScaler(), columns_for_standard)])
X_train = ct.fit_transform(X_train)
X_test = ct.transform(X_test)

#Convert categorical labels to numbers
le = LabelEncoder()
Y_train = le.fit_transform(Y_train.astype(str))
Y_test = le.transform(Y_test.astype(str))
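To see what these two preprocessing steps do in isolation, here is a small sketch on made-up values (not the Brennan data). StandardScaler learns each column's mean and standard deviation from the training data and reuses them on the test data. LabelEncoder numbers the classes in sorted order.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Made-up feature values: two columns on very different scales
train = np.array([[1970.0, 1.0], [1975.0, 3.0], [1979.0, 5.0]])
test = np.array([[1985.0, 2.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learn mean and std from training data
test_scaled = scaler.transform(test)        # reuse those statistics on the test data

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column

# LabelEncoder numbers the classes in sorted order
encoder = LabelEncoder()
codes = encoder.fit_transform(['majority', 'minority', 'majority'])
print(encoder.classes_)
print(codes)
```

The classes_ attribute is worth checking whenever it matters which label maps to which number.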
model = LogisticRegression()
model.fit(X_train, Y_train)
Now that our model is trained, let's see how well it predicts Justice Brennan's voting record based on the training data.
Not bad. Using voting data from prior to 1980 only, the model predicts Justice Brennan's voting record with approximately 82% accuracy.
But, trained on this data, how well does the model predict Justice Brennan's voting from 1980 onwards?
The model achieves approximately 60% accuracy. This is better than a coin toss, but not as good as the other algorithms I've reviewed on LitKM. Both KNN and the neural network achieved 69% accuracy.
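Accuracy figures like the ones above are what a scikit-learn classifier's score() method returns: the share of predictions that match the true labels. Here is a minimal sketch using synthetic data from make_classification in place of the Brennan dataset, so the printed numbers will not match the ones in this post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the Brennan dataset)
X, y = make_classification(n_samples=500, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression()
clf.fit(X_tr, y_tr)

train_acc = clf.score(X_tr, y_tr)  # accuracy on data the model has seen
test_acc = clf.score(X_te, y_te)   # accuracy on held-out data

# score() is the fraction of predictions that match the true labels
manual_test_acc = np.mean(clf.predict(X_te) == y_te)

print(f"train: {train_acc:.3f}, test: {test_acc:.3f}")
```

Comparing the two numbers side by side is a quick way to spot a model that fits its training data much better than unseen data.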
When working with KNN, it occurred to me that the normalization described above might be incomplete. Specifically, the features and labels were not scaled to the same range: the labels use a range of 0 to 1, but the features do not.
Below I reproduce the code to scale the features to a range of 0 to 1, same as the labels, and fit a new logistic regression model to this updated dataset.
#Convert the features from numpy arrays to DataFrames in prep for min-max scaling, and ease of review
X_train = pd.DataFrame(X_train)
X_test = pd.DataFrame(X_test)

#Apply min-max scaling to the training features, using one overall min and max across all columns
train_min = X_train.min().min()
train_max = X_train.max().max()
X_train = (X_train - train_min) / (train_max - train_min)

#Apply min-max scaling to the testing features, using the testing set's own overall min and max
test_min = X_test.min().min()
test_max = X_test.max().max()
X_test = (X_test - test_min) / (test_max - test_min)

#Convert the labels to DataFrames from numpy arrays (because the features are now DataFrames)
Y_train = pd.DataFrame(Y_train)
Y_test = pd.DataFrame(Y_test)

#Fit a new logistic regression model with the updated dataset
model = LogisticRegression()
model.fit(X_train, Y_train.values.ravel())
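For comparison, scikit-learn also ships a MinMaxScaler. It is not the same technique as the hand-rolled scaling above: it scales each column separately rather than using one overall minimum and maximum, and it reuses the training set's statistics when transforming the test set. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values: two feature columns
train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
test = np.array([[2.0, 600.0]])

mm = MinMaxScaler()                     # default feature_range is (0, 1)
train_scaled = mm.fit_transform(train)  # per-column min and max from training data
test_scaled = mm.transform(test)        # same min and max applied to test data

print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(test_scaled)  # can fall outside 0 to 1 if test data exceeds the training range
```

Whether this would change the results reported in this post is untested here; it only shows how the two approaches differ.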
So, how does this new model perform?
On the training data, the model performs slightly better.
But check out the results on the testing data:
Big drop! Almost 4%.
With KNN, normalizing the features to likewise use a range of 0 to 1 boosted performance by about 1%. But for logistic regression, this same change decreased performance, and by a relatively significant amount. Candidly, at this point I have no idea why this occurred. Any ideas? I'm all ears!