Introduction

Lately I've been thinking about how natural language processing (NLP) techniques could be applied to the docket narratives lawyers draft for their timesheets. Categorizing dockets by phase/task code is an obvious possibility.

But I also wonder about other potential insights: for example, what type of work associates and partners are doing, and how that work is distributed among timekeepers. Are associates getting enough opportunities on their feet in court or conducting examinations? Are senior associates doing too much research that is better suited to junior associates? To what extent are partners doing "associate" work and vice versa?

There is potentially a wealth of information in docket narratives that could be used to improve the operations of a litigation department. However, in most law firms, analyzing these narratives manually would not be practical. Hence my interest in applying NLP techniques to this type of data.

Generating a Synthetic Dataset of Docket Narratives

Of course, dockets are confidential. For my experiments I need to develop a synthetic dataset.

Below is a script I wrote to generate a synthetic dataset of docket narratives. It's very simple and, admittedly, only roughly approximates genuine docket narratives. I'm hoping it will be a sufficient place to start.

My first goal will simply be to see if I can train a language model to distinguish between narratives that involve drafting and those that do not. I plan to try a range of supervised learning techniques, approaching this as a classification problem.

This script generates a dataset with two columns of information. The rows in the first column contain the randomly generated docket narratives. The rows in the second column contain either a 0 or a 1. So-called drafting narratives are assigned a 1 and all other narratives are assigned a 0. These will be the labels required to train the language model.
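As a rough sketch of how labels like these might feed a first baseline, here is a small multinomial naive Bayes classifier written with only the Python standard library. It is not the approach I have settled on, just one simple supervised technique; in practice a library such as scikit-learn would be the more usual choice. The word lists are trimmed copies of the ones in the script, and the names make_example, train_naive_bayes and predict are hypothetical helpers invented for this illustration.

```python
import math
import random
from collections import Counter

# Trimmed copies of the word lists from the generator script (assumption:
# a smaller vocabulary is enough to illustrate the mechanics).
ACTIONS = ["writing", "drafting", "editing", "revising", "briefing",
           "reviewing", "analyzing", "preparing", "proofing", "researching",
           "attending to", "finalizing", "considering", "discussing", "addressing"]
OBJECTS = ["notice of motion", "affidavit", "factum", "memorandum", "memo"]
SUBJECTS = ["summary judgment", "injunction", "refusals"]
DRAFTING = {"writing", "drafting", "editing", "revising", "briefing"}


def make_example(rng):
    """Return one (narrative, label) pair; the label is 1 for drafting actions."""
    action = rng.choice(ACTIONS)
    text = f"{action} {rng.choice(OBJECTS)} re: {rng.choice(SUBJECTS)}"
    return text, int(action in DRAFTING)


def train_naive_bayes(examples):
    """Count words per class; these counts are the whole 'model'."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the higher log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [make_example(rng) for _ in range(2000)]
train, test = data[:1600], data[1600:]

model = train_naive_bayes(train)
accuracy = sum(predict(model, text) == label for text, label in test) / len(test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Because the label in this synthetic data depends on a single action word, a baseline like this should find the task easy. The sketch is meant to show the mechanics of training and evaluating on held-out narratives, not to suggest that real dockets would be this separable.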

Final Thoughts

Have you worked on this type of problem with similar data? Written anything on it? Is my dataset unsuited for this purpose? Please let me know!

Script

import random
import pandas as pd

#Lists consisting of the components for the docket narratives
actions = ["writing", "drafting", "editing", "revising", "briefing", "reviewing", "analyzing", "preparing", "proofing", "researching", "attending to", "finalizing", "considering", "discussing", "addressing"]

objects = ["notice of motion", "affidavit", "factum", "memorandum", "memo", "compendium", "book of authority", "motion record", "analysis", "order"]

subjects = ["summary judgment", "injunction", "enforcing foreign judgment", "motion to strike", "refusals", "disqualifying expert"]

dockets = []

#Function to randomly generate a docket narrative and add it to a list containing all of the narratives
def add_docket():
    docket = random.choice(actions) + " " + random.choice(objects) + " re: " + random.choice(subjects)
    dockets.append(docket)

#Loop to generate the dataset. Edit NUM_DOCKETS to specify the number of docket narratives you desire.
NUM_DOCKETS = 10000
for _ in range(NUM_DOCKETS):
    add_docket()

#Convert list with narratives to DataFrame
df = pd.DataFrame(dockets, columns=['narrative'])

#List containing the drafting words
drafting = ["writing", "drafting", "editing", "revising", "briefing"]

#Function to identify drafting narratives: returns 1 if the narrative contains a drafting word, otherwise 0
def is_drafting(row):
    for word in drafting:
        if word in row['narrative']:
            return 1
    return 0

#Review each narrative and label the drafting narratives with a 1 and all others with a 0
df['drafting'] = df.apply(is_drafting, axis=1)

#Convert DataFrame to CSV file
df.to_csv('dockets.csv')
df.head(20)
    narrative                                           drafting
0   considering memorandum re: injunction               0
1   editing notice of motion re: summary judgment       1
2   discussing memo re: injunction                      0
3   proofing memorandum re: injunction                  0
4   writing motion record re: summary judgment          1
5   addressing factum re: summary judgment              0
6   addressing factum re: enforcing foreign judgment    0
7   drafting book of authority re: motion to strike     1
8   attending to compendium re: motion to strike        0
9   proofing memo re: summary judgment                  0
10  addressing factum re: disqualifying expert          0
11  revising memo re: disqualifying expert              1
12  discussing analysis re: disqualifying expert        0
13  addressing notice of motion re: enforcing fore...  0
14  editing analysis re: injunction                     1
15  considering affidavit re: disqualifying expert      0
16  discussing factum re: enforcing foreign judgment    0
17  addressing memorandum re: refusals                  0
18  writing notice of motion re: enforcing foreign...  1
19  considering analysis re: refusals                   0
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   narrative  10000 non-null  object
 1   drafting   10000 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 156.4+ KB
df['drafting'].value_counts()
0    6618
1    3382
Name: drafting, dtype: int64
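
The split above, with roughly one third of the narratives labelled as drafting, is what the generator should produce, since 5 of the 15 action words are drafting words and each action is equally likely to be picked. A quick sanity check, written as a standalone sketch with its own copies of the two lists and a fixed seed (the seed is an assumption, the script above does not set one):

```python
import random
from fractions import Fraction

# Copies of the two lists from the script above.
actions = ["writing", "drafting", "editing", "revising", "briefing",
           "reviewing", "analyzing", "preparing", "proofing", "researching",
           "attending to", "finalizing", "considering", "discussing", "addressing"]
drafting = ["writing", "drafting", "editing", "revising", "briefing"]

# Each action is equally likely, so the expected share is a simple ratio.
expected_share = Fraction(sum(a in drafting for a in actions), len(actions))
print(expected_share)  # 1/3

# Simulate only the action choice, which is all that determines the label.
rng = random.Random(0)
n = 10000
simulated_share = sum(rng.choice(actions) in drafting for _ in range(n)) / n
print(f"expected {float(expected_share):.4f}, simulated {simulated_share:.4f}")
```

The 3382 drafting narratives out of 10000 reported above sit close to that one-third expectation, so the labels are moderately imbalanced by construction rather than by accident.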