ghtop

Introduction

We recently refactored the CLI tool ghtop, created by the CEO of GitHub, Nat Friedman. Nat even described our refactor as a “tour de force”. This post describes what we learned along the way.

Motivation

Recently, we released ghapi, a new Python client for the GitHub API. ghapi provides unparalleled ease of access to the GitHub API, as well as utilities for interacting with GitHub Actions. Part of our motivation for creating ghapi was to accelerate the development of build, testing, and deployment tools that help us maintain fastai projects.

We recently started using GitHub Actions to perform a wide variety of tasks automatically, such as running unit and integration tests, deploying documentation, building Docker containers and Conda packages, sharing releases on Twitter, and much more. This automation is key to maintaining the vast open source fastai ecosystem with very few maintainers.

Since ghapi is central to so many of these tasks, we wanted to stress-test it against other projects. That’s when we found ghtop, a tool that streams all the public events happening on GitHub to a CLI dashboard. We thought it would be a fun learning experience to refactor this code base with various fastai tools such as ghapi and fastcore, and to try out new libraries like rich.

Features we added to our tools

While exploring ghtop, we added several features to various fastai tools that we found to be generally useful.

ghapi Authentication

We added the function github_auth_device, which allows users to authenticate their API client with GitHub interactively in a browser. When we call this function, we get the following prompt:

github_auth_device()
First copy your one-time code: 276E-C910
Then visit https://github.com/login/device in your browser, and paste the code when prompted.
Shall we try to open the link for you? [y/n]

The browser opens a window that looks like this:

The function then returns a token which you can use for various tasks. This is not the only way to create a token, but it is a user-friendly one, especially for those who are less familiar with GitHub.

ghapi Events

As a result of our explorations with ghtop, we added an event module to ghapi. This is useful for retrieving and inspecting sample events. Inspecting sample events is important as it allows you to prototype GitHub Actions workflows locally. You can sample real events with load_sample_events:

from ghapi.event import load_sample_events
evts = load_sample_events()

Individual events are formatted as markdown lists to be human readable in Jupyter:

print(evts[0])
- id: 14517925737
- type: PushEvent
- actor: 
  - id: 17030246
  - login: BeckhamL
  - display_login: BeckhamL
  - gravatar_id: 
  - url: https://api.github.com/users/BeckhamL
  - avatar_url: https://avatars.githubusercontent.com/u/17030246?
- repo: 
  - id: 154349747
  - name: BeckhamL/leetcode
  - url: https://api.github.com/repos/BeckhamL/leetcode
- payload: 
  - push_id: 6194986903
  - size: 1
  - distinct_size: 1
  - ref: refs/heads/master
  - head: 2055b0fcf22f1c3543e38b60199f6882266d32a5
  - before: cb16921949c969b5153a0c23ce8fe516d2c8d773
  - commits: 
    - 
      - sha: 2055b0fcf22f1c3543e38b60199f6882266d32a5
      - author: 
        - email: beckham.lam@mail.mcgill.ca
        - name: Beckham Lam
      - message: Create detectCapital.ts
      - distinct: True
      - url: https://api.github.com/repos/BeckhamL/leetcode/commits/2055b0fcf22f1c3543e38b60199f6882266d32a5
- public: True
- created_at: 2020-12-13T21:32:34Z

You can also inspect the JSON data in an event; its fields are accessible as attributes:

evts[0].type
'PushEvent'

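For example, here is the frequency of each full_type in the sample:

from collections import Counter
import matplotlib.pyplot as plt
# Count events by full_type, most common first
x,y = zip(*Counter([o.full_type for o in evts]).most_common())
plt.figure(figsize=(8, 6))
plt.barh(x[::-1],y[::-1]);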

We can fetch public events in parallel with GhApi.list_events_parallel. In our experiments, repeatedly calling list_events_parallel is fast enough to fetch all current public activity from all users across the entire GitHub platform. We use this for ghtop. Behind the scenes, list_events_parallel uses Python's ThreadPoolExecutor to fetch events in parallel - no fancy distributed systems or complicated infrastructure necessary, even at the scale of GitHub!

%time
from ghapi.core import GhApi
api = GhApi()
evts = api.list_events_parallel()
len(evts)
CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 4.29 µs
240
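
To illustrate the general idea of fetching pages concurrently with a thread pool, here is a minimal sketch that uses only the standard library. It is not ghapi's implementation: fetch_page is a hypothetical stand-in for a network request, and the page count and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_page(page):
    """Hypothetical stand-in for one HTTP request that returns a page of events."""
    time.sleep(0.05)  # simulate network latency
    return [{'id': page * 100 + i, 'type': 'PushEvent'} for i in range(3)]

def fetch_pages_parallel(n_pages, max_workers=8):
    """Fetch pages 1..n_pages concurrently and flatten them into one list."""
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # ex.map returns results in input order even though the calls overlap
        pages = list(ex.map(fetch_page, range(1, n_pages + 1)))
    return [evt for page in pages for evt in page]

events = fetch_pages_parallel(8)
print(len(events))
```

Threads suit this kind of workload because each call spends most of its time waiting on I/O rather than using the CPU.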

Note that the GitHub API is stateless, so successive calls to the API will likely return events already seen. We handle this by using set operations to filter out events we have already seen.
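
One way to do this kind of filtering is sketched below. This is an illustration rather than the code used in ghtop: new_events is a hypothetical helper, and events are represented as plain dicts with an id key.

```python
def new_events(batch, seen_ids):
    """Return the events in `batch` whose ids are not yet in `seen_ids`, recording them as seen."""
    fresh = []
    for evt in batch:
        if evt['id'] not in seen_ids:
            seen_ids.add(evt['id'])
            fresh.append(evt)
    return fresh

seen = set()
first_poll = [{'id': 1}, {'id': 2}, {'id': 3}]
second_poll = [{'id': 2}, {'id': 3}, {'id': 4}]  # overlaps with the first poll

a = new_events(first_poll, seen)
b = new_events(second_poll, seen)
print([e['id'] for e in a], [e['id'] for e in b])
```

Membership tests on a set take constant time on average, so the filter stays cheap even when each poll returns hundreds of events.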

ghapi Pagination

One of the most cumbersome aspects of fetching lots of data from the GitHub API can be pagination. As mentioned in the documentation, different endpoints have different pagination rules and defaults. As a result, many API clients offer clunky or incomplete interfaces for pagination.

In ghapi we added an entire module with various tools to make paging easier. Below is an example of retrieving repos for the fastai org. Without pagination, we can only retrieve a fixed number at a time (by default 30):

api = GhApi()
repos = api.repos.list_for_org('fastai')
len(repos)
30

To get more, we can page through the results with paged:

from ghapi.page import paged
repos = paged(api.repos.list_for_org, 'fastai')
for page in repos: print(len(page), page[0].name)
30 fast-image
30 fastforest
30 .github
8 tweetrel

You can learn more about this functionality by reading the docs.
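
To show the general pattern behind this kind of helper, here is a minimal generator-based sketch. It is not ghapi's implementation: paged_sketch, list_repos, and the fake data are hypothetical, and we assume an endpoint that accepts per_page and page arguments and returns an empty list once the pages run out.

```python
def paged_sketch(oper, *args, per_page=30, max_pages=9999, **kwargs):
    """Yield successive pages from `oper` until an empty page or `max_pages` is reached."""
    for page in range(1, max_pages + 1):
        items = oper(*args, per_page=per_page, page=page, **kwargs)
        if not items:
            break
        yield items

# Hypothetical endpoint serving 98 fake repos in pages
REPOS = [f'repo-{i}' for i in range(98)]

def list_repos(org, per_page=30, page=1):
    start = (page - 1) * per_page
    return REPOS[start:start + per_page]

sizes = [len(p) for p in paged_sketch(list_repos, 'example-org')]
print(sizes)  # [30, 30, 30, 8]
```

Because the helper is a generator, each page is requested only when the caller asks for it, so a loop can stop early without fetching the remaining pages.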

fastcore Sparklines

One of our goals in refactoring ghtop was to introduce cool visualizations of data in the terminal. We drew inspiration from projects like bashtop, which have CLI interfaces that look like this:

In particular, we really liked the idea of sparklines in the terminal, so we added the ability to show sparklines with fastcore:

from fastcore.utils import sparkline
data = [9,6,None,1,4,0,8,15,10]
print(f'without "empty_zero": {sparkline(data, empty_zero=False)}')
print(f'   with "empty_zero": {sparkline(data, empty_zero=True )}')
without "empty_zero": ▅▂ ▁▂▁▃▇▅
   with "empty_zero": ▅▂ ▁▂ ▃▇▅

For more information on this function, read the docs. Later in this post, we will describe how we used Rich to add color and animation to these sparklines.
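
For readers curious how a sparkline can be produced at all, here is a minimal sketch of the idea: scale each value into one of eight block characters. It is not fastcore's implementation, sparkline_sketch is a hypothetical name, and the scaling we chose here will not necessarily reproduce the exact characters shown above.

```python
BARS = '▁▂▃▄▅▆▇█'

def sparkline_sketch(data, empty_zero=False):
    """Map numbers to block characters; None (and optionally 0) render as a space."""
    vals = [x for x in data if x is not None]
    if not vals:
        return ' ' * len(data)
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal

    def char(x):
        if x is None or (empty_zero and x == 0):
            return ' '
        return BARS[int((x - lo) / span * (len(BARS) - 1))]

    return ''.join(char(x) for x in data)

data = [9, 6, None, 1, 4, 0, 8, 15, 10]
print(sparkline_sketch(data))
print(sparkline_sketch(data, empty_zero=True))
```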

fastcore EventTimer

Because we wanted streaming event data to automatically populate sparklines, we created EventTimer, which constructs a histogram according to a frequency and time span you set. With EventTimer, you can add events with add, and then read off the number of events and their frequency:

from fastcore.utils import EventTimer, sparkline
from fastcore.foundation import L
from time import sleep
import random

def _randwait(): yield from (sleep(random.random()/200) for _ in range(100))

c = EventTimer(store=5, span=0.03)
for o in _randwait(): c.add(1)
print(f'Num Events: {c.events}, Freq/sec: {c.freq:.01f}')
print('Most recent: ', sparkline(c.hist), *L(c.hist).map('{:.01f}'))
Num Events: 6, Freq/sec: 301.1
Most recent:  ▃▁▁▇▁ 323.6 274.8 291.3 390.9 283.6

For more information, see the docs.
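
The underlying idea of bucketing events by time span can be sketched with the standard library alone. The class below is an illustration under our own assumptions, not fastcore's EventTimer: RateHistogram and its clock parameter are hypothetical, and the injectable clock exists only so that the example is deterministic.

```python
from collections import deque
import itertools
import time

class RateHistogram:
    """Count events in consecutive buckets of `span` seconds, keeping the last `store` rates."""
    def __init__(self, store=5, span=60, clock=time.perf_counter):
        self.hist = deque(maxlen=store)  # completed buckets, in events per second
        self.span, self.clock = span, clock
        self.start, self.count = clock(), 0

    def add(self, n=1):
        now = self.clock()
        # Close the current bucket once its time span has elapsed
        if now - self.start >= self.span:
            self.hist.append(self.count / (now - self.start))
            self.start, self.count = now, 0
        self.count += n

# A fake clock that advances one "second" per call: 0, 1, 2, ...
ticks = itertools.count()
h = RateHistogram(store=3, span=2, clock=lambda: next(ticks))
for _ in range(10):
    h.add()
print(list(h.hist))  # [1.0, 1.0, 1.0]
```

The deque with maxlen discards the oldest bucket automatically, which is what keeps the histogram at a fixed width for display as a sparkline.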

CLI Animations With Rich

Rich is an amazing Python library that allows you to create beautiful, animated, and interactive CLI interfaces. Below is a preview of some of its features:

Rich also offers animated elements like spinners:

... and progress bars:

While this post is not about rich, we highly recommend visiting the repo and the docs to learn more. Rich allows you to create your own custom elements. We created two custom elements, Stats and FixedPanel, which we describe below:

Stats: Sparklines with metrics

Stats renders a group of sparklines along with a spinner and a progress bar. First we define our sparklines, the last argument being a list of event types to count:

from ghtop.richext import *
from ghtop.all_rich import *
console = Console()


s1 = ESpark('Issues', 'green', [IssueCommentEvent, IssuesEvent])
s2 = ESpark('PR', 'red', [PullRequestEvent, PullRequestReviewCommentEvent, PullRequestReviewEvent])
s3 = ESpark('Follow', 'blue', [WatchEvent, StarEvent])
s4 = ESpark('Other', 'red')

s = Stats([s1,s2,s3,s4], store=5, span=.1, stacked=True)
console.print(s)
 🌍       Issues           PR           Follow         Other               Quota        
/min       0.0            0.0            0.0            0.0             ━━━━━━━   0%    

You can add events to update counters and sparklines with add_events:

evts = load_sample_events()
s.add_events(evts)
console.print(s)
 🌍       Issues           PR           Follow         Other               Quota        
/min    11772 ▁▇       16546 ▁▇        5991 ▁▇        6484 ▁            ━━━━━━━   0%    

You can update the progress bar with the update_prog method:

s.update_prog(50)
console.print(s)
 🌍       Issues           PR           Follow         Other               Quota        
/min     4076 ▁▇        5408 ▁▇        1834 ▁▇        5998 ▁            ━━━╸━━━  50%    

Here is what the animated version looks like:

FixedPanel: A panel with fixed height

A key aspect of ghtop is showing events in different panels. We created FixedPanel to allow us to arrange panels in a grid that we can incrementally add events to:

p = FixedPanel(15, box=box.HORIZONTALS, title='ghtop')
for e in evts: p.append(e)
grid([[p,p]])
 ─────────────────── ghtop ───────────────────  ────────────────── ghtop ─────────────────── 
  📪  dependabo…closed PR #3 o…herzli…"Bump …    📪  dependabo…closed PR #3 …herzli…"Bump …
      dongjun13 pushed 1 commi…dongjun13/2           dongjun13 pushed 1 comm…dongjun13/2
      admmonito…pushed 1 commi…admmonitors/t…        admmonito…pushed 1 comm…admmonitors/t…
      randomper…pushed 1 commi…randomperson1…        randomper…pushed 1 comm…randomperson1…
      ahocevar pushed 6 commi…openlayers/ope…        ahocevar pushed 6 commi…openlayers/op…
  🏭  arjmoto created branch …arjmoto/redux-…    🏭  arjmoto created branch…arjmoto/redux-…
  💬  stale[bot…created commen…ironha…"This …    💬  stale[bot…created comme…ironha…"This …
      commit-b0…pushed 1 commi…commit-b0t/co…        commit-b0…pushed 1 comm…commit-b0t/co…
      yakirgot pushed 2 commi…yakirgot/snake         yakirgot pushed 2 commi…yakirgot/snake
  💬  awolf78 created comment…Impulse…"If yo…    💬  awolf78 created commen…Impulse…"If yo…
      kreus7 pushed 1 commit…kreus7/kreusada…        kreus7 pushed 1 commit…kreus7/kreusad…
      rgripper pushed 1 commi…rgripper/webco…        rgripper pushed 1 commi…rgripper/webc…
  👀  thelittle…started watchi…ritchie46/pol…    👀  thelittle…started watch…ritchie46/pol…
  🏭  adrian698 created branch…adrian698/Test    🏭  adrian698 created branc…adrian698/Test
      mergify[b…pushed 2 commi…spbu-coding/6…        mergify[b…pushed 2 comm…spbu-coding/6…
 ─────────────────────────────────────────────  ──────────────────────────────────────────── 
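
The fixed-height behavior can be illustrated without rich. The sketch below is our own simplified stand-in, not the FixedPanel implementation: FixedLines is a hypothetical class that keeps only the most recent lines and pads the output so that the rendered height never changes.

```python
from collections import deque

class FixedLines:
    """Keep only the most recent `height` lines, like a fixed-height scrolling panel."""
    def __init__(self, height):
        self.lines = deque(maxlen=height)

    def append(self, line):
        self.lines.append(line)  # the oldest line drops off automatically

    def render(self, width=20):
        # Truncate or pad each line to exactly `width` characters
        body = [f'{line[:width]:<{width}}' for line in self.lines]
        # Pad with blank rows so the panel never changes height
        body += [' ' * width] * (self.lines.maxlen - len(body))
        return '\n'.join(body)

p = FixedLines(3)
for i in range(5):
    p.append(f'event {i}')
print(p.render())
```

A constant height matters for terminal dashboards because the surrounding layout does not have to be recomputed as events arrive.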

To learn more about our extensions to rich see these docs.

A demo of ghtop animations

Putting all of this together, we get the following results:

Four panels with a sparkline for different types of events:

ghtop

A single panel with a sparkline:

ghtop-tail

To learn more about ghtop, see the docs.

Interesting Python features used

While making these docs, we used the following Python features, each of which at least one person we demoed them to found interesting or didn't know about. If you have been using Python for some time, you might already know about all or most of these features:

yield from

Generators are a powerful feature of Python, and they are especially useful for iterating through large datasets lazily.
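
As a small illustration of yield from, the sketch below delegates to inner iterables instead of looping over them by hand. The function names and data are hypothetical examples, not code from ghtop.

```python
def flatten(pages):
    """Lazily yield every item from an iterable of pages."""
    for page in pages:
        yield from page  # delegate to the inner iterable

def countdown(n):
    """Yield n, n-1, ..., 1 by delegating to a range."""
    yield from range(n, 0, -1)

flat = list(flatten([[1, 2], [3], [4, 5]]))
print(flat)                # [1, 2, 3, 4, 5]
print(list(countdown(3)))  # [3, 2, 1]
```

Nothing is computed until the generator is consumed, which is why list is needed here to materialize the results.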

deque

f-strings