Data visualizations

  • Dashboards

    Working with Tableau to analyze datasets and tell data stories.

  • Maps

    Using spatial data to map information.

  • Network Analysis

    Finding and exploring connections through radial and chord graphs.


Dashboards

The 2024 Summer Olympic Games mark the 40th anniversary of the women’s marathon. But despite changes in training styles, a better understanding of nutrition and the invention of “super shoes”, women are competing at about the same pace as they were four decades ago.

At the first women’s Olympic marathon in Los Angeles in 1984, Joan Benoit Samuelson (then Joan Benoit) of the United States broke the tape in 2 hours, 24 minutes and 52 seconds, taking the gold and setting the first women’s marathon Olympic record. The most recent champion, Peres Jepchirchir, took home the gold with a final time of 2 hours, 27 minutes and 20 seconds at the Tokyo Olympics, on a flatter course and with a lower daily average temperature, two things that plausibly would make the course easier.

This dashboard looks at both how the fastest times have remained mostly stagnant and how the average finishing times have changed. Comparing the average temperature and elevation of the courses also helps explain why certain races are slower or faster than others. The 2004 Athens marathon was one of the slowest Olympic races on average, but it was also the hilliest race and took place on one of the hotter days.

And, on average, Olympic marathoners are getting older. This could be in part because the number of runners increased at every Games until 2020, when the race was delayed for a year because of the pandemic and the number of participants was limited.

Lastly, this dashboard looks at which countries have medaled the most since the women’s event began. Kenya, Japan and the United States lead the medal count for women, with 7, 4 and 3 medals, respectively.

This dashboard cannot explain all of the reasons why the Olympic marathon hasn’t gotten much faster for women over the years. But the Olympics are not known for setting world records: the courses are not set up to see how fast the human body can go on a flat, easy course. Instead, they are grueling tests of endurance, and the data does show that heat and elevation play a huge role in the final outcome. Using this analysis in tandem with sports research could offer new and interesting data stories about how women in endurance sports have evolved since their introduction on the main stage.


Maps

I worked on a map of all of the running clubs in Brooklyn and Queens. To build the map, I researched each club’s name, social media accounts, meeting location and details about the runs themselves. This is an ongoing project that I will continue to work on beyond the semester, and it will likely change form as more run clubs are added for Manhattan, Staten Island and the Bronx.

The majority of this map was created by plotting points from the latitude and longitude of the exact place each run club meets. Users can also filter by day, time and neighborhood to narrow down when they are available to run. The point data also includes a descriptor for the type of run each club does at that meet-up spot, which I would like to turn into a filter as well. However, that information is a heavier data lift, since not all run clubs list it or make it clear beforehand.
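The day, time and neighborhood filtering described above can be sketched outside of the mapping tool. This is only a minimal illustration, not how the map itself is built: the MeetUp fields, the filter_meetups helper and every row of data are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class MeetUp:
    # All field names are assumptions made for this sketch.
    club: str
    neighborhood: str
    day: str         # e.g. "Tuesday"
    start_hour: int  # 24-hour clock
    lat: float
    lon: float


def filter_meetups(meetups, day=None, neighborhood=None, earliest=None, latest=None):
    """Return the meet-ups that match every filter that was supplied."""
    results = []
    for m in meetups:
        if day is not None and m.day != day:
            continue
        if neighborhood is not None and m.neighborhood != neighborhood:
            continue
        if earliest is not None and m.start_hour < earliest:
            continue
        if latest is not None and m.start_hour > latest:
            continue
        results.append(m)
    return results


# Placeholder rows, not real clubs or coordinates.
meetups = [
    MeetUp("Club A", "Neighborhood 1", "Tuesday", 7, 40.0, -73.0),
    MeetUp("Club B", "Neighborhood 1", "Tuesday", 19, 40.1, -73.1),
    MeetUp("Club C", "Neighborhood 2", "Saturday", 9, 40.2, -73.2),
]

# Someone who can only run on Tuesday evenings.
evening_tuesday = filter_meetups(meetups, day="Tuesday", earliest=17)
print([m.club for m in evening_tuesday])
```

A run-type filter could be added the same way once that field is reliably filled in for every club.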

Another consideration that is beyond the scope of this course, but will make it into a future iteration, is how to maintain the map over time. Building and hosting it in ArcGIS, where it can connect to a live-updating spreadsheet, would make it possible to update it closer to real time than the current build allows.


Network Analysis

For the network analysis, I looked at how the 116th Congress used proxy voting during the first week after it was instituted at the onset of the coronavirus pandemic. Letters of proxy can be turned into a dataset by pulling individual letters from the House Clerk’s website and recording whom each member named as their proxy in the PDF. This dataset was originally compiled in 2020, and I tried to gather more names, but the PDFs no longer load or, if they do, take too long to be usable.

Gephi still would not run fully on my machine, so I relied on Flourish, a separate data visualization website, to create a series of small multiples showing some of the ways these connections can be visualized.

In the first week of proxy voting, only Democrats participated and they all nominated other Democrats to vote on their behalf. The following charts explore some of the connections that can be seen from this one week of proxy voting.


First, this simple network shows the relationship between designators and designees. There are some clear connections where one person is the connector of many others, but the majority are one-off connections between two people. There are also a few single, unconnected dots, because Flourish gives the individuals in the middle of the connections their own free-floating dot as well.


Next, I expanded the nodes table to include the committees that each member sat on. The first radial graph shows the connections of individuals, colored by their primary committee affiliation.


I also included radial graphs showing which individuals were designated the most as proxies. Don Beyer, Jamie Raskin and Debbie Wasserman Schultz become more visible as primary designees. And, in the radial graph to the right, California and Florida stand out as having the majority of members who acted as either a designator or a designee, which makes sense given the size of both states.
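Counting how often each member was named as a proxy is a simple tally over the designator and designee pairs. The sketch below assumes the letters have already been reduced to such pairs; the member names are placeholders, not records from the actual dataset.

```python
from collections import Counter

# Placeholder (designator, designee) pairs, not real proxy letters.
letters = [
    ("Member A", "Member X"),
    ("Member B", "Member X"),
    ("Member C", "Member Y"),
    ("Member D", "Member X"),
]

# Tally how many times each member appears as the designee.
designee_counts = Counter(designee for _, designee in letters)

# Members ordered from most to least designated.
for member, count in designee_counts.most_common():
    print(member, count)
```

In a network tool, this tally corresponds to the number of incoming connections for each node.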

For better clarity on the connections between committees, I made an edge table where the source was the primary committee of the designator and the target was the primary committee of the designee, with the count being the number of times someone from the first committee designated someone from the second. Appropriations, Science, Financial Services, Judiciary and Armed Services all stand out as having a large share of designees.
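A committee-to-committee edge table like the one described above can be sketched as follows. This is an illustration under assumptions rather than the exact steps I followed: the committee_edges helper, the source, target and count column names, and all members and committees shown are hypothetical placeholders.

```python
from collections import Counter

# Placeholder lookup from member to primary committee.
primary_committee = {
    "Member A": "Committee 1",
    "Member B": "Committee 1",
    "Member X": "Committee 2",
    "Member Y": "Committee 3",
}

# Placeholder (designator, designee) pairs.
letters = [
    ("Member A", "Member X"),
    ("Member B", "Member X"),
    ("Member A", "Member Y"),
]


def committee_edges(letters, primary_committee):
    """Collapse member-level pairs into committee-level edges with counts."""
    counts = Counter(
        (primary_committee[designator], primary_committee[designee])
        for designator, designee in letters
    )
    # One row per (source, target) committee pair, sorted for a stable order.
    return [
        {"source": source, "target": target, "count": n}
        for (source, target), n in sorted(counts.items())
    ]


edges = committee_edges(letters, primary_committee)
for row in edges:
    print(row)
```

Each row can then be pasted into a links sheet, with the count column driving the thickness of the connection.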

Most notably, access to all of the records and the ability to analyze a larger group of people would strengthen these visuals. The connections between committees and states are interesting, but there are two years’ worth of data that, once scraped, could show meaningful and maybe lesser-known connections between members of the House of Representatives.

Thanks for a great semester!