Tuesday, August 8, 2017

BM API + dataviz



I'm having fun playing with the Burning Man API and d3.  My noodling is here: http://rendall.tv/bm/v001/

I don't know if I will ever set it up to be an actual, full, proper app, but for now it's fun to see pie charts of the types of scheduled events, or bubble charts of theme camp surface area, or bar charts of the number of events each theme camp is hosting. I'm using Ramda.js to keep the data flow reasonable. As I develop it, I will likely use Rxjs to make interaction states easier to reason about.

One interesting challenge:

I'd like to get data with respect to hometowns of the various theme camps. Data for a theme camp comes in this kind of JSON object, for example:


{"uid": "a1X0V000002rka2UAA",
    "year": 2017,
    "name": "Houphoria",
    "url": null,
    "contact_email": null,
    "hometown": "Houston",
    "description": "The purpose of Houphoria is to provide multiple spaces for unique interactions to explore all of our senses. We are a collective of Houston based makers, visual artists, musicians, and DJs coming together and creating a environment for participants to come and interact with us and our creations. By day, we're a well shaded coffee lounge and bar, but at night is when our space really comes alive with vibrant lighting, sounds, and projections." }

As you can see, the hometown for the camp Houphoria is Houston. Most hometowns are pretty clear:

    San Francisco
    Los Angeles
    Unknown
    Reno
    Oakland
    Seattle
    Portland
    San Diego


However, others are not so clear:
  • Some hometown fields contain several locales, and this concept is expressed in various ways:

    • Vancouver/Louisville
    • San Frandencouver
    • Woodacre-Seattle
    • SD-OC-LA-SF
    • Portland & Bend
    • San Fran and LA
    • New York, Dubai, San Francisco, Hong Kong
    • Woodstock NY to Arcata CA, Shanghai/Bahrain/Norfolk VA
  • There are several ways to express the same city. Including misspells, San Francisco is expressed variously as:

    • SF
    • San Francisco
    • Bay Area
    • SF Bay Area
    • San Francisco bay area
    • Bay Area, CA
    • San Francisco and the surrounding Bay Area
    • San Fra
    • San Fran
    • San Fransisco
    • San Franscisco
  • Some camps add extraneous information:

    • Reno, Nevada (mostly), with a couple from Salt Lake City, Utah, and California
    • Ibiza. This is were the founding members of the camp met, most people are from different countries but Ibiza is the root
    • Nikola-Lenivets, GPS 54.75782,35.60123
If I do choose to tackle this, my approach will use a combination of normalization and regex to extract locales from the data, and then mapping to get a canonical spelling.

e.g: ["SF", "Bay Area", "SF Bay Area", "San Francisco bay area", "Bay Area, CA", "San Francisco and the surrounding Bay Area", "San Fra", "San Fran", "San Fransisco", "San Franscisco"] => "San Francisco")

At this point, though, I feel like answering this challenge would cease flow on more interesting challenges.

So why not simply leave it? In other words, why not simply treat the camp with the hometown of Los Angeles, San Francisco, New York as in a distinct bucket from San Francisco, Los Angeles, New York?

Well, that's what I'm doing, for now, but it feels inelegant. If the data will impart any insights with respect to understanding where camps are coming from, it will need to be done.

But, this project is about exploring the d3 framework in a fun way, not getting bogged down in implementation.  So, moving on.