Asia Pacific Startups Analysis
In this project, I scraped and analyzed data on Asia Pacific startups. I am interested in identifying how startups are distributed across APAC and which ones are outliers.
Project Overview
The objective of this data science project is to discover early-stage (<= Series A) Asia Pacific (APAC) startups with the potential to become a unicorn (> $1 billion USD valuation).
Tech objectives:
- To scrape data from an infinite scrolling website
- To build interactive data visualizations
I am a globalist. Before I look beyond APAC, I want to have an overview of my surroundings.
I am interested in startups that:
- Build a unique product or service
- Are run by first-class executors
- Solve near mission-impossible level challenges on the path to becoming a unicorn
- Are not an "uber" of something
Tech in Asia (TIA) is a Singapore startup and Y Combinator alumnus. It is the Asian version of TechCrunch and Crunchbase.
TIA plays a part in helping the Asian tech scene, so I will share neither the data nor the method I used to get it.
Extract, Transform, Load (ETL)
From the data, I managed to reverse engineer parts of the TIA database schema.
TIA has clearly invested a decent amount of time in compiling its company data: this is the cleanest data I have scraped.
TIA updates the data regularly. The number of companies increased from 57306 to 57620 between 2019-05-26 and 2019-06-01.
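To put those two snapshots in perspective, here is a minimal sketch of the arithmetic. It uses only the counts and dates quoted above; the variable names are my own.

```python
from datetime import date

# Company counts and snapshot dates quoted in the text above.
first_count, last_count = 57306, 57620
first_day, last_day = date(2019, 5, 26), date(2019, 6, 1)

added = last_count - first_count          # companies added between snapshots
days = (last_day - first_day).days        # length of the interval in days
per_day = added / days                    # average additions per day
growth_pct = added / first_count * 100    # relative growth over the interval

print(f"{added} companies added in {days} days "
      f"(~{per_day:.1f} per day, {growth_pct:.2f}% growth)")
```

This is only an average over one six-day window, so it says nothing about how evenly the updates are spread.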
The data is not perfect, though: I did find some problems in it.
I managed to gauge the strength of TIA's data and software engineers from this process.
Choosing data visualization tools
#1. D3.js
D3 is a JS library for manipulating documents based on data. It combines powerful visualization components with a data-driven approach to DOM manipulation.
D3 is the de facto standard for building complex data visualizations on the web.
Usually, engineers use a backend language (Python, Ruby, Java) for the ETL process and pass the finalized data to the frontend. With D3, you can do ETL directly at the frontend.
D3 is a low-level JS library with a steep learning curve. To fully utilize D3, you need:
- Mastery of data structures: You will be dealing with deeply nested data (array => object => array => object)
- Mastery of JS: You will need to write your own custom JS code for ETL purposes
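To illustrate the kind of nesting meant by array => object => array => object, here is a minimal sketch in Python rather than JS (Python is the language I use for the rest of this project). The sample records and field names are hypothetical and are not TIA's schema.

```python
# Hypothetical sample with the array => object => array => object shape.
# Field names are illustrative only, not taken from any real schema.
companies = [
    {"name": "Alpha", "rounds": [{"series": "Seed", "usd": 500_000},
                                 {"series": "A", "usd": 4_000_000}]},
    {"name": "Beta", "rounds": [{"series": "Seed", "usd": 250_000}]},
    {"name": "Gamma", "rounds": []},
]


def flatten_rounds(companies):
    """Return one flat row per (company, funding round) pair."""
    return [
        {"name": c["name"], "series": r["series"], "usd": r["usd"]}
        for c in companies
        for r in c.get("rounds", [])
    ]


rows = flatten_rounds(companies)
for row in rows:
    print(row)
```

Note that a company without any rounds ("Gamma" here) disappears from the flat output, which may or may not be what you want for a chart.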
The data I would need to feed to D3 would bloat my web app. My D3 knowledge is also outdated (I know v3, while the current version is v5), and I have neither the time nor the interest to update it at the moment.
#2. Qlik
Based on my experience using Qlik (in 2015), the performance was horrible even when syncing only specific tables from the database.
The charts available are limited and not modern.
#3. Tableau
Similar problems to Qlik.
I'm not going to pay for a BI tool unless:
- It has proprietary, out-of-this-world data visualizations
- It builds a complex chart faster than I can code one, which I doubt
My chosen tool: Bokeh
Bokeh is an interactive visualization library for Python that targets modern web browsers for presentation.
It was a tough decision choosing between Bokeh and Plotly. I chose Bokeh because it has a stronger community.
Startups countries - Choropleth map
Please watch this video to understand how to interact with this visualization:
This choropleth map is zoomed in on Singapore by default. I did this because:
- Singapore is too small to spot otherwise
- Singapore has the highest count, mean and total funding among APAC countries (excluding China and India). Singapore is among the top 10 countries in the world in terms of GDP per capita, so I expect nothing less in terms of the stats
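The three statistics behind the map (count, mean and total funding per country) can be sketched as below. The rows are made-up placeholders, not figures from the TIA data, and `funding_stats` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (country, total funding raised in USD).
startups = [
    ("Singapore", 2_000_000),
    ("Singapore", 6_000_000),
    ("Indonesia", 1_000_000),
    ("Malaysia", 3_000_000),
]


def funding_stats(rows):
    """Group funding amounts by country and compute count, mean and total."""
    by_country = defaultdict(list)
    for country, usd in rows:
        by_country[country].append(usd)
    return {
        country: {"count": len(amounts), "mean": mean(amounts), "total": sum(amounts)}
        for country, amounts in by_country.items()
    }


stats = funding_stats(startups)
for country, s in sorted(stats.items()):
    print(country, s)
```

Each of the three values could then be mapped to a color scale, one per choropleth view.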
Grey regions on the map are countries for which I do not have data. Please click on the wheel-zoom button to enable zooming with the mouse scroll. The zoom depends on where the mouse is when you scroll: hovering over the x-axis or the y-axis zooms along that axis only, so be sure to place the mouse over the correct axis.
Funding series distribution per year - Grouped bar chart
From here onwards, I am only using data on startups from APAC countries.
Video:
The year used is the founding year of each APAC startup.
Observations:
- 2015 has the highest number of startups. I am not sure what makes 2015 unique, which is interesting
- The number of startups decreases after 2015 across all funding stages
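A grouped bar chart like this one boils down to counting startups per (founding year, funding series) pair. The sketch below shows one way to prepare those counts and hand them to Bokeh using nested categorical factors. It is not my actual chart code: the rows are invented, `grouped_counts` and `build_chart` are hypothetical names, and Bokeh is imported inside `build_chart` so the counting part runs without it.

```python
from collections import Counter

# Hypothetical rows: (founding year, latest funding series).
startups = [
    (2014, "Seed"), (2015, "Seed"), (2015, "Seed"),
    (2015, "Series A"), (2016, "Seed"), (2016, "Series A"),
]


def grouped_counts(rows):
    """Return (factors, counts); each factor is a (year, series) pair."""
    tally = Counter((str(year), series) for year, series in rows)
    factors = sorted(tally)  # sorted by year, then by series name
    return factors, [tally[f] for f in factors]


def build_chart(factors, counts):
    """Build a grouped bar chart (requires the bokeh package)."""
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure

    source = ColumnDataSource(data={"x": factors, "counts": counts})
    # Nested factors make Bokeh group the bars by the first element (year).
    p = figure(x_range=FactorRange(*factors),
               title="Startups per founding year and funding series")
    p.vbar(x="x", top="counts", width=0.9, source=source)
    return p


factors, counts = grouped_counts(startups)
print(list(zip(factors, counts)))
```

With Bokeh installed, `from bokeh.plotting import show` followed by `show(build_chart(factors, counts))` should open the chart in a browser.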
Funding series distribution per category - Grouped bar chart
Video:
This visualization helps to identify outlier startups: those that raised an unusual amount of funding for their series.
In the seed round, 48 startups raised between $10 million and $100 million USD, and 1 startup raised between $100 million and $500 million USD. There are a few possible explanations:
- TIA gathered the wrong data
- The number is not in USD
- These startups are truly outstanding
If these startups are that outstanding, getting their shares will instantly make you a millionaire on paper.
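The bucketing behind this chart can be sketched as follows. The two upper ranges match the ones quoted above; the lower bucket edges, the sample rounds and the 10 million USD outlier threshold for seed rounds are my own assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Bucket edges in USD. The $10-$100M and $100-$500M ranges come from the
# text above; the remaining edges are assumptions.
EDGES = [1_000_000, 10_000_000, 100_000_000, 500_000_000]
LABELS = ["<$1M", "$1-$10M", "$10-$100M", "$100-$500M", "$500M+"]

# Hypothetical rows: (funding series, amount raised in USD).
rounds = [
    ("Seed", 300_000),
    ("Seed", 40_000_000),
    ("Seed", 150_000_000),
    ("Series A", 8_000_000),
]


def bucket(usd):
    """Map an amount to its funding range label."""
    return LABELS[bisect_right(EDGES, usd)]


def series_buckets(rows):
    """Count rounds per (series, funding range) pair."""
    return Counter((series, bucket(usd)) for series, usd in rows)


def seed_outliers(rows, threshold=10_000_000):
    """Seed rounds at or above the (assumed) outlier threshold."""
    return [(s, usd) for s, usd in rows if s == "Seed" and usd >= threshold]


print(series_buckets(rounds))
print(seed_outliers(rounds))
```

Any row returned by `seed_outliers` is a candidate for manual checking against the three explanations listed above.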
Industries distribution per year - Stacked bar chart
For this chart, I only use founding years from 2000 onwards. The earliest year I have seen in this data is 1804.
Video:
The idea is to see if we can observe any trend in the startup industries. For example, we might expect to see more AI startups in late 2017 and early 2018 due to the hype.
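Here is a minimal sketch of the data preparation for such a stacked chart: drop founding years before 2000, then build one column of counts per industry, aligned by year. The rows and the `stacked_counts` helper are hypothetical. As far as I know, this column layout (one x column plus one list per stacker) is what Bokeh's `vbar_stack` expects, but treat that as an assumption and check it against the Bokeh docs.

```python
from collections import Counter

# Hypothetical rows: (founding year, industry).
startups = [
    (1804, "Finance"), (2016, "AI"), (2017, "AI"),
    (2017, "Fintech"), (2018, "AI"), (2018, "AI"),
]


def stacked_counts(rows, first_year=2000):
    """Return (data, industries) with one count column per industry."""
    recent = [(year, ind) for year, ind in rows if year >= first_year]
    tally = Counter(recent)
    years = sorted({year for year, _ in recent})
    industries = sorted({ind for _, ind in recent})
    data = {"years": [str(year) for year in years]}
    for ind in industries:
        # Counter returns 0 for (year, industry) pairs that never occur.
        data[ind] = [tally[(year, ind)] for year in years]
    return data, industries


data, industries = stacked_counts(startups)
print(data)
```

The 1804 record is filtered out, and years with no startups in an industry get an explicit zero so the stacks stay aligned.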
I think industry information is not that critical for entrepreneurs. After all, you wouldn't change your industry just because it's less popular now.
Final thoughts
I could do more with the data, such as social media scoring and APAC VC analysis, but for this article, I decided to focus solely on the startups.
Now, I have a list of startups to keep an eye on.
In time, when I become a world-class executor, I would like to get a piece of you.
Published: 2019-06-09 | Updated: 2021-04-02