
Introduction
Here is my journey as a data analyst, software engineer, data scientist, data engineer, and data platform engineer.
To simplify things, I will use the term AI to represent machine learning (ML), natural language processing (NLP), computer vision (CV), and deep learning (DL).
Data Analyst
I think the job of a data analyst is to analyze and derive business insights from the data.
When I was working as a data analyst, the main tools I used were spreadsheets, business intelligence tools (Qlik, Tableau, Power BI), and some programming.
Here are a few things you need to do to excel in this role:
- Be detail-oriented. You must be able to spot mistakes that others will miss
- Understand the business objective. Produce content that will contribute to marketing or sales
- Clear communication. Deliver the insights to the target audience in a clear and easy-to-understand format
After two years of working as a data analyst, I had gotten decent at building monthly reports and dashboards. I had worked with data in various industries (mobile dating, startup accelerator, logistics). I saw that most of the data didn't differ much, and I felt ready to take on a new challenge.
Software Engineer
Back then, I was focused on building my data visualization skills. I felt limited by the BI tools, so I decided to learn how to create visualizations using JavaScript. It seemed like the first step for me to transition into the world of software engineering (SE).
Without much background in SE, I had a steep learning curve. Since the job was to build data visualization for the internal web dashboard, I decided to focus on front-end engineering.
To me, SE was a means to an end, not an end in itself. I told myself not to go down the SE rabbit hole. Eventually, I realized that it's hard to focus on one area of SE and ignore the rest, as they are all connected. Since then, I have kept learning about the different areas of SE.
SE tips:
- Learn a language by building something
- Don't drift too far; focus on solving the task
- Learn by reading experts' code
After working as a Software Engineer who focused on building data visualizations (D3.js), I thought that my skillset was too niche. BI tools such as Tableau are sufficient for most companies to answer their data questions. It is not cost-effective for them to hire an engineer to focus on building dashboards and reports. I had to pivot into a more general role with higher demand. I knew that data science was my passion, so I got a job as a Data Scientist.
Data Scientist
In this job, I was both a Data Scientist and a consultant. The company I worked for provided ML software to help financial institutions with fraud detection. As the banks kept their data centers on-premises, I had to work at the bank every day as a vendor.
As the second Data Scientist in a company with fewer than 15 employees, I got to work on different aspects of solution delivery. First, I would set up the equipment and install all the software required for the ML software. Then I would look through the data and confirm that the bank had provided what was needed. Once that was verified, I would conduct data exploration, cleaning, and transformation, and train ML models. Finally, when the results looked promising, we would present our proof of concept (POC) to the client. For a POC that succeeded and became a production project, I would spend months in a single bank. I worked closely with the Software Engineers to iterate on and improve the ML software with new insights and outliers.
At the start, I enjoyed the job a lot as I got to work with financial (big) data and the different data schemas used by the banks. But after four POCs, when I knew what data points to look for, it became routine. While I still discovered small insights sometimes, I felt that my rate of growth was slowing down. I felt like a DS generalist: I could do ML and NLP. DL was not required at this company, as the financial regulators did not accept black-box ML solutions. Specialization was not required either, as a simple solution worked well enough for most cases.
Understanding the limitations of AI, I only trust it for data exploration purposes. I will not bet my money or life on it to make the final decision. I think only companies like Google have the resources (data, money, brains) to build a reliable general-purpose AI. For a startup, the only way to survive is to focus on building a specialized AI that only does one thing, but is the best in the market.
I see myself as an executor rather than a researcher. Spending the majority of my time reading research papers to improve a model's accuracy by 0.1% is not what I want. I knew that in the long term, I wanted to start my own company, and SE would be more useful than DS. Hence, I looked for a job as a data engineer/architect, a role that requires both SE and DS skills.
Data Scientist tips:
- Clear communication is key. Most people don't understand the model you built, so you have to make it clear and simple
- Storytelling. People rely on ML predictions to get a rough idea, but do not trust them to make any serious decisions, especially when money is at stake. You have to convince people why they should continue to hire you when it's a system that they can probably live without in most cases
- Software engineering. You may not be the one who codes everything, but you need to write decent code and have a good understanding of basic SE to work smoothly with your software engineers
Data Engineer / Architect
As always, in a small startup, many things are barely functioning. I have the opportunity to design and build the data architecture without obstacles since I am the only person doing it.
With my software engineering and data science experience, I can see the gap between software and data teams.
I will use database design as an example.
To the software team, database design means making sure the data is normalized, the relationships are defined, and the database is clean and fast.
To the data team or a non-technical team, data normalization is unnatural and doesn't make sense. Why would you split the data into different tables instead of consolidating everything in one table? To build an ML model, we need all the available features as columns (denormalized data), the complete opposite of what the software team is doing.
In a big company, the solution is data warehouses. The software team can design their normalized data in the database, while the data team can get their consolidated data in the data warehouse.
In a startup with limited resources, a data warehouse is not the solution, as it will incur extra hardware costs.
My solution is to build both normalized and denormalized versions of the same data inside a database, one version for each team.
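As a rough illustration of the idea, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customers, orders), the view name (orders_flat), and the sample rows are all hypothetical, and using a view is just one assumed way of keeping a denormalized version next to the normalized tables; it is not necessarily how the actual system was built.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding both versions of the data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized tables (hypothetical): what the software team works with.
        CREATE TABLE customers (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            country TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      REAL NOT NULL
        );

        -- Denormalized version (hypothetical): one wide row per order,
        -- which is the shape the data team wants for features.
        CREATE VIEW orders_flat AS
        SELECT o.id      AS order_id,
               o.amount  AS amount,
               c.id      AS customer_id,
               c.name    AS customer_name,
               c.country AS customer_country
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id;
        """
    )
    # Made-up sample rows, only for illustration.
    conn.executemany(
        "INSERT INTO customers (id, name, country) VALUES (?, ?, ?)",
        [(1, "Alice", "SG"), (2, "Bob", "MY")],
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
    )
    conn.commit()
    return conn


def fetch_flat_rows(conn: sqlite3.Connection) -> list:
    """Read the denormalized rows, one tuple per order."""
    cursor = conn.execute(
        "SELECT order_id, amount, customer_id, customer_name, customer_country "
        "FROM orders_flat ORDER BY order_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in fetch_flat_rows(connection):
        print(row)
```

A view always reflects the latest normalized data but recomputes the join on every read. If reads get slow, the same SELECT could instead fill a real table that is refreshed on a schedule, trading freshness for speed.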
It would not be a problem for me to rebuild it if necessary, since I am the creator. I am in charge of the data team, and I have data scraping, pipelines, modeling, infrastructure, databases, APIs, and R&D projects to handle. With limited time, I know I can live with this compromise. Startups move fast, and most wouldn't survive long enough for you to worry about scalability issues. I need to balance the speed of execution against the scalability of my design.
Data Platform / Infrastructure Engineer
This role is in an MNC. My team offers data tools and a platform for data professionals (analysts, scientists, and engineers).
I build automations and services that create virtual machines and containers with these tools fully set up, so that these professionals do not have to set up, maintain, and update them themselves.
Working here, I have met many people with decades of experience in their "specialization", but I haven't met any experts. They are senior in terms of age, years of experience (YOE), and corporate rank. In terms of skills, however, they are junior. I consider this an objective view because most of them do not code, so I would classify them as IT support and not developers. These IT support staff hold high corporate rank because they have stayed long enough, and they are now leading teams and making technical decisions. The projects are in chaos because they have no development experience, so all their decisions are only "as good as" those of a non-technical business person.
From a business owner's perspective, these senior IT support staff are a waste of resources. First, budget: we could hire at least two computer science graduates who can code for the same salary. Second, low-quality work: they have YOE doing the wrong things. There is no automation, only low-quality manual work, because they have been doing the same thing for years and are tired. Third, arrogance: they have high corporate rank, and they treat anyone of lower rank as unworthy of their time and respect, yet the actual work is done by the younger generation. Fourth, toxic behavior: as a result of the first three points, they have difficulty finding a new job that pays at a similar level, as they are overpaid and severely underqualified. No matter how unhappy they are, they can't leave, so the toxicity builds up over the years. The younger generation is, on average, more technically capable than these senior IT support staff.
From an employee's perspective, if you are an average employee, this is the best company. You don't have to be outstanding. As long as you stay long enough, you will eventually reach that level. The company pays the Singapore market rate, so very few companies pay better. The pay is fair for the work expected. But if you are a high performer, this is not ideal, because you will be carrying others without being compensated proportionally. You have two options: go to a company that pays above the market rate, or start your own business.