Data science teams today need a lot of back-and-forth communication to answer even a single data question arising from business requirements.

| Step | Business Analysts (NT) | Data Analysts (somewhat NT) | Data Scientists | Machine Learning/Data Engineers |
|---|---|---|---|---|
| (1) Collect data (e.g., connect to a database source, crawl public sources, import from CSV) | Heavily depends on MLE for engineering | Somewhat depends on MLE for engineering | Somewhat depends on MLE for engineering | DIY |
| (2) Preprocess data (e.g., remove outliers, remove stopwords) | Heavily depends on MLE for engineering | Somewhat depends on MLE/DS for engineering | DIY | Somewhat depends on BA/DS for business requirements |
| (3.1) Extract low-level insights using low-order statistics (e.g., mean/average) | Can DIY using Excel; somewhat depends on MLE for engineering | DIY | DIY | DIY |
| (3.2) Extract high-level insights using higher-order methods (e.g., sentiment analysis, clustering) | Heavily depends on MLE/DS for engineering and ML models | Somewhat depends on requirements from BA | Somewhat depends on requirements from DA/BA | Heavily depends on requirements from DA/BA |
| (4) Report (e.g., charts, interactive visualizations) | DIY using presentation or BI tools; heavily depends on DS/MLE for complex and interactive charts | DIY using presentation or BI tools; somewhat depends on DS/MLE for complex and interactive charts | DIY everything; somewhat depends on requirements from DA/BA | Not necessary |
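To make the engineering behind steps (1) to (3.1) in the table more concrete, here is a minimal sketch of what a DIY version might look like for a single CSV file. It is illustrative only: the `review` and `amount` columns, the sample rows, and the tiny stopword list are hypothetical, and the 1.5 × IQR rule is just one common outlier filter (the table does not prescribe a method).

```python
import csv
import io
import statistics

# Hypothetical, deliberately tiny stopword list; real projects would use a curated one.
STOPWORDS = {"the", "a", "is", "and", "was", "it", "for"}


def collect(csv_text: str) -> list[dict[str, str]]:
    """Step (1): load rows from CSV text (standing in for a DB, crawler, or file)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def remove_outliers(values: list[float]) -> list[float]:
    """Step (2): keep values within 1.5 * IQR of the quartiles (one common rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def remove_stopwords(text: str) -> list[str]:
    """Step (2): lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def low_order_stats(values: list[float]) -> dict[str, float]:
    """Step (3.1): simple descriptive statistics."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}


# Hypothetical sample data: the last row has a data-entry error in "amount".
SAMPLE = """review,amount
Great value,10
The box was damaged,12
Fast delivery,11
It is fine,9
Good,10
Works as expected,11
Great for the price,12
Okay,10
Typo row,1000
"""

rows = collect(SAMPLE)
amounts = remove_outliers([float(r["amount"]) for r in rows])
tokens = [remove_stopwords(r["review"]) for r in rows]
summary = low_order_stats(amounts)
print(summary)  # {'mean': 10.625, 'median': 10.5}
```

Even this toy version needs parsing, a judgment call on what counts as an outlier, and text cleaning before the first mean is computed, which is roughly where the BA and DA columns start depending on someone else.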

What do you think? Is there anything else, besides the following, that produces this kind of friction?
- unclear communication
- a cluttered set of tools that require heavy coding
- unknown business logic embedded in code
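The last point is easiest to see in code. In the hypothetical sketch below, two business rules (skip small orders, skip refunded orders) are first buried as magic values inside a helper, then lifted into a named, documented rules object that a BA could read and challenge. The function names, fields, and the threshold of 5 are invented for illustration.

```python
from dataclasses import dataclass


# Before: business rules hidden as magic values inside an analysis helper.
def revenue_hidden(orders: list[dict]) -> float:
    total = 0.0
    for o in orders:
        if o["amount"] < 5:  # why 5? only the original author knows
            continue
        if o["status"] == "refunded":
            continue
        total += o["amount"]
    return total


# After: the same rules made explicit, named, and reviewable by non-engineers.
@dataclass(frozen=True)
class RevenueRules:
    """Hypothetical business rules; the values are illustrative only."""

    min_order_amount: float = 5.0  # hypothetical: smallest order counted as revenue
    excluded_statuses: frozenset[str] = frozenset({"refunded"})


def revenue_explicit(orders: list[dict], rules: RevenueRules = RevenueRules()) -> float:
    # Sum only orders that pass every rule in `rules`.
    return sum(
        o["amount"]
        for o in orders
        if o["amount"] >= rules.min_order_amount
        and o["status"] not in rules.excluded_statuses
    )


orders = [
    {"amount": 3.0, "status": "paid"},
    {"amount": 20.0, "status": "paid"},
    {"amount": 50.0, "status": "refunded"},
    {"amount": 7.5, "status": "paid"},
]
print(revenue_hidden(orders), revenue_explicit(orders))  # 27.5 27.5
```

Both functions return the same number; the difference is that in the second version the rules are visible and can be changed (e.g., `RevenueRules(min_order_amount=0.0)`) without reading the calculation itself.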

Note:
- NT: Non technical
- DIY: Do it yourself
- MLE: Machine Learning Engineer
- DS: Data Scientists
- BA: Business Analysts
- DA: Data Analysts