Top 10 Pain Points for Data Scientists working in the real world

- Access to relevant data

Relevant data may not be directly available to the analyst: access may need org permission or supporting infrastructure, and the process for "one-off" access often differs from the one for data that needs regular refreshing.

- Data availability

Relevant data may still need to be identified and collected. As with access, the infrastructure needs to be in place before the analysis job can start.

- Data Integration

Data from different sources needs to be integrated into a normalised form, and specific issues such as record merging, record deduplication and missing attributes need to be tackled. A lack of documentation on schemas makes this harder (e.g. is "customer ID" from database A the same as "customer code" from database B?).
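As a rough illustration of the merge, deduplication and missing-attribute issues above, here is a minimal sketch using only the Python standard library. The field names (`customer_id`, `customer_code`, `name`, `email`), the sample records and the "first non-empty value wins" rule are all assumptions made for the example; in particular, it assumes someone has already confirmed that the two identifiers refer to the same entity, which is exactly the part that undocumented schemas make hard.

```python
from dataclasses import dataclass
from typing import Optional

ATTRIBUTES = ("name", "email")  # hypothetical attributes shared by both sources


@dataclass
class Customer:
    customer_id: str
    name: Optional[str] = None
    email: Optional[str] = None


def normalise_key(raw: str) -> str:
    """Normalise an identifier so that ' c001 ' and 'C001' compare equal."""
    return raw.strip().upper()


def integrate(source_a: list[dict], source_b: list[dict]) -> dict[str, Customer]:
    """Merge both sources into one record per customer.

    Assumption: source A's "customer_id" and source B's "customer_code"
    identify the same entity once normalised.
    """
    rows = [(r["customer_id"], r) for r in source_a]
    rows += [(r["customer_code"], r) for r in source_b]

    merged: dict[str, Customer] = {}
    for raw_key, row in rows:
        key = normalise_key(raw_key)
        # Deduplication: rows with the same normalised key share one record.
        record = merged.setdefault(key, Customer(customer_id=key))
        # Merge: fill only attributes that are still missing.
        for attr in ATTRIBUTES:
            value = row.get(attr)
            if getattr(record, attr) is None and value:
                setattr(record, attr, value)
    return merged


def missing_attributes(records: dict[str, Customer]) -> dict[str, list[str]]:
    """Report which attributes are still empty after the merge."""
    report = {}
    for key, record in records.items():
        gaps = [a for a in ATTRIBUTES if getattr(record, a) is None]
        if gaps:
            report[key] = gaps
    return report


# Made-up sample data: source A contains a duplicate of customer C001.
database_a = [
    {"customer_id": "c001", "name": "Ada"},
    {"customer_id": "C002", "name": "Grace"},
    {"customer_id": " c001 ", "name": "Ada"},
]
database_b = [
    {"customer_code": "C001", "email": "ada@example.com"},
    {"customer_code": "C003", "email": "linus@example.com"},
]

customers = integrate(database_a, database_b)
gaps = missing_attributes(customers)
```

With this sample data, the five input rows collapse into three customers, and `gaps` flags the records that still lack a name or an email. A real integration job would also need a rule for conflicting values, which this sketch sidesteps.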

- Data Silos

Mirroring org silos, data may be held by and accessible to one department (or team, or business unit) but isolated from the rest of the org.

- Data scientist as a vanity title

When you're hired as "data scientist" but the job is good old BI reporting

- Unrealistic expectations

Companies want a data scientist (because they've heard data science is cool) and they expect one person to cover multiple roles (data engineer, backend engineer, DBA, analyst, scientist and everything in between).

- Leadership has no data science experience (see above)

- No infrastructure or support in place

Your first data science hire should be a data engineer, not a data scientist

- Working with uncertainty

(Also the fun part of the job.) Research tasks can be more difficult to estimate and time-box, especially in high-risk, high-reward R&D efforts.

Complexity needs to be broken down into smaller pieces to reduce risk.

- Access to business domain experts

Data scientists are expected to be expert software engineers, expert statisticians and experts in the business domain. It's more common to have a stats and software background, so they still need to be exposed to business domain knowledge (e.g. in medical applications they need to talk to doctors, in financial applications to traders, etc.).

- Friction between R&D and production

When data science / R&D is completely separate from engineering, there's friction in bringing R&D work into production. Embedded teams and engineering support are needed.

- Forcing the favourite "agile" methodology

Especially with R&D efforts, data science projects don't necessarily fit into the exact same frameworks used for software projects.

- Process in place "because Google does it" (or Microsoft, Amazon, Netflix, Spotify, ...)

- Scalability

Techniques that work on small datasets may not be suitable for large datasets.

Likewise, solutions that work as small prototypes may not be adequate for handling large datasets.
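To make the scalability point concrete, the sketch below compares two ways of computing a sample standard deviation. The first needs the whole dataset in memory as a list, which is fine for a prototype; the second uses Welford's online algorithm, which reads one value at a time and keeps only three numbers in memory. The `readings` generator is a hypothetical stand-in for a large file or database cursor, and the choice of statistic is just an example.

```python
import math
from typing import Iterable, Iterator


def in_memory_std(values: list[float]) -> float:
    """Prototype approach: two passes over a list held fully in memory."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


def streaming_std(values: Iterable[float]) -> float:
    """Welford's online algorithm: a single pass in constant memory."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    if count < 2:
        raise ValueError("need at least two values")
    return math.sqrt(m2 / (count - 1))


def readings(n: int) -> Iterator[float]:
    """Hypothetical data source that yields one value at a time."""
    for i in range(n):
        yield float(i % 10)


# Small data: both approaches work and agree.
small = list(readings(1_000))
prototype_result = in_memory_std(small)
streamed_result = streaming_std(readings(1_000))
```

Both functions give the same answer on small inputs, but only `streaming_std` can consume a generator without first materialising it as a list, so it keeps working when the data no longer fits in memory.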

- Gap in skillset

This is related to "unrealistic expectations": data scientists are expected to have deep expertise in a broad variety of tools and techniques, but it's quite easy to have blind spots.

Collated by JBI's instructors based on course delegate feedback from the following courses:

Power BI training course

Power BI Beyond the basics training course

Python Data Analysis training course

Tableau training course

About the author: Craig

Craig is a self-confessed geek who loves to play with and write about technology. Craig's especially interested in systems relating to e-commerce, automation, AI and Analytics.