20 May 2022
Relevant data may not be directly available to the analyst: access may require organisational permission and supporting infrastructure, and the process for "one-off" access may differ from the one for regularly refreshed data.
Relevant data may still need to be identified and collected (same as above re. need for infrastructure in place before starting with the analysis job)
Data from different sources needs to be integrated into a normalised form; specific issues such as record merging, record deduplication and missing attributes need to be tackled. Schema documentation is often lacking (e.g. is "customer ID" from database A the same as "customer code" from database B?)
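As a minimal sketch of the integration issues above, assume two hypothetical sources that name the customer key differently ("customer ID" in database A, "customer code" in database B), and assume the data owners have confirmed that both keys identify the same customers. All field names, records and function names below are invented for illustration.

```python
# Hypothetical records from two sources; all field names are assumptions.
source_a = [
    {"customer ID": "001", "name": "Ada Smith", "email": "ada@example.com"},
    {"customer ID": "002", "name": "Bob Jones", "email": None},
]
source_b = [
    {"customer code": "002", "name": "Bob Jones", "email": "bob@example.com"},
    {"customer code": "003", "name": "Cy Lee", "email": None},
]

FIELDS = ("name", "email")


def normalise(record, key_field):
    """Map a source-specific record onto a shared schema."""
    return {"customer_id": record[key_field], **{f: record.get(f) for f in FIELDS}}


def integrate(*sources):
    """Merge records sharing a customer_id, keeping the first non-missing value."""
    merged = {}
    for records, key_field in sources:
        for record in records:
            row = normalise(record, key_field)
            # Deduplicate: reuse the existing row if this customer was seen before.
            target = merged.setdefault(row["customer_id"], row)
            for field in FIELDS:
                if target[field] is None:
                    target[field] = row[field]
    return list(merged.values())


def missing_attributes(rows):
    """Report which fields are still missing for each customer."""
    return {
        row["customer_id"]: [f for f in FIELDS if row[f] is None]
        for row in rows
        if any(row[f] is None for f in FIELDS)
    }


customers = integrate((source_a, "customer ID"), (source_b, "customer code"))
gaps = missing_attributes(customers)
```

Here the duplicate customer is collapsed into one row, the missing email is filled from the other source, and `gaps` lists what remains unknown. In practice the hard part is usually confirming that the two keys really are equivalent, which no code can settle without documentation or the data owners.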
Following organisational silos, data may be grouped and accessible by one department (or team, or business unit) but isolated from the rest of the organisation
When you're hired as "data scientist" but the job is good old BI reporting
Companies want a data scientist (because they've heard data science is cool) and they expect one person to cover multiple roles (data engineer, backend engineer, DBA, analyst, scientist and everything in between)
Your first data science hire should be a data engineer, not a data scientist
(Also the fun part of the job) Research tasks can be more difficult to estimate and time-box, especially in high-risk high-reward R&D efforts.
Need to break down complexity to reduce risk
Data scientists are expected to be expert software engineers, expert statisticians and experts in the business domain. It's more common to have a stats+SW background, so they still need to be exposed to business domain knowledge (e.g. in medical applications, need to talk to doctors; in financial applications, need to talk to traders, etc.)
When data science / R&D is completely separate from engineering, there's friction in bringing R&D work into production. There is a need for embedded teams and for engineering support.
Techniques that work on small datasets may not be suitable for large datasets.
Also solutions that work as small prototypes may not be adequate for handling large datasets.
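To illustrate the point about scale, here is a minimal sketch. It assumes the data arrives as a stream too large to hold in memory, so a prototype that builds a full list first would not cope. Welford's online algorithm computes the mean and sample variance in a single pass with constant memory. The function names and sample values are hypothetical.

```python
def running_mean_variance(values):
    """Single-pass mean and sample variance (Welford's online algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Sample variance is undefined for fewer than two observations.
    variance = m2 / (count - 1) if count > 1 else float("nan")
    return count, mean, variance


def readings():
    """Stand-in for a large stream, e.g. rows read lazily from a file."""
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


count, mean, variance = running_mean_variance(readings())
```

Because `running_mean_variance` only iterates once and never stores the values, the same code works whether the generator yields eight numbers or billions.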
Related to "unrealistic expectations", data scientists are expected to have deep expertise in a broad variety of tools and techniques but it's quite easy to have blind spots.
Collated by JBI's instructors based on course delegate feedback from the following courses: