19 October 2017
Nowadays, there is a significant business advantage in being able to analyse, process and visualize "big data". While there is no agreed definition of "big data", it is generally accepted that the term refers to data sets measured in terabytes or more.
The tools, techniques and systems for working with "big data" are collectively known as data science.
A big problem is the lack of professionals who can work effectively with data on this scale. Fortunately, the Python programming language is packed with libraries designed specifically for data science applications.
To this end, JBI was engaged to provide Python training with a data science slant to a team of corporate Data Scientists.
JBI's approach to delivering this Python for Data Science training course began with the basics, but quickly built up to using Python to deal with these large, heterogeneous data sets.
First stop was understanding how to use the NumPy library. NumPy is a high-performance library for working with multi-dimensional data, and it underpins most of the other data science libraries.
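To give a flavour of what working with multi-dimensional data in NumPy looks like, here is a minimal sketch. It is not taken from the course material: the `readings` array and its values are invented purely for illustration.

```python
import numpy as np

# Hypothetical data: 3 sensors, 4 readings each (values invented for illustration).
readings = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.5, 1.5, 2.5, 3.5],
])

# Mean per sensor: collapse the readings axis (axis=1).
sensor_means = readings.mean(axis=1)

# Centre each row on its own mean. Broadcasting stretches the (3, 1)
# column of means across the (3, 4) array without an explicit loop.
centred = readings - sensor_means[:, np.newaxis]

print(readings.shape)       # shape of the 2-D array
print(sensor_means)         # one mean per sensor
print(centred.sum(axis=1))  # each centred row sums to zero
```

The point of the sketch is that whole-array operations replace explicit Python loops, which is a large part of where NumPy's performance comes from.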
Next stop was understanding how to use Pandas and Matplotlib for analysing and visualizing data. Matplotlib in particular has a multitude of ways to present your data.
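As a rough illustration of the kind of analysis Pandas makes easy, the sketch below groups and totals a small table. The `sales` data and its column names are hypothetical and are not drawn from the course itself.

```python
import pandas as pd

# Hypothetical sales records (invented for illustration).
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 7, 5, 12, 3],
})

# Total units per region, largest first.
totals = sales.groupby("region")["units"].sum().sort_values(ascending=False)

print(totals)
```

If Matplotlib is installed, calling `totals.plot(kind="bar")` would hand the same result to Matplotlib as a bar chart, which is one of the many presentation options mentioned above.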
Next came the scikit-learn and scikit-image libraries for machine learning and image processing. Among the other libraries we needed to consider were web scraping libraries and ways to process spreadsheet information.
The whole course built up to comprehensive coverage of using Python to meet the challenges of today's "big data".
The course ran for 4 days, which some delegates found intensive but rewarding. Hands-on practicals allowed delegates to put theory into practice, with approximately 60-70% of the time spent on practical content.