Objectives
- Get experience in machine learning with Python
- Build hands-on use cases for external Python packages
- Go over package manuals and online references whenever needed
Background
To get comfortable with using external packages in Python, I started a project on supervised learning and classification. I downloaded a dataset, explored it, tested a few models and classifiers, and wrote up an analysis of them.
Not only did I get more familiar with those packages, but I also started reading the manuals written by package developers. I used to rely only on Stack Overflow answers whenever I ran into a problem, but I began to face problems that are hard to solve without digging into a package's internals. For example, there wasn't a simple way to horizontally align the items of a seaborn.catplot legend. I looked into the internals of seaborn.catplot and found that matplotlib.offsetbox.VPacker stacks the legend items vertically. I replaced it with HPacker from the same module, which aligned the items horizontally. By adjusting the numbers passed to fig._legend.set_bbox_to_anchor(), I was also able to move the legend to the top.
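The VPacker-to-HPacker swap described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code from the project: it uses a plain matplotlib legend instead of a seaborn.catplot one, the helper name make_legend_horizontal is hypothetical, and it relies on the private attribute _legend_handle_box, which may change between matplotlib versions. Since the legend of a seaborn grid is a matplotlib Legend object, the same idea should carry over to fig._legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.offsetbox import HPacker, VPacker


def make_legend_horizontal(legend, sep=12):
    """Repack each vertical column of legend entries into a horizontal row.

    Hypothetical helper. It touches private legend internals
    (_legend_handle_box and _children), so treat it as a fragile sketch.
    """
    handle_box = legend._legend_handle_box  # holds one VPacker per legend column
    repacked = []
    for column in handle_box.get_children():
        if isinstance(column, VPacker):
            # Same entries, but laid out left to right instead of top to bottom.
            column = HPacker(pad=0, sep=sep, align="baseline",
                             children=column.get_children())
            column.set_figure(legend.figure)
        repacked.append(column)
    handle_box._children = repacked
    legend.stale = True  # request a redraw with the new layout
    return legend


fig, ax = plt.subplots()
for slope, label in enumerate(("a", "b", "c"), start=1):
    ax.plot([0, 1], [0, slope], label=label)

legend = ax.legend(loc="lower center")
make_legend_horizontal(legend)

# Anchor the legend's lower-center point at the top edge of the axes.
legend.set_bbox_to_anchor((0.5, 1.0))
fig.canvas.draw()
```

For the common case, matplotlib's public ncol argument to legend() (set to the number of entries) gives a similar one-row layout without touching private attributes.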
How it works
See here for a description of the problem, the dataset I used, and the approaches I took to solve it.
Learning outcomes
I improved my skills to:
- use visualization tools such as matplotlib.pyplot and seaborn
- use data manipulation tools such as pandas and patsy
- use statistics-related packages such as numpy, scipy, and statsmodels.api
- explore manuals, references, and Python source code
I also learned:
- the importance of documenting my work
- the usefulness of raising errors in my functions where appropriate
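As an illustration of the last point, here is a small sketch of a function that raises errors early instead of returning a misleading result. The function sample_mean and its checks are hypothetical and are not taken from the project's code.

```python
import math


def sample_mean(values):
    """Return the arithmetic mean of values, rejecting inputs that would mislead."""
    values = list(values)
    if not values:
        # An empty sample has no mean; fail loudly instead of dividing by zero.
        raise ValueError("values must contain at least one number")
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"expected a number, got {type(v).__name__}")
        if math.isnan(v):
            raise ValueError("values must not contain NaN")
    return sum(values) / len(values)
```

Calling sample_mean([]) then fails immediately with a clear message, which is easier to debug than a ZeroDivisionError or a silent NaN surfacing later in the analysis.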
Next steps
Publish some functions used in the analysis as a package to PyPI. Ongoing; see the statspark repository.