Objectives

Get experience in machine learning with Python
Have use cases of Python external packages on my end
Go over package manuals and online references whenever needed

Background

To get comfortable with using external packages in Python, I started a project regarding a supervised learning and classification. I downloaded a dataset, explored it, tested out a few models and classifiers, and wrote analysis about them.

Not only I got more familiar with those packages, but I also started to look at manuals written by package developers. I used to read only answers in Stack Overflow whenever I needed an answer for my problems, but I started to face problems that are hard to solve if I don’t get into internals of a package. For example, there wasn’t a simple way to horizontally align items of a legend of a seaborn.catplot. I decided to look into the internals of a seaborn.catplot, and figured out that matplotlib.offsetbox.VPacker makes items to align vertically. I replaced it with HPacker of the same module, and successfully aligned items horizontally. By adjusting numbers in fig._legend.set_bbox_to_anchor(), I was able to move a legend to the top as well.

How it works

See here for description of a problem, the dataset I used, and approaches I have taken to solve that problem.

Learning outcomes

I improved my skills to:

use visualization tools such as matplotlib.pyplot and seaborn
use data manipulation tools such as pandas and patsy
use statistics-related packages such as numpy, scipy, and statsmodels.api
explore manuals, references, and a Python source code

I also learned:

the importance of documentation of my work
the usefulness of raising errors in my function whenever appropriate

Next steps

~~Publish some functions used in the analysis as a package to PyPI~~ Ongoing, see statspark repository

Credit card fraud detection

Junkyu Park

Objectives

Background

How it works

Learning outcomes

Next steps