begin
import PlutoUI
PlutoUI.TableOfContents(title="Intro to Julia")
end

md"""
# Julia and Modern Machine Learning

Given Julia's current surge in popularity in scientific fields, it makes sense to start considering Julia for research in machine learning.

I will be giving a high-level tutorial on what you need to know about Julia for our course. The goals of this tutorial are:
- to get you set up to use Julia for the purposes of the UofA ML Undergrad Courses,
- introduce some core aspects of how to use Julia,
- point to key packages to take advantage of,
- and give resources for learning more.

This tutorial is written as a Pluto notebook. Pluto is a new take on what an interactive notebook can be, built from the ground up in Julia.
"""

md"""
### Why Julia

Julia is a modern language with numerics at its core. It is abstract and flexible like Python, numerically focused like Matlab, and can be optimized to be as fast as C/C++/Fortran. It is also one of the few mainstream languages built around multiple dispatch as a core design philosophy. While this may be hard to get used to when coming from OO languages, Julia users have found it remarkably effective for code reuse ([link](https://www.youtube.com/watch?v=kc9HwsxE1OY)). Julia also has built-in utilities for threading (Go-style coroutines and pthread-style parallel `for` loops) and multi-processing, and there are ongoing efforts toward language-wide automatic differentiation tools with a unified and extensible approach. There is also support for GPU computation, including writing your own GPU kernels.

You may be asking yourself: why should we think about using Julia when Python is ubiquitous in the field? I won't try to convince you in this notebook, but here are a few reasons I have for using Julia:

* Multiple dispatch can often make code easier to use and extend compared to OOP.
* Arrays are well thought out in the base language, so there is uniformity in design principles across numeric arrays, arrays for generic data, and arrays on specialized hardware (like GPUs). Core linear algebra is also a priority and ships with the language.
* Julia addresses the two-language problem: code can be efficient or easy to read (and often both!), all in Julia, so there is little need to turn to C or Fortran for really efficient code. Julia is also a part of the exclusive petaflop club: https://www.avenga.com/magazine/julia-programming-language/.
* Threads and multi-processing both ship with the language and are easy to use.

These are only a few reasons, and more extensive lists can be found elsewhere. If you have further questions you can ask Matt Schlegel about why he uses Julia in his ML/RL research and doesn't see a return to Python.
"""
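
As a minimal sketch of what multiple dispatch looks like in practice (the `Pet`, `Dog`, `Cat`, and `meets` names are hypothetical, made up for this illustration), one function can have several methods, and Julia picks the method that matches the types of all the arguments:

```julia
# Hypothetical types, for illustration only.
abstract type Pet end
struct Dog <: Pet end
struct Cat <: Pet end

# One generic function, several methods. Julia selects the method using the
# runtime types of *all* arguments, not only the first one.
meets(a::Pet, b::Pet) = "ignores"      # generic fallback
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses at"

meets(Dog(), Cat())   # "chases"
meets(Cat(), Dog())   # "hisses at"
meets(Dog(), Dog())   # "ignores" (falls back to the Pet, Pet method)
```

Adding a new `Pet` subtype later only requires new methods of `meets`; none of the existing definitions need to change.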

md"""
### What about Python?

Julia gives you the tools and flexibility to work at an abstract level (like Python) with the ability to work at a low level to optimize numerical code (like C/C++). While Python lets you write high-level abstract code, the optimized numerical code underneath is written in C/C++. This means operations not supported by NumPy will either need to be written in C or will be slow. Python can also be quite error-prone for new users, as certain operations are legal but not what you intended (e.g., dot product versus element-wise product) and certain language features can cause the code to become very slow (e.g., `for` loops). While some of these issues are being actively tackled by projects such as Numba and Cython, third-party package developers need to explicitly buy in to these systems and develop code with them in mind. Chris Rackauckas has an excellent blog post discussing the core limitations of these approaches ([link](https://www.stochasticlifestyle.com/why-numba-and-cython-are-not-substitutes-for-julia/)).

This is not to say Python is not an excellent language; it is, and it has become quite popular because of the trade-offs it makes. It is hard to predict whether Julia will become a language of choice over other common languages for data analysis (Python, R, Matlab), but it has a growing user base and the potential to become widely used. It also has the potential to kick the [machine learning field out of its design rut](https://dl.acm.org/doi/10.1145/3317550.3321441) caused by monolithic, highly optimized kernels.
"""
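
To make the two examples above concrete on the Julia side, here is a small sketch (the vectors and the `my_dot` helper are made up for illustration). Element-wise and inner products are spelled differently, and a plain `for` loop is compiled like any other Julia code, so it is generally not something to avoid for speed:

```julia
using LinearAlgebra  # standard library; provides `dot`

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

elementwise = a .* b     # the dot in `.*` broadcasts element-wise: [4.0, 10.0, 18.0]
inner = dot(a, b)        # inner product: 32.0
# `a * b` throws a MethodError for two vectors instead of silently picking one meaning.

# Hypothetical helper: an explicit loop computing the same inner product.
function my_dot(x, y)
    total = zero(eltype(x))
    for i in eachindex(x, y)   # errors if x and y have mismatched indices
        total += x[i] * y[i]
    end
    return total
end

my_dot(a, b)   # 32.0
```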

md"""
# Design patterns

While I won't go into detail about the design patterns which emerge from Julia's multiple dispatch and type system, you should read [this blog](https://www.stochasticlifestyle.com/type-dispatch-design-post-object-oriented-programming-julia/) by Christopher Rackauckas (who is an active user of the language doing research in applying ML/AI methods to scientific pursuits).

After introducing several students to Julia, I've noticed that the hardest hurdle is understanding how to organize code. This is not new: when faced with any new paradigm, it can be daunting to try to understand how things are organized. Unlike other modern paradigms like OOP, there is very little in the way of patterns that need to be known ([think gang of four](https://en.wikipedia.org/wiki/Design_Patterns)). This is a property of a multiple dispatch language and the separation of functions/methods from types. You can see more of the neat properties of multiple dispatch in [this video](https://www.youtube.com/watch?v=kc9HwsxE1OY).

"""
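
One common way to organize code around this separation, sketched here with hypothetical names (`Model`, `LinearModel`, `ConstantModel`, `predict`, and `squared_error` are not from any package), is to keep data in structs and behaviour in functions defined outside of them:

```julia
# Hypothetical example: data lives in structs, behaviour lives in functions.
abstract type Model end

struct LinearModel <: Model
    w::Vector{Float64}
end

struct ConstantModel <: Model
    c::Float64
end

# Each concrete type only needs its own `predict` method...
predict(m::LinearModel, x::Vector{Float64}) = sum(m.w .* x)
predict(m::ConstantModel, x::Vector{Float64}) = m.c

# ...and generic code written against `Model` works for all of them,
# including types added later in another file or package.
squared_error(m::Model, x, y) = (predict(m, x) - y)^2

squared_error(LinearModel([1.0, 2.0]), [3.0, 4.0], 10.0)   # (11.0 - 10.0)^2 = 1.0
squared_error(ConstantModel(0.5), [3.0, 4.0], 1.5)         # (0.5 - 1.5)^2 = 1.0
```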

md"""
# Other resources
"""
md"""
### Good practices

- [Style Guide](https://docs.julialang.org/en/v1/manual/style-guide/)
- [Differences from other Languages](https://docs.julialang.org/en/v1/manual/noteworthy-differences/)
- [Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/)

"""

md"""
### Useful packages and projects

- [Pluto](https://github.com/fonsp/Pluto.jl): reactive notebooks written in and for Julia
- [IJulia](https://github.com/JuliaLang/IJulia.jl): a Julia kernel for Jupyter
- [Revise](https://timholy.github.io/Revise.jl/stable/): picks up code changes without restarting Julia, useful for command-line (REPL) development
- [Julia for VSCode](https://www.julia-vscode.org): an editor outside of Pluto/Jupyter
- [Plots](http://docs.juliaplots.org/latest/): an extensive plotting package
- [Flux](https://fluxml.ai): neural network package

and many more...

"""
md"""
### Other resources for learning Julia

- [Julia for Beginners](https://www.youtube.com/watch?v=ub3tqCWZmo4)
- [Documentation](https://docs.julialang.org/en/v1/)
- [Tutorials on the Julia website](https://julialang.org/learning/tutorials/)

"""

md"""
### Hurdles and known sharp edges

So far we've discussed several positive aspects of Julia, but as with any language, there are edges where the design of the language can be frustrating. While we won't discuss these here, a great list has been compiled [here](https://viralinstruction.com/posts/badjulia/).
"""

md"""
# Basics

In this section, we will go over the basics of using the Julia language. This should be enough for most of what you will need for the machine learning course.

- variables
- variable scope
- types
- arrays and other collections
- loops
- structs and data
- linear algebra/mathematical operations
- style guide and other tips/recommendations

"""

md"""
## Variables

Variables in Julia are created and assigned as in most dynamic languages. In Julia, every value has a type, and if a variable holds values of the same type throughout its scope it is considered type-stable. You can inspect the type of a variable using `typeof`, as seen below.
"""
x = 10.0     # output: 10.0
typeof(x)    # output: Float64
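
To illustrate what type-stable means for a variable, here is a small sketch with two hypothetical functions (`sum_stable` and `sum_unstable` are made up for this example). They return the same result, but in the second one `total` changes type partway through:

```julia
# Type-stable: `total` is a Float64 from start to finish.
function sum_stable(xs::Vector{Float64})
    total = 0.0
    for v in xs
        total += v
    end
    return total
end

# Not type-stable: `total` starts as an Int and becomes a Float64
# after the first addition.
function sum_unstable(xs::Vector{Float64})
    total = 0
    for v in xs
        total += v
    end
    return total
end

sum_stable([1.0, 2.0])     # 3.0
sum_unstable([1.0, 2.0])   # 3.0, but `total` changed type along the way
```

Keeping variables type-stable generally makes it easier for the compiler to produce fast code; the Performance Tips page of the Julia manual covers this in more detail.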

md"""
## Variable Scope in Julia/Pluto

Variables are scoped in Julia/Pluto in a similar way to Python/Jupyter. Variables defined in a cell are global and available across all the cells. This can sometimes cause problems in Jupyter notebooks, where variables can be overwritten unknowingly. In Pluto, you cannot redefine a variable by accident (see below for `y`): when a variable is defined in more than one cell, Pluto doesn't know which definition you mean and throws an error for both.
"""

y = 11
# Error: Multiple definitions for y. Combine all definitions into a single reactive cell using a `begin ... end` block.

y = 12
# Error: Multiple definitions for y. Combine all definitions into a single reactive cell using a `begin ... end` block.

md"""
While a win for reproducibility and simplicity, this can be annoying when you have throw-away variables you want to use/name. Fortunately, there is a way around this!
"""

md"""
### Begin and Let Blocks

The `begin` and `let` blocks are both used to make multi-statement cells in our notebook.

The `begin` block makes all of its variables available at the global scope:
"""
begin
    my_global_x = 10
    # do other stuff
    my_global_x *= 20
end    # output: 200

my_global_x    # output: 200
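
For contrast, here is a minimal sketch of a `let` block (the name `tmp` is a hypothetical throw-away variable). Variables assigned inside `let` are local to the block, so they do not become globals and the same name can be reused in other cells:

```julia
let
    tmp = 10          # local to this `let` block
    tmp *= 20
    tmp               # the block evaluates to its last expression
end

# `tmp` does not exist outside the block.
@isdefined(tmp)   # false
```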