We have talked about Machine Learning in this blog a few times before. The purpose of this series of articles is to go a bit further and explore the Machine Learning space and its relationship with Python.
A more technical version of all this information, along with the small scripts, can be found on my GitHub account under the project python-ml.
One question worth discussing is: why Python?
Available languages for Machine Learning
You can use many different languages to implement Machine Learning algorithms and programs but, looking at the space and at popularity, you can easily see a clear preference for four of them:
- Python
- It is currently leading the race thanks to its simplicity and gentle learning curve.
- It is especially good for beginners, both in programming and in Machine Learning.
- Its library ecosystem and community support are huge.
- R
- It is designed for statistical analysis and visualization, and it is frequently used to uncover patterns in large datasets.
- With RStudio, developers can easily build algorithms and statistical visualizations.
- It is a free alternative to more expensive software like Matlab.
- Matlab
- It is fast, stable and secure for complex mathematics.
- It is considered a hardcore language for mathematicians and scientists.
- Julia
- It is designed to meet the needs of numerical analysis and computational science.
- Its base library integrates open source C and Fortran libraries.
- The collaboration between the Jupyter and Julia communities gives Julia a powerful UI.
Some important metrics to consider when choosing a language are:
- Speed.
- Learning curve.
- Cost.
- Community support.
- Productivity.
With these metrics in mind, we can compare our languages as follows:
- Speed: R is essentially a statistical language, and it is difficult to beat in that context.
- Learning curve: This depends on the person's background. R is closer to functional languages, whereas Python is closer to object-oriented languages.
- Cost: Matlab is the only one that is not free. The other languages are open source.
- Community: All of them are very popular, but Python has the biggest community and the largest amount of resources available.
- Productivity: R shines at statistical analysis, Matlab at computer vision, bio-informatics and biology are the playground of Julia, and Python is the king of general-purpose tasks and multiple uses.
At the end of the day, the decision is a balance between all the characteristics seen above, our skills, and the field we work in or the tasks we want to implement.
In my case, as you have probably guessed, I am going to choose Python because it is like a Swiss army knife and, at this early stage, I think that is important. There is always time later to focus on other things or to narrow the scope.
IDEs
Many IDEs support Python. Since it is such a widespread language, there are plenty of tools and environments we can use; just pick the one you like the most.
If you do not know any IDE or platform, there are two that a lot of Data Scientists use:
I am not familiar with them myself. As a developer, I am more used to Visual Studio Code and IntelliJ, and I will probably use one of those unless I discover some exciting functionality or advantage in one of the others.