VS Code for data science
TL;DR
VS Code can be great for data science, especially if you pick the best extensions; in this article, I share some of my favourite extensions and my configuration.
What is VS Code?
It's an open-source IDE maintained by Microsoft, the company that also owns GitHub and is a major investor in OpenAI.
It's currently the most popular IDE among developers [1], it's highly extensible and customisable, and it handles all of the tools and file formats that we usually use.

Copilot

Description
GitHub Copilot is a recent IDE extension developed by GitHub that uses a model trained by OpenAI on code and text, and it can provide incredibly useful autocompletion. It goes beyond the usual autocompletion, where we only get things like a list of an object's available attributes and methods, and suggests entire lines of code, entire methods and even comments and documentation!

To use it, you need to sign up for the technical preview.
Extensions
Auto-reformatting and warnings

Description
If you use linting and formatting libraries such as flake8 and black (which you should), VS Code can make this easier by automatically formatting your code every time you save a file.
You also get warnings on lines of code that may violate code style rules.
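As a rough illustration of what this looks like in practice, the sketch below shows a hypothetical function (normalise is made up for this example) first in a style that flake8 would typically flag, and then as a formatter like black would leave it after a save.

```python
# Style that a linter such as flake8 would typically warn about (for example
# E231, missing whitespace after ','), shown as a comment so that this
# snippet itself stays clean:
#
#   def normalise(values,lower=0,upper=1):
#       lo,hi=min(values),max(values)
#       return [lower+(v-lo)*(upper-lower)/(hi-lo) for v in values]
#
# The same (hypothetical) function after a formatter such as black has run:
# consistent spacing after commas and around operators, logic unchanged.


def normalise(values, lower=0, upper=1):
    """Rescale values linearly so that they span [lower, upper]."""
    lo, hi = min(values), max(values)
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


print(normalise([0, 5, 10]))  # [0.0, 0.5, 1.0]
```

The formatter only changes layout; it would not catch the division by zero that happens here when all values are equal, which is the kind of problem you still need tests or a debugging session for.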
Extensions
No extensions needed in this case, it’s already included in VS Code by default!
Search and replace

Description
On the sidebar of VS Code, you can find the search tool that quickly finds every mention of what you search for in the entire directory. Furthermore, it allows you to quickly do a replace all, e.g. in case there’s a renaming of a variable.
Extensions
No extensions needed in this case, it’s already included in VS Code by default!
Documentation

Description
Documenting code is very important even if you don’t open source your code. There’s always a chance that you’ll need someone to look through your code and, even for your future self, it could help you remember how it all worked (honestly, just look at the unintelligible gibberish that you just wrote 🤷♂️ ).
Two extensions can be of great help here:
- autoDocstring: generates boilerplate documentation for a function once you type in """.
- Copilot: its autocomplete wonders can fully generate your entire documentation, understanding the context of each variable and how your code works (just beware that sometimes it completely misinterprets the meaning of things).
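To give an idea of the end result, here is a minimal sketch of a documented function. The function itself (split_indices) is hypothetical, and the docstring follows a Google-style layout, which I am assuming is close to the skeleton that a docstring generator produces before you fill in the descriptions.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int, test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them into train and test sets.

    Args:
        n_samples (int): Total number of samples in the dataset.
        test_ratio (float, optional): Fraction of samples held out for
            testing. Defaults to 0.2.
        seed (int, optional): Seed for the shuffle, for reproducibility.
            Defaults to 0.

    Returns:
        Tuple[List[int], List[int]]: Train indices and test indices.
    """
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Whichever tool writes the first draft, it is worth checking that the argument descriptions and defaults in the docstring still match the signature after later edits.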
Debugging scripts
Description
Everyone’s prone to face bugs in their code on a somewhat regular basis. In data science, you can even encounter bugs with your data and models. In order to figure out what’s going wrong and fix it, there’s nothing like a good ol’ debugging session.
VS Code allows us to debug both scripts and notebooks, with all of the usual features of breakpoints and a debug console, and the addition of tools such as a data viewer to inspect those suspicious dataframes.
You can configure your debugging setup in a launch.json file, where you can change the setting to also debug external code (through the justMyCode parameter) and pass in command line arguments (through the args parameter).
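For reference, a launch.json along those lines could look roughly like the output of the sketch below, which builds the configuration as a Python dictionary and prints it as JSON. The configuration name and the --epochs argument are hypothetical, and the type value is an assumption that depends on the version of the Python extension you have installed (newer versions may expect a different value).

```python
import json

# Minimal sketch of a launch.json for debugging the currently open script.
launch_config = {
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: current file (with args)",  # hypothetical label
            "type": "python",  # assumption: may differ in newer extension versions
            "request": "launch",
            "program": "${file}",  # VS Code variable for the active file
            "console": "integratedTerminal",
            "justMyCode": False,  # also step into external (library) code
            "args": ["--epochs", "5"],  # hypothetical command line arguments
        }
    ],
}

# The printed text is what would go into .vscode/launch.json.
print(json.dumps(launch_config, indent=4))
```

Setting justMyCode to false is mostly useful when a bug seems to originate inside a library; leaving it at its default keeps the call stack shorter in everyday debugging.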
Extensions
No extensions needed in this case, it’s already included in VS Code by default!
Notebooks

Description
Whether you love [2] or hate notebooks [3], they’re a commonly used format to explore data, test out code and run modelling experiments. And while Jupyter Notebooks might have their issues, I personally think that notebooks in VS Code [4] are a significant improvement. You can carry on coding on individual cells and mix in Markdown, but you also get the same toolbox that you have when writing scripts, such as autocompletion, debugging and most other extensions that you might have installed, such as Copilot and your IDE theme. You can also see the time that it took to run each cell and, in my opinion, get a much nicer UI than in Jupyter.
Extensions
Remote access to VMs

Description
Odds are that you’ll run code on a server, a virtual machine or some other hardware that you might want to connect to via SSH. Fortunately, you can connect to any of them and still carry on using your local VS Code installation, by using the Remote SSH extension.
Extensions
Live share / collaborative coding

Description
You know how you can collaboratively edit documents in realtime in Google Docs? Well, you can do the same in VS Code, allowing you to do some pair programming even if you’re not in the same room.
Extensions
Git management


Description
VS Code has embedded support for managing git, allowing you to do pushes and pulls without a terminal. Additionally, I’d recommend using the following extensions:
- GitLens: improves the overall git experience by showing the commits, identifying the latest changes in each line of code and showing how the file looked in previously committed changes, among other features.
- GitHub Pull Requests and Issues: gives access to the GitHub pull requests inside VS Code, also showing the comments on their respective lines of code.
CSVs

Description
People who work on data science and/or machine learning tend to use CSVs for some of their data and having to use something like Excel can be a bit annoying. You can keep using VS Code when dealing with CSVs, especially if you install the following extensions:
- Rainbow CSV: gives each column a colour and provides a hover tip to help guide you through the CSV.
- Edit CSV: adds a spreadsheet-like view where you can edit it in a more structured format and easily add columns or rows.
Markdown

Description
If you’re using Markdown for your README files, documentation or any other writing, you can do so in VS Code, having the same relevant colouring system as in the rest of your code and seeing a live preview of how your Markdown will be rendered.
Extensions
No extensions needed in this case, it’s already included in VS Code by default!
LaTeX

Description
Those who write papers or books and made the (potentially regrettable) decision to use LaTeX can also rely on VS Code. You just need to install the LaTeX Workshop extension to get the live preview and all of the features that you’re used to in your IDE setup.
Extensions
Extensions marketplace

While I focused on my favourite extensions, you can find many others on the extensions marketplace.
Some other extensions that I use and recommend are RescueTime, for keeping track of your productivity, and the Night Owl theme for the style 😎
You can check all of the available extensions either in VS Code or on the Visual Studio Marketplace website.
settings.json
All of the settings of your VS Code setup that are not your extensions installations are kept in a settings.json file. Above you can see my current configuration.
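Separately from that configuration, the following minimal sketch shows the general shape of such a file by merging a few overrides into existing settings. The setting keys are standard VS Code ones, but the values, the merge_settings helper and the existing font size entry are assumptions chosen purely for illustration.

```python
import json

# Hypothetical overrides; 88 matches black's default line length and the
# theme name assumes the Night Owl theme is installed.
overrides = {
    "editor.formatOnSave": True,
    "editor.rulers": [88],
    "workbench.colorTheme": "Night Owl",
}


def merge_settings(existing_json: str, new_values: dict) -> str:
    """Return settings JSON text with new_values applied on top."""
    settings = json.loads(existing_json) if existing_json.strip() else {}
    settings.update(new_values)
    return json.dumps(settings, indent=4)


# Existing settings are kept, and the overrides are added or replaced.
print(merge_settings('{"editor.fontSize": 14}', overrides))
```

A real settings.json may contain comments, which json.loads rejects, so for everyday changes editing the file directly in VS Code is usually the simpler route.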
Final thoughts
Hopefully, this brief article is helpful to you, potentially convincing you to try out VS Code and the setup that was presented. I do encourage you though to explore other options, both in extensions and IDEs. Indeed, some of the extensions that I shared, such as Copilot, are also available in PyCharm for example.
After reading this, I’d suggest at least giving VS Code’s notebooks a try alongside GitHub Copilot. You’ll get an amazing platform for code testing, data exploration and model experimentation, which I think is on a whole other level compared with the alternatives.
References
[1] Stack Overflow Developer Survey (2021)
[2] Jeremy Howard, I Like Notebooks (2020)
[3] Joel Grus, I Don't Like Notebooks (2018)
[4] Jupyter Notebooks in VS Code (2021)