Virtualization makes it possible to run multiple virtual machines (VMs) on a single piece of physical hardware. These VMs behave like independent computers, but share the same physical computing power. A computer within a computer, so to speak.
Many cloud services rely on virtualization. But other technologies, such as containerization and serverless computing, have become increasingly important.
Without virtualization, many of the digital services we use every day would not be possible. Of course, this is a simplification, as some cloud services also run on bare-metal infrastructure.
In this article, you will learn how to set up your own virtual machine on your laptop in just a few minutes, even if you have never heard of cloud computing or containers before.
Table of Contents
1 — The Origins of Cloud Computing: From Mainframes to Serverless Architecture
2 — Understanding Virtualization: Why It Is the Basis of Cloud Computing
3 — Containers vs. Virtual Machines: What Data Scientists Should Know
4 — Create a Virtual Machine with VirtualBox
Final Thoughts
Where can you continue learning?
1 — The Origins of Cloud Computing: From Mainframes to Serverless Architecture
Cloud computing has fundamentally changed the IT landscape, but its roots go back much further than many people think. In fact, the history of the cloud began back in the 1950s with huge mainframes and so-called dumb terminals.
- The era of mainframes in the 1950s: Companies used mainframes so that multiple users could access them simultaneously via dumb terminals. These central mainframes were designed for high-volume, business-critical data processing. Large companies still use them today, even though cloud services have reduced their relevance.
- Time-sharing and virtualization: In the following decade (the 1960s), time-sharing made it possible for multiple users to share the same computing power, an early model of today's cloud. Around the same time, IBM pioneered virtualization, allowing multiple virtual machines to run on a single piece of hardware.
- The birth of the internet and web-based applications in the 1990s: Six years before I was born, Tim Berners-Lee developed the World Wide Web, which revolutionized online communication and our entire working and living environment. Can you imagine our lives today without the internet? At the same time, PCs were becoming increasingly popular. In 1999, Salesforce revolutionized the software industry with Software as a Service (SaaS), allowing businesses to use CRM solutions over the internet without local installations.
- The big breakthrough of cloud computing in the 2000s and 2010s: The modern cloud era began in 2006 with Amazon Web Services (AWS): companies could flexibly rent infrastructure with S3 (storage) and EC2 (virtual servers) instead of buying their own servers. Microsoft Azure and Google Cloud followed with PaaS and IaaS services.
- The modern cloud-native era: The next innovation came with containerization. Docker made containers popular in 2013, followed by Kubernetes in 2014 to simplify the orchestration of containers. Next came serverless computing with AWS Lambda and Google Cloud Functions, which enabled developers to write code that automatically responds to events, while the infrastructure is fully managed by the cloud provider.
Cloud computing is more the result of decades of innovation than a single new technology. From time-sharing to virtualization to serverless architectures, the IT landscape has continuously evolved. Today, cloud computing is the foundation for streaming services like Netflix, AI applications like ChatGPT and global platforms like Salesforce.
2 — Understanding Virtualization: Why It Is the Basis of Cloud Computing
Virtualization means abstracting physical hardware, such as servers, storage or networks, into multiple virtual instances.
Multiple independent systems can then run on the same physical infrastructure. Instead of dedicating an entire server to a single application, virtualization allows several workloads to share resources efficiently. For example, Windows, Linux or another environment can run simultaneously on a single laptop, each in an isolated virtual machine.
This saves costs and resources.
Even more important, however, is scalability: infrastructure can be flexibly adapted to changing requirements.
Before cloud computing became widely available, companies often had to maintain dedicated servers for different applications, leading to high infrastructure costs and limited scalability. If more performance was suddenly required, for example because webshop traffic increased, new hardware was needed. The company had to add more servers (horizontal scaling) or upgrade existing ones (vertical scaling).
This is different with virtualization: for example, I can simply upgrade my virtual Linux machine from 8 GB to 16 GB of RAM, or assign 4 cores instead of 2. Of course, only if the underlying infrastructure supports this. More on this later.
And this is exactly what cloud computing makes possible: the cloud consists of huge data centers that use virtualization to provide flexible computing power exactly when it is needed. Virtualization is therefore a fundamental technology behind cloud computing.
How does serverless computing work?
What if you didn't even have to manage virtual machines anymore?
Serverless computing goes one step further than virtualization and containerization. The cloud provider handles most infrastructure tasks, including scaling, maintenance and resource allocation. Developers can focus on writing and deploying code.
But does serverless really mean that there are no more servers?
Of course not. The servers are still there, but they are invisible to the user, and developers no longer have to worry about them. Instead of manually provisioning a virtual machine or container, you simply deploy your code and the cloud automatically executes it in a managed environment. Resources are only provided while the code is running. Examples include AWS Lambda, Google Cloud Functions and Azure Functions.
What are the advantages of serverless?
As a developer, you don't have to worry about scaling or maintenance. This means that if there is suddenly much more traffic, for example during a particular event, the resources are automatically adjusted. Serverless computing can also be cost-efficient, especially in Function-as-a-Service (FaaS) models: if nothing is running, you pay nothing. However, some serverless services have baseline costs (e.g. Firestore).
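The pay-per-use idea can be made concrete with a small back-of-the-envelope calculation. The sketch below is purely illustrative; the prices and the memory size are example values, not actual rates of any provider:

```python
def monthly_faas_cost(
    invocations,
    avg_duration_ms,
    memory_gb=0.125,
    price_per_gb_second=0.0000167,
    price_per_million_requests=0.20,
):
    """Illustrative FaaS cost model: you pay only for actual execution time."""
    compute_seconds = invocations * avg_duration_ms / 1000
    compute_cost = compute_seconds * memory_gb * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost

# A quiet month with zero invocations costs nothing:
print(monthly_faas_cost(0, 200))  # 0.0

# One million invocations of 200 ms each still cost well under a dollar:
print(round(monthly_faas_cost(1_000_000, 200), 2))
```

Compare that with a small virtual server billed around the clock, and the appeal for spiky, event-driven workloads becomes obvious.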
Are there any disadvantages?
You have much less control over the infrastructure and no direct access to the servers. There is also a risk of vendor lock-in, because the applications are strongly tied to a cloud provider.
A concrete example of serverless: an API without your own server
Imagine you have a website with an API that provides users with the current weather. Normally, a server runs around the clock, even at times when nobody is using the API.
With AWS Lambda, things work differently: a user enters 'Mexico City' on your website and clicks on 'Get weather'. This request triggers a Lambda function in the background, which retrieves the weather data and sends it back. The function is then stopped automatically. This means you don't have a permanently running server and no unnecessary costs; you only pay while the code is executed.
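A minimal handler for this scenario could look like the sketch below. The `fetch_weather` helper and its return values are stand-ins for illustration; a real function would call an external weather API:

```python
import json

def fetch_weather(city):
    # Placeholder: a real implementation would call a weather API here.
    return {"city": city, "temperature_c": 24, "condition": "sunny"}

def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes for each incoming request."""
    params = event.get("queryStringParameters") or {}
    city = params.get("city", "Mexico City")
    return {
        "statusCode": 200,
        "body": json.dumps(fetch_weather(city)),
    }

# Simulate one invocation locally, roughly as API Gateway would trigger it:
response = lambda_handler({"queryStringParameters": {"city": "Mexico City"}}, None)
print(response["statusCode"])  # 200
```

Between invocations, no process of yours is running at all; the platform spins the function up on demand.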
3 — Containers vs. Virtual Machines: What Data Scientists Should Know
You have probably heard of containers. But what is the difference to virtual machines, and what is particularly relevant for you as a data scientist?
Both containers and virtual machines are virtualization technologies.
Both make it possible to run applications in isolation.
Both offer advantages depending on the use case: while VMs provide strong isolation and security, containers excel in speed and efficiency.
The main difference lies in the architecture:
- Virtual machines virtualize the entire hardware, including the operating system. Each VM has its own operating system (OS), which in turn requires more memory and resources.
- Containers, on the other hand, share the host operating system and only virtualize the application layer. This makes them significantly lighter and faster.
Put simply, virtual machines simulate entire computers, while containers only encapsulate applications.
Why is this important for data scientists?
Since as a data scientist you will come into contact with machine learning, data engineering and data pipelines, it is also important to know something about containers and virtual machines. Of course, you don't need the in-depth knowledge of a DevOps engineer or a Site Reliability Engineer (SRE).
Virtual machines are used in data science, for example, when a complete operating system environment is required, such as a Windows VM on a Linux host. Data science projects often need specific environments. With a VM, it is possible to provide exactly the same environment, regardless of which host system is available.
A VM is also needed when training deep learning models with GPUs in the cloud. With cloud VMs such as AWS EC2 or Azure Virtual Machines, you have the option of training models on GPUs. VMs also completely separate different workloads from each other to ensure performance and security.
Containers are used in data science for data pipelines, for example, where tools such as Apache Airflow run individual processing steps in Docker containers. This means that each step can be executed in isolation and independently of the others, regardless of whether it involves loading, transforming or saving data. And if you want to deploy machine learning models via Flask / FastAPI, a container ensures that everything your model needs (e.g. Python libraries, framework versions) runs exactly as it should. This makes it very easy to deploy the model on a server or in the cloud.
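To show the idea of serving a model behind an HTTP endpoint, here is a dependency-free sketch using only Python's standard library; in practice you would use Flask or FastAPI and package the app in a Docker image. The `predict` function is a stand-in for a real trained model:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a real model; in practice you would load a trained
    # scikit-learn or PyTorch model here.
    return {"prediction": sum(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

request = urllib.request.Request(
    f"http://127.0.0.1:{port}/predict",
    data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
server.shutdown()
print(result)  # {'prediction': 6.0}
```

Wrapped in a container, an endpoint like this runs identically on your laptop, a server or in the cloud, because the image pins the Python version and all libraries.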
4 — Create a Virtual Machine with VirtualBox
Let's make this a little more concrete and create an Ubuntu VM. 🚀
I use the VirtualBox software on my Windows Lenovo laptop. The virtual machine runs in isolation from your main operating system so that no changes are made to your actual system. If you have Windows Pro Edition, you can also enable Hyper-V (pre-installed by default, but disabled). On an Intel Mac, you should also be able to use VirtualBox. On Apple Silicon, Parallels Desktop or UTM is apparently the better alternative (not tested myself).
1) Install VirtualBox
The first step is to download the installation file from the official VirtualBox website and install VirtualBox. It is installed together with all necessary drivers.
You can ignore the notice about the missing dependencies Python Core / win32api, as long as you don't want to automate VirtualBox with Python scripts.
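If you do want to script VirtualBox later, the `VBoxManage` command-line tool is the usual entry point. A minimal sketch, in which the VM name and OS type are just example values:

```python
import shutil
import subprocess

def createvm_command(name, ostype="Ubuntu_64"):
    """Build the VBoxManage call that creates and registers a new VM.

    VBoxManage is VirtualBox's command-line interface; the name and
    OS type here are example values.
    """
    return ["VBoxManage", "createvm", "--name", name, "--ostype", ostype, "--register"]

command = createvm_command("Ubuntu VM 2025")
print(" ".join(command))

# Only execute the command if VBoxManage is actually on the PATH:
if shutil.which("VBoxManage"):
    subprocess.run(command, check=True)
```

Everything below, however, uses only the graphical VirtualBox Manager, so no scripting is required.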
Then we start the Oracle VirtualBox Manager:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_8z7PdhGXP19s285I28m5VQ.png)
2) Download the Ubuntu ISO file
Next, we download the Ubuntu ISO file from the Ubuntu website. An Ubuntu ISO file is a compressed image file of the Ubuntu operating system, which means it contains a complete copy of the installation data. I download the LTS version because it receives security and maintenance updates for 5 years (Long Term Support). Note the location of the .iso file, as we will use it later in VirtualBox.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_Q2Eh2Wk5EM21Yw5Rv3AulA-1024x524.png)
3) Create a virtual machine in VirtualBox
Next, we create a new virtual machine in the VirtualBox Manager and give it the name Ubuntu VM 2025. Here we select Linux as the type and Ubuntu (64-bit) as the version. We also select the previously downloaded Ubuntu ISO file as the ISO image. It would also be possible to add the ISO file later in the mass storage menu.
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_nftl94tPrLS1W57iiAqZRA-1024x798.png)
Next, we choose a user name vboxuser2025 and a password for access to the Ubuntu system. The hostname is the name of the virtual machine within the network or system; it must not contain any spaces. The domain name is optional and would be used if the network has multiple devices.
We then assign the appropriate resources to the virtual machine. I choose 8 GB (8192 MB) of RAM, as my host system has 64 GB of RAM; I recommend at least 4 GB (4096 MB). I assign 2 processors, as my host system has 8 cores and 16 logical processors. It would also be possible to assign 4 cores, but this way I have enough resources left for my host system. You can find out how many cores your host system has by opening the Task Manager in Windows and looking at the number of cores under CPU in the Performance tab.
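As a quick cross-check, you can also query the processor count from Python. Note that `os.cpu_count()` reports logical processors (16 on the laptop above), so the number of physical cores may be lower:

```python
import os

# os.cpu_count() returns the number of logical processors, or None
# if it cannot be determined.
logical = os.cpu_count() or 1
print(f"Logical processors: {logical}")

# A conservative rule of thumb: give the VM at most half of the
# logical processors, so the host system keeps enough headroom.
suggested_max_vcpus = max(1, logical // 2)
print(f"Suggested maximum vCPUs for the VM: {suggested_max_vcpus}")
```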
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_N0Q7M_H0HvcYMJef7wDxmg-1024x528.png)
Next, we click on 'Create a virtual hard disk now' to create a virtual hard disk. A VM requires its own virtual hard disk to install the OS (e.g. Ubuntu, Windows). All programs, files and configurations of the VM are stored on it, just like on a physical hard disk. The default value is 25 GB. If you want to use the VM for machine learning or data science, more storage space (e.g. 50–100 GB) would be useful to have room for large data sets and models. I keep the default setting.
We can then see that the virtual machine has been created and can be used:
![](https://towardsdatascience.com/wp-content/uploads/2025/02/1_SX7zMSw1MCq0sCbaP2xN7A-1024x938.png)
4) Use the Ubuntu VM
We can now use the newly created virtual machine like a normal, separate operating system. The VM is completely isolated from the host system, which means you can experiment in it without changing or jeopardizing your main system.
If you are new to Linux, you can try out basic commands like ls, cd, mkdir or sudo to get to know the terminal. As a data scientist, you can set up your own development environments, install Python with pandas and scikit-learn to develop data analysis and machine learning models, or install PostgreSQL and run SQL queries without having to set up a local database on your main system. You can also use Docker to create containerized applications.
Final Thoughts
Since the VM is isolated, we can install programs, experiment and even break the system without affecting the host system.
Let's see whether virtual machines remain relevant in the coming years. As companies increasingly adopt microservice architectures (instead of monoliths), containers with Docker and Kubernetes will certainly become even more important. But understanding how to set up a virtual machine, and what it is used for, is definitely useful.
I simplify tech for curious minds. If you enjoy my tech insights on Python, data science, data engineering, machine learning and AI, consider subscribing to my Substack.