Browser Internals
How good does it feel when we type the URL of our favorite website and, within a few seconds, a beautiful interface pops up in front of us? All those different sections hold exactly the information we want to see, in a perfect structure. It’s so good that we don’t even think about it. But one thing is true: when something is this organized, it has to have been built with an extremely high level of planning. How does all of that even appear on a box of metal? And what makes all of these websites, each of which is different in nature, appear on our screens so flawlessly?
And that’s where a monster of a piece of software comes in: the browser. The entire process, from the browser fetching data from the internet to showing it on our screens, is a marvel of software engineering that must not be taken lightly, especially by a web developer. The more deeply we understand what we are actually working with, the better the decisions we will be able to make in our development journey. This is because development is not about coding, and it never was. It is about being able to think of the architecture that gives the best possible results, and for that to happen, of course, we first need to understand its physics.
At a high level, this is what the browser looks like:

We have the User Interface, the Browser Engine, the Rendering Engine, Networking and blah blah blah, as you can see in the diagram. Maaan, we aren’t here to get bored. Diving deep into every component separately would be History 101, and we aren’t in for that. We are engineers. We will map not the components but the processes. Components don’t exist for us to study them for their own sake; they exist to serve us, to solve the problems we meet along the way. The browser engine is a solution to a problem. And the browser itself is a solution to another problem: the need for a uniform and safe networking environment.
Brace for impact as we dive into how browsers show us a new page, from opening the browser to navigating to a website and loading it.
Opening the Browser
The first step, of course, is to open the browser by clicking on it, and right away we are bombarded with buttons: tabs, extensions, back and forward buttons, the menu button, bookmarks and so on. Tabs give us independent browsing contexts within the same browser, so we can multitask. Reload refreshes the content on screen, to get updates or to get past minor glitches. The back and forward buttons move us along the timeline of our browsing history in that tab.
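That per-tab timeline behaves like a list with a cursor. Here is a minimal sketch of one tab’s session history; the TabHistory class and its methods are hypothetical names, not a browser API, and real browsers store much more per entry (scroll position, form state and so on). The interesting rule is in visit(): navigating to a new page from the middle of the timeline throws away the "forward" entries, which is why Forward greys out after you click a link.

```typescript
// A minimal sketch of one tab's session history (hypothetical names, not a browser API).
class TabHistory {
  private entries: string[] = [];
  private index = -1; // position of the current page in `entries`

  // Visiting a new URL drops any "forward" entries, then appends the new page.
  visit(url: string): void {
    this.entries = this.entries.slice(0, this.index + 1);
    this.entries.push(url);
    this.index = this.entries.length - 1;
  }

  canGoBack(): boolean {
    return this.index > 0;
  }

  canGoForward(): boolean {
    return this.index < this.entries.length - 1;
  }

  // Step one page back; returns the URL to show, or null if there is nothing behind us.
  back(): string | null {
    if (!this.canGoBack()) return null;
    this.index -= 1;
    return this.entries[this.index];
  }

  // Step one page forward; returns the URL to show, or null if we are at the newest page.
  forward(): string | null {
    if (!this.canGoForward()) return null;
    this.index += 1;
    return this.entries[this.index];
  }

  current(): string | null {
    return this.index >= 0 ? this.entries[this.index] : null;
  }
}

const tab = new TabHistory();
tab.visit("https://example.com/");
tab.visit("https://example.com/a");
tab.visit("https://example.com/b");
tab.back();                         // now on /a
tab.visit("https://example.com/c"); // /b is discarded from the forward side
console.log(tab.current(), tab.canGoForward()); // https://example.com/c false
```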
And the most important of all: the URL bar, where the browser’s actual journey begins. Now, for all the UI we see, there is a backend too, and the browser engine is responsible for it. When we click a button in the browser, it should do something, right? What it does is decided by the browser engine: showing the previous page, reloading, fetching again, working with an extension and so on. Once it receives a URL, instructing the rendering engine and the networking component to do their jobs is also its responsibility. The UI backend is another component, whose role is to carry out the actual drawing, putting basic widgets and windows on the screen through the operating system.
Pressing Enter after typing a URL
Once we press Enter, the URL goes to the browser engine. The browser engine takes that URL and calmly shouts at the networking component: GET ME ITS IP ADDRESS NOW. I mean, it has a lot of things to do, doesn’t it? The poor networking component resolves the IP address of the URL’s hostname, either by finding it in a cache (the browser’s own, or the operating system’s) or by sending a query to a DNS resolver. When the IP comes back, hold on: the work is not done yet.
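The "cache first, resolver second" order can be sketched as a small cache with expiry sitting in front of a resolver. In this sketch the resolver is just a function passed in (a hypothetical stand-in for a real DNS query), each answer carries a TTL in seconds, and the clock is injectable so the example stays deterministic. The DnsCache and Resolver names are made up for illustration; real browsers also deal with the OS cache, failures, IPv6 and much more.

```typescript
// Hypothetical resolver signature: hostname in, IP address and TTL (seconds) out.
type Resolver = (hostname: string) => { ip: string; ttlSeconds: number };

interface CacheEntry {
  ip: string;
  expiresAt: number; // in milliseconds, on the same clock as `now()`
}

class DnsCache {
  private entries: Record<string, CacheEntry> = {};
  private resolver: Resolver;
  private now: () => number;
  queries = 0; // how many times we had to fall back to the resolver

  constructor(resolver: Resolver, now: () => number = () => Date.now()) {
    this.resolver = resolver;
    this.now = now;
  }

  // DNS works on hostnames, so the hostname is extracted from the full URL first.
  lookup(url: string): string {
    const hostname = new URL(url).hostname;
    const cached = this.entries[hostname];
    if (cached && cached.expiresAt > this.now()) {
      return cached.ip; // cache hit: no query goes out over the network
    }
    // Cache miss (or expired entry): ask the resolver and remember the answer until its TTL runs out.
    this.queries += 1;
    const answer = this.resolver(hostname);
    this.entries[hostname] = { ip: answer.ip, expiresAt: this.now() + answer.ttlSeconds * 1000 };
    return answer.ip;
  }
}

// Usage with a fake resolver and a fake clock; 192.0.2.x addresses are reserved for documentation.
let fakeClock = 0; // milliseconds
const dns = new DnsCache(
  (host) => ({ ip: host === "example.com" ? "192.0.2.10" : "192.0.2.20", ttlSeconds: 60 }),
  () => fakeClock,
);
dns.lookup("https://example.com/index.html"); // miss: asks the resolver
dns.lookup("https://example.com/about");      // hit: same hostname, entry still fresh
fakeClock = 61000;                            // 61 s later, the 60 s TTL has expired
dns.lookup("https://example.com/");           // miss again
console.log(dns.queries); // 2
```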
Next, a connection with the server is established through the TCP three-way handshake: the browser sends a SYN, the server answers with a SYN-ACK, and the browser confirms with an ACK. On top of that, since most websites use HTTPS, another step is needed: the TLS handshake, in which the server presents its certificate and both sides agree on the encryption keys. This ensures an encrypted pipeline for the data flowing between the server and the user. Once that is done, the browser sends its HTTP request for the page, and the next part begins.
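At its core, the three-way handshake is an exchange of sequence numbers. The sketch below simulates it in memory, with no real sockets: each side announces an initial sequence number (fixed here so the output is predictable; real TCP stacks choose them unpredictably), and each side acknowledges the other’s number plus one, because a SYN consumes one sequence number. The Segment shape and function names are simplifications made up for this example, and the TLS handshake that follows for HTTPS is not modeled.

```typescript
// A simplified TCP segment: only the fields the handshake needs.
interface Segment {
  syn: boolean;
  ack: boolean;
  seq: number;
  ackNum?: number; // present only when the ACK flag is set
}

// Runs the three steps and returns the segments exchanged, in order.
function threeWayHandshake(clientIsn: number, serverIsn: number): Segment[] {
  // 1. Client -> server: SYN, "my sequence numbers start at clientIsn".
  const syn: Segment = { syn: true, ack: false, seq: clientIsn };
  // 2. Server -> client: SYN-ACK, "got it, expecting clientIsn + 1; mine start at serverIsn".
  const synAck: Segment = { syn: true, ack: true, seq: serverIsn, ackNum: syn.seq + 1 };
  // 3. Client -> server: ACK, "got yours too, expecting serverIsn + 1".
  const ack: Segment = { syn: false, ack: true, seq: clientIsn + 1, ackNum: synAck.seq + 1 };
  return [syn, synAck, ack];
}

// The connection counts as established only if each side acknowledged the other's ISN + 1.
function isEstablished(segments: Segment[]): boolean {
  const [syn, synAck, ack] = segments;
  return (
    syn.syn && !syn.ack &&
    synAck.syn && synAck.ack && synAck.ackNum === syn.seq + 1 &&
    !ack.syn && ack.ack && ack.ackNum === synAck.seq + 1
  );
}

const exchange = threeWayHandshake(1000, 5000);
for (const s of exchange) {
  const flags = `${s.syn ? "SYN" : ""}${s.syn && s.ack ? "-" : ""}${s.ack ? "ACK" : ""}`;
  console.log(`${flags} seq=${s.seq} ack=${s.ackNum ?? "-"}`);
}
// SYN seq=1000 ack=-
// SYN-ACK seq=5000 ack=1001
// ACK seq=1001 ack=5001
```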
The Browser Receives the sacred DOCUMENT
Now, once all is said and done, the browser receives the first byte of the HTML document. This moment is so special that the time from the user pressing Enter on a URL until this byte arrives has a name: Time to First Byte (TTFB). As the data comes in, the browser begins constructing the DOM, the Document Object Model.
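On a real page you can measure this with the Navigation Timing API: the navigation entry returned by performance.getEntriesByType("navigation")[0] exposes timestamps such as domainLookupStart, connectStart, secureConnectionStart (0 when the connection is not HTTPS), requestStart and responseStart. To keep the sketch runnable outside a browser, the function below takes a plain object with just those fields; the NavTiming type is a hypothetical stand-in for PerformanceNavigationTiming and the sample numbers are made up. It splits TTFB into the DNS, TCP, TLS and waiting phases described above.

```typescript
// The subset of PerformanceNavigationTiming fields used here (all in milliseconds).
interface NavTiming {
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  secureConnectionStart: number; // 0 when the connection is not HTTPS
  connectEnd: number;
  requestStart: number;
  responseStart: number; // the first byte of the response arrived
}

// Splits the time to first byte into the phases of the journey so far.
function ttfbBreakdown(t: NavTiming) {
  const secure = t.secureConnectionStart > 0;
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    // The TCP handshake ends where the TLS handshake begins (or at connectEnd for plain HTTP).
    tcp: (secure ? t.secureConnectionStart : t.connectEnd) - t.connectStart,
    tls: secure ? t.connectEnd - t.secureConnectionStart : 0,
    waiting: t.responseStart - t.requestStart, // server work plus the first byte's trip back
    ttfb: t.responseStart - t.startTime,
  };
}

// Made-up sample values, in the order the phases happen.
const sample: NavTiming = {
  startTime: 0,
  domainLookupStart: 5, domainLookupEnd: 25,
  connectStart: 25, secureConnectionStart: 45, connectEnd: 80,
  requestStart: 81, responseStart: 181,
};
console.log(ttfbBreakdown(sample)); // { dns: 20, tcp: 20, tls: 35, waiting: 100, ttfb: 181 }
```

The phases do not have to add up exactly to TTFB: small gaps between them (here 0 to 5 ms and 80 to 81 ms), redirects and cache checks count toward TTFB too.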
From here we enter another world: parsing data, building trees out of it, understanding their meaning and finally painting pixels on the screen. And it has a heavy name, one I am always elated to hear.
It is:
The Critical Rendering Path
Such a heavy name, it feels like it will decide your fate, doesn’t it? Like so many other computer science names, this one must just be fancy, right? Well... not this time. This really is the most important aspect of the browser. It is such a powerful, complex, intricate and regulated process that it makes the browser worthy of being called a piece of software whose complexity matches, or maybe even surpasses, that of an operating system.
And from an engineering and developer point of view, this is very important too. Why? Because the faster we can make this process, the more companies value us. Believe it or not, Walmart noticed that cutting the time it takes for its website to appear on screen by 1 second increased its conversion rate by 2%, and that’s a huge number in the game of business.
This process starts with loading the HTML document.
The DOM creation
If you have seen HTML, you have seen tags, tags inside tags and all that, but what actually happens to them? Inside those tags is the data the browser actually needs. In comes the parser. The parser takes in the data stream of the HTML document and breaks it down into tokens (start tags, end tags, text) to extract their meaning. It then joins each tag with its attributes and content to create an element: an entity called a node.
Any tags nested inside this element go through the same process and become children of this node. The topmost element is the html node, created from the <html></html> tag. This tree is called the DOM, the Document Object Model, because the browser turns each tag of the HTML document into an object and builds a model of the page out of those objects.
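To make the tokens-to-nodes idea concrete, here is a toy parser for a tiny, well-behaved subset of HTML: start tags with double-quoted attributes, end tags and text (no doctype, comments or entities). It is an illustration under those assumptions, nothing like the real HTML parsing algorithm, which is a large state machine with extensive error recovery. The DomNode type is hypothetical. A stack of currently open elements decides who becomes whose child, and it also gives a small taste of the parser’s forgiveness: an end tag closes any unclosed elements inside it, and a stray end tag is ignored.

```typescript
// A toy DOM node: either an element (tag, attributes, children) or a text node.
interface DomNode {
  type: "element" | "text";
  tag?: string;
  attrs?: Record<string, string>;
  text?: string;
  children: DomNode[];
}

// Elements that never have children or an end tag (a small subset of HTML's void elements).
const VOID_TAGS = ["br", "img", "meta", "link", "input", "hr"];

function parseHtml(html: string): DomNode {
  // Synthetic root so the html element has something to hang from.
  const root: DomNode = { type: "element", tag: "#document", children: [] };
  const stack: DomNode[] = [root]; // elements that are currently open
  // Tokenizer: either a start/end tag, or a run of text between tags.
  const tokenRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(html)) !== null) {
    const parent = stack[stack.length - 1];
    if (m[4] !== undefined) {
      // Text token: whitespace-only runs are skipped to keep the tree readable.
      const text = m[4].trim();
      if (text) parent.children.push({ type: "text", text, children: [] });
    } else if (m[1] === "/") {
      // End tag: pop back to the matching open element; a stray end tag is ignored.
      const tag = m[2].toLowerCase();
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].tag === tag) {
          stack.length = i;
          break;
        }
      }
    } else {
      // Start tag: build an element node and attach it to the current parent.
      const tag = m[2].toLowerCase();
      const attrs: Record<string, string> = {};
      const attrRe = /([a-zA-Z-]+)="([^"]*)"/g;
      let a: RegExpExecArray | null;
      while ((a = attrRe.exec(m[3])) !== null) attrs[a[1]] = a[2];
      const node: DomNode = { type: "element", tag, attrs, children: [] };
      parent.children.push(node);
      if (VOID_TAGS.indexOf(tag) === -1) stack.push(node); // tags that follow become its children
    }
  }
  return root;
}

// Prints the tree with two spaces of indentation per level.
function printTree(node: DomNode, depth = 0): string {
  const label = node.type === "text" ? `"${node.text}"` : node.tag!;
  const lines = ["  ".repeat(depth) + label];
  for (const child of node.children) lines.push(printTree(child, depth + 1));
  return lines.join("\n");
}

const doc = parseHtml(
  '<html><head><title>Hi</title></head><body><p class="intro">Hello <b>world</b></p><img src="cat.png"></body></html>',
);
console.log(printTree(doc));
```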
This is what a DOM looks like:

However, this is not all. After all, what good is a skeleton without beauty and functionality?
The CSSOM creation
While parsing the HTML document, the browser might (and usually does) come across a linked stylesheet, a CSS file. When it sees one, it starts fetching that CSS file. The fetch does not stop the HTML parser, but CSS is render-blocking: the browser holds off rendering the page until the stylesheet has been downloaded and processed, so that it never shows an unstyled page that would immediately have to be redrawn. In fact, the DOM is built incrementally during parsing itself, to save time, and the parser is forgiving: if the markup is broken (a missing closing tag, say), it repairs the tree instead of giving up.
Once the CSS file is received, the browser parses it to create the CSS Object Model, the CSSOM. The CSSOM looks very much like the DOM, except that its contents are CSS style rules. Inherited properties flow down the tree (a font-size set on body applies to its children unless they override it), and when several rules target the same element, the cascade decides: the more specific selector wins, and between equally specific rules, the one that appears last wins.
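That conflict-resolution rule fits in a few lines. The sketch below assumes the rules have already been matched to one element and only handles simple selectors (type names, .class, #id, and descendant selectors separated by spaces); attribute selectors, pseudo-classes, !important, inline styles and inheritance, which real engines must all handle, are left out. Specificity is computed as an (ids, classes, types) triple and compared left to right, with source order as the tie-breaker. The Rule shape and function names are hypothetical.

```typescript
// Specificity as [ids, classes, types]; compared left to right.
type Specificity = [number, number, number];

// Handles simple selectors only: tag names, .class, #id, and spaces between compound parts.
function specificity(selector: string): Specificity {
  const spec: Specificity = [0, 0, 0];
  for (const part of selector.trim().split(/\s+/)) {
    const ids = part.match(/#[\w-]+/g) ?? [];
    const classes = part.match(/\.[\w-]+/g) ?? [];
    const hasType = /^[a-zA-Z][\w-]*/.test(part); // e.g. the "p" in "p.intro"
    spec[0] += ids.length;
    spec[1] += classes.length;
    spec[2] += hasType ? 1 : 0;
  }
  return spec;
}

function compareSpecificity(a: Specificity, b: Specificity): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

interface Rule {
  selector: string;
  property: string;
  value: string;
  order: number; // position in the stylesheet: later rules have a higher order
}

// Among rules that already match the element, pick the winner for one property.
function cascadeWinner(matching: Rule[], property: string): Rule | undefined {
  let winner: Rule | undefined;
  for (const rule of matching) {
    if (rule.property !== property) continue;
    if (!winner) {
      winner = rule;
      continue;
    }
    const cmp = compareSpecificity(specificity(rule.selector), specificity(winner.selector));
    // Higher specificity wins; on a tie, the rule that comes later wins.
    if (cmp > 0 || (cmp === 0 && rule.order > winner.order)) winner = rule;
  }
  return winner;
}

// Rules matching something like <p class="intro"> inside body.
const rules: Rule[] = [
  { selector: "p", property: "color", value: "black", order: 0 },
  { selector: ".intro", property: "color", value: "blue", order: 1 },
  { selector: "p", property: "color", value: "gray", order: 2 },
  { selector: "body .intro", property: "font-size", value: "18px", order: 3 },
];
console.log(cascadeWinner(rules, "color")?.value); // blue: a class beats a type, even one written later
```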
This is what the CSSOM looks like:

Both of these object model trees are built by the rendering engine. Safari uses WebKit, Chrome uses Blink (a fork of WebKit) and Firefox uses Gecko, Mozilla’s open source engine. These engines are responsible for converting the HTML and CSS files into the DOM and the CSSOM respectively. After that, the trees sit in the content sink for a while before the rendering engine picks them up again for one final process that produces the model it can actually work with.
But I think we are missing something. What about the scripts?
The JavaScript Execution
If the parser encounters a script tag while the document is being parsed, it halts and hands control over to JavaScript (this is the default for plain script tags; scripts marked async or defer, and module scripts, do not stop the parser this way). The JavaScript is processed by the JavaScript engine, and different browsers use different engines: Chrome uses V8, Apple’s Safari uses JavaScriptCore and Firefox uses SpiderMonkey.
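DOM construction and HTML parsing stop because JavaScript has the power to change the DOM, even to inject new markup into the very stream the parser is reading (with document.write). A script may also read styles, so it additionally has to wait for any stylesheet above it to finish loading. That is why heavy scripts can make a website lag, and it is also why script tags are usually placed at the end of the body, or marked with defer.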
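A rough cost model shows why this matters. In the sketch below (hypothetical types, made-up timings in milliseconds, one download at a time and no bandwidth contention), a plain script tag adds its download and execution time straight onto the parser’s timeline, while a script marked defer downloads in parallel and runs, in document order, only after parsing has finished. async is deliberately not modeled: such a script also downloads in parallel, but runs as soon as it arrives, pausing the parser at that moment.

```typescript
// A deliberately simplified cost model of parsing with scripts (times in ms, made up).
interface HtmlChunk {
  kind: "html";
  parseMs: number;
}
interface ScriptTag {
  kind: "script";
  defer: boolean;
  downloadMs: number;
  executeMs: number;
}
type Token = HtmlChunk | ScriptTag;

// Returns when the parser reaches the end of the document, and when all scripts have run.
function simulateParse(tokens: Token[]): { parseDoneAt: number; allScriptsDoneAt: number } {
  let t = 0;
  const deferred: { readyAt: number; executeMs: number }[] = [];
  for (const token of tokens) {
    if (token.kind === "html") {
      t += token.parseMs;
    } else if (token.defer) {
      // Download starts now, in parallel with parsing; execution waits until parsing ends.
      deferred.push({ readyAt: t + token.downloadMs, executeMs: token.executeMs });
    } else {
      // Plain <script src>: the parser stops, waits for the download, runs it, then resumes.
      t += token.downloadMs + token.executeMs;
    }
  }
  const parseDoneAt = t;
  // Deferred scripts run one after another in document order,
  // each waiting for its own download if it has not finished yet.
  for (const d of deferred) {
    t = Math.max(t, d.readyAt) + d.executeMs;
  }
  return { parseDoneAt, allScriptsDoneAt: t };
}

const withBlocking = simulateParse([
  { kind: "html", parseMs: 10 },
  { kind: "script", defer: false, downloadMs: 100, executeMs: 20 },
  { kind: "html", parseMs: 10 },
]);
const withDefer = simulateParse([
  { kind: "html", parseMs: 10 },
  { kind: "script", defer: true, downloadMs: 100, executeMs: 20 },
  { kind: "html", parseMs: 10 },
]);
console.log(withBlocking); // { parseDoneAt: 140, allScriptsDoneAt: 140 }
console.log(withDefer);    // { parseDoneAt: 20, allScriptsDoneAt: 130 }
```

With defer, the whole document is parsed by 20 ms instead of 140 ms in this toy model, and the script still finishes earlier, because its download overlapped with parsing.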
The Render Tree Creation
Once the DOM and CSSOM are ready, render tree construction starts.
The rendering engine combines the DOM and the CSSOM: it walks the DOM, matches every node with its computed styles and decides which elements stay and what content and properties they have, all while keeping their structure.
Some elements are omitted: those with display: none in CSS, and nodes that are never visible, such as script and meta tags. Elements with visibility: hidden, however, are kept, because they still occupy space on the screen.
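The filtering rule above can be sketched as a recursive walk. The sketch assumes every node already carries its computed style (the StyledNode and RenderNode shapes are hypothetical) and ignores text nodes and special cases such as display: contents. Dropping a node with display: none drops its entire subtree, while a node with visibility: hidden is copied into the render tree like any other, since layout still has to reserve its space.

```typescript
// A DOM node with its already-computed style (a hypothetical shape for this sketch).
interface StyledNode {
  tag: string;
  style: Record<string, string>; // computed style, e.g. { display: "block" }
  children: StyledNode[];
}

interface RenderNode {
  tag: string;
  style: Record<string, string>;
  children: RenderNode[];
}

// Tags that never produce anything visible.
const NON_VISUAL = ["head", "script", "meta", "link", "style", "title"];

// Returns null when the node, and therefore its whole subtree, is left out of the render tree.
function buildRenderTree(node: StyledNode): RenderNode | null {
  if (NON_VISUAL.indexOf(node.tag) !== -1 || node.style.display === "none") return null;
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered) children.push(rendered);
  }
  // visibility: hidden is kept on purpose: the box is invisible but still takes up space.
  return { tag: node.tag, style: node.style, children };
}

const page: StyledNode = {
  tag: "html", style: { display: "block" }, children: [
    { tag: "head", style: { display: "none" }, children: [
      { tag: "title", style: { display: "none" }, children: [] },
    ] },
    { tag: "body", style: { display: "block" }, children: [
      { tag: "p", style: { display: "block" }, children: [] },
      { tag: "div", style: { display: "none" }, children: [
        { tag: "span", style: { display: "inline" }, children: [] },
      ] },
      { tag: "span", style: { display: "inline", visibility: "hidden" }, children: [] },
    ] },
  ],
};
const renderTree = buildRenderTree(page);
// Keep only tag names and children to show the shape: html > body > (p, hidden span).
console.log(JSON.stringify(renderTree, ["tag", "children"], 2));
```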

Once this tree is constructed, it’s time to use the screen as a canvas and paint it with beautiful pixels.
Layout and Reflow
In this stage, the browser works out exactly where each element goes and how big it is within the viewport, the visible area of the window. It recursively traverses the render tree and computes each node’s size and position from its CSS properties and the space its parent gives it. Once that is done, the browser knows which element will occupy which area of the screen. This calculation can get very heavy, because it involves a lot of geometry for every single node. This is called the Layout stage.
Sometimes JavaScript or some other event changes an element’s geometry, like an image finishing loading or a button popping up after a click, and the sizes and positions need to be calculated again. This process is called reflow.
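Here is a toy version of layout and reflow. Block boxes are stacked vertically; each is as wide as its container and as tall as its explicit height, or else as tall as its children combined. Everything here (the Box shape, pixel values, and the absence of margins, inline text and floats) is a simplifying assumption. Notice how growing one box during reflow moves everything laid out after it, which is exactly why reflow is costly.

```typescript
// A block box in a toy layout engine: vertical stacking only, no margins, inline content or floats.
interface Box {
  name: string;
  height: number | undefined; // explicit CSS height in px; undefined means "fit the children"
  children: Box[];
  // Filled in by layout():
  x: number;
  y: number;
  width: number;
  computedHeight: number;
}

function makeBox(name: string, height: number | undefined, children: Box[] = []): Box {
  return { name, height, children, x: 0, y: 0, width: 0, computedHeight: 0 };
}

// Recursively positions a box and its children; returns the box's final height.
function layout(box: Box, x: number, y: number, availableWidth: number): number {
  box.x = x;
  box.y = y;
  box.width = availableWidth; // block boxes stretch to fill their container's width
  let cursorY = y;
  for (const child of box.children) {
    cursorY += layout(child, x, cursorY, availableWidth); // each child starts below the previous one
  }
  box.computedHeight = box.height ?? cursorY - y;
  return box.computedHeight;
}

const header = makeBox("header", 80);
const image = makeBox("image", 0); // the image has not loaded yet, so it is 0px tall
const article = makeBox("article", 300);
const bodyBox = makeBox("body", undefined, [header, image, article]);

layout(bodyBox, 0, 0, 1024);
console.log(article.y, bodyBox.computedHeight); // 80 380

// Reflow: the image finishes loading and turns out to be 200px tall.
image.height = 200;
layout(bodyBox, 0, 0, 1024);
console.log(article.y, bodyBox.computedHeight); // 280 580
```

This sketch simply lays out the whole tree again; real engines try to limit the work by marking changed boxes as dirty and re-laying out only what depends on them.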
Painting And Display
Now that the size and position of each element are clear, the browser is responsible for filling in the void with colors: the pixels. It traverses the render tree, calling the paint() method of each node, and uses the UI backend, which talks directly to the OS, to execute the drawing commands.
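One way to picture painting is as turning the laid-out tree into a list of drawing commands (often called a display list) that something lower-level then turns into pixels. In the sketch below, the PaintBox and PaintCommand shapes and the "fillRect" operation are hypothetical, and a tiny character grid stands in for the UI backend; real engines also handle borders, text, images, stacking order and compositing layers. Parents are painted before their children, so children end up on top, and a box with visibility: hidden paints nothing itself while its children are still visited.

```typescript
// A laid-out box with the one style property this sketch paints: its background color.
interface PaintBox {
  x: number;
  y: number;
  width: number;
  height: number;
  background?: string; // e.g. "white"; absent means transparent
  visibility?: "visible" | "hidden";
  children: PaintBox[];
}

interface PaintCommand {
  op: "fillRect";
  x: number;
  y: number;
  width: number;
  height: number;
  color: string;
}

// Depth-first walk: a box paints its own background first, then its children on top of it.
function paint(box: PaintBox, out: PaintCommand[] = []): PaintCommand[] {
  if (box.visibility !== "hidden" && box.background) {
    out.push({ op: "fillRect", x: box.x, y: box.y, width: box.width, height: box.height, color: box.background });
  }
  for (const child of box.children) paint(child, out);
  return out;
}

// Stand-in for the UI backend: executes commands on a character grid instead of real pixels.
function rasterize(commands: PaintCommand[], cols: number, rows: number): string[] {
  const grid: string[][] = [];
  for (let r = 0; r < rows; r++) grid.push(new Array<string>(cols).fill("."));
  for (const c of commands) {
    for (let r = c.y; r < Math.min(rows, c.y + c.height); r++) {
      for (let col = c.x; col < Math.min(cols, c.x + c.width); col++) {
        grid[r][col] = c.color[0]; // first letter of the color name stands in for the pixel
      }
    }
  }
  return grid.map((row) => row.join(""));
}

const pageBox: PaintBox = {
  x: 0, y: 0, width: 8, height: 4, background: "white", children: [
    { x: 1, y: 1, width: 3, height: 2, background: "red", children: [] },
    { x: 5, y: 1, width: 2, height: 2, background: "blue", visibility: "hidden", children: [] },
  ],
};
const displayList = paint(pageBox);
console.log(rasterize(displayList, 8, 4).join("\n"));
// wwwwwwww
// wrrrwwww
// wrrrwwww
// wwwwwwww
```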
Once this entire process is done, we finally see all those beautiful websites in our favorite browser. The browser is a feat of software engineering, to be respected for the sheer amount of design and creativity it has taken to build, and needless to say, it is the backbone of today’s internet. Learning how the browser works not only sets us apart from the crowd but also lets us step into the thought process of the people who were able to envision it.