Please explain to me how Web Push works in the TCP/IP network layers (especially layers 4-5).
I understand that HTTP is a stateless protocol.
Therefore it's understandable that a user behind NAT can view a web page on a remote host (because the user behind NAT is the one initiating the connection), but the web server cannot initiate a TCP connection with the client (browser process).
However, there are some exceptions like WebSocket, where the client (browser) initiates a connection and then leaves it open (upgraded from HTTP to the WebSocket protocol over the same TCP connection). In this architecture, the web server may initiate sending a message to the client (for example, a "you have a new chat message" notification).
What I don't understand is the new term 'webpush'.
How does it work? How do they accomplish this? Previously I thought that:
Is this correct? Or am I missing something? Neither of my guesses above would work behind a NAT'd network.
Is Firebase web notification also this kind of webpush?
I have searched the internet for an explanation of what makes it work on the client side, but there seem to be only explanations of 'how to send webpush' and 'how to market your product with webpush'. Those articles only explain the server side (communication of the app server with the push service server) or are about marketing.
Also, I'm interested in understanding what application layer protocol they're running on (as in what text/binary data the client and server send to each other), if it's not HTTP.
Web Push works because there is a persistent connection between the browser (e.g. Chrome) and the browser push service (e.g. FCM).
When your application server needs to send a notification to a browser, it cannot open a connection to the browser directly. Instead it contacts the browser push service (e.g. FCM for Chrome), and the browser push service then delivers the notification to the user's browser.
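To make the relay concrete, here is a minimal in-memory sketch of that flow. It is only a toy model, not how FCM or any real push service is implemented: the `PushService` class, its method names and the `push.example` endpoint URL are all hypothetical, and a `queue.Queue` stands in for the connection the browser keeps open.

```python
import queue


class PushService:
    """Toy stand-in for a browser push service (the role FCM plays for Chrome)."""

    def __init__(self):
        # Maps endpoint URL -> queue that stands in for the browser's open connection.
        self._connections = {}

    def subscribe(self, browser_id):
        """Browser side: register and get back the endpoint the app server will use."""
        endpoint = f"https://push.example/send/{browser_id}"  # hypothetical URL
        self._connections[endpoint] = queue.Queue()
        return endpoint

    def connection_for(self, endpoint):
        """Browser side: the connection on which pushed messages arrive."""
        return self._connections[endpoint]

    def send(self, endpoint, message):
        """App server side: hand a message to the push service for relaying."""
        conn = self._connections.get(endpoint)
        if conn is None:
            return False  # unknown or expired subscription
        conn.put(message)
        return True


def demo():
    service = PushService()
    endpoint = service.subscribe("browser-1")        # browser subscribes first
    browser_conn = service.connection_for(endpoint)  # browser keeps this open
    delivered = service.send(endpoint, "you have a new chat message")
    return delivered, browser_conn.get(timeout=1)
```

Note that the application server only ever talks to the push service through the endpoint; it never needs a route to the browser itself.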
This is possible because the browser constantly tries to keep an open connection with the push service (e.g. FCM for Chrome). This means that NAT is not a problem, since it's the client that starts the connection. Also consider that any TCP connection is bidirectional, so either side of the connection can start sending data at any time. Don't confuse higher-level protocols like HTTP with a plain TCP connection.
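The point about TCP being bidirectional can be checked with a small local experiment. This is just a sketch on the loopback interface, with a hypothetical `push_demo` helper: the client opens the connection (as a browser behind NAT would) and never sends a request, yet the server is free to write to it first.

```python
import socket
import threading


def push_demo(message=b"you have a new chat message"):
    # The server only listens; it never dials out to the client.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:
            # The server speaks first, without having received any request.
            conn.sendall(message)

    server_thread = threading.Thread(target=serve)
    server_thread.start()

    # The client initiates the connection and then just waits for data.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)

    server_thread.join()
    listener.close()
    return b"".join(chunks)
```

In a real deployment the connection would stay open and be reused for later messages; it is closed here only so that the example terminates.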
If you want more details I have written this article that explains in simple words how Web Push works. You can also read the standards: Push API and IETF Web Push in particular.
Note: Firebase (FCM) is two different things, even if that is not clear from the documentation. It is both the browser push service required to deliver notifications to Chrome (like Mozilla autopush for Firefox, Windows Push Notification Services for Edge and Apple Push Notification service for Safari), but it is also a proprietary service with additional features to send notifications to any browser (like Pushpad, Onesignal and many others).
So the client/user agent/browser (e.g. Chrome) keeps an open TCP connection to the push service (e.g. FCM), and the connection is kept alive forever? And the browser does this without being launched explicitly (e.g. by the user clicking the Chrome icon)?
Yes, there is usually a persistent connection between the browser and its own push service. If that cannot be done on a specific OS (for example, in the background), then an alternative method is used: for example, the push message is forwarded from the browser push service to the normal push service of the OS (e.g. this is done by Firefox on Android). If you need the exact implementation details, you need to read the standards that I linked above.