5 October 2020

Web Bundles: What are they and do they pose a threat to the Web?

7-minute read
Written by Martin Knakal

In November 2019, Google introduced its Bundled HTTP Exchanges or, more simply, Web Bundles. At first sight, the whole concept seems really neat: you can bundle up all of a website's contents and use them at any time, even offline!

Here is Google's suggested use-case:

One common scenario is the ability to build a self-contained web app that's easy to share and usable without an internet connection. For example, say you're on an airplane from Tokyo to San Francisco with your friend. You don't like the in-flight entertainment. Your friend is playing an interesting web game called PROXX, and tells you that she downloaded the game as a Web Bundle before boarding the plane. It works flawlessly offline.

We can think of several more uses, but the main idea is that Web Bundles CAN provide an alternative to native applications, provided that your browser supports them. For now, Web Bundles are experimental and supported only by Chrome 80+ behind an experimental flag.

Why bundle anything at all?

A typical way to load JavaScript files into a page looks like this:

<script src="/js/first.js"></script>
<script src="/js/second.js"></script>
<script>
  // Anything defined in first.js or second.js is reachable here
  // only through a global variable or global function
</script>

This poses two problems that software development in other languages has largely eliminated through module systems and established practices.

The first problem is that anything defined in those files can only be accessed through global variables. If you have ever taken a programming class, one of the first things you were told is that you should use global variables very cautiously or not at all. The reason is simple: they make code an absolute pain to maintain.

The second problem is that you must load your scripts in a fixed order. If first.js depended on second.js, you would have to swap them. Handling more complex dependencies quickly becomes much harder.
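Working out a valid load order by hand is a topological sort of the dependency graph. The sketch below does it with a depth-first search; the dependency map and the file name app.js are hypothetical, and real tools also handle dynamic imports and other cases this ignores.

```javascript
// Hypothetical dependency map: each script lists the scripts it needs first.
const deps = {
  "app.js": ["second.js", "first.js"],
  "first.js": ["second.js"],
  "second.js": [],
};

// Depth-first topological sort: every script appears after all of its
// dependencies. Throws if the graph contains a cycle.
function loadOrder(graph) {
  const order = [];
  const state = new Map(); // script -> "visiting" | "done"

  function visit(name) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`Circular dependency involving ${name}`);
    }
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  }

  for (const name of Object.keys(graph)) visit(name);
  return order;
}

console.log(loadOrder(deps)); // [ 'second.js', 'first.js', 'app.js' ]
```

The resulting array is the order in which the `<script>` tags would have to appear.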

This particular problem gave rise to projects like Browserify, which lets you use a require() function to handle dependencies and outputs a single bundle.js file. If we look at the Webpack project, which bundles modules together with their dependencies and emits static assets representing those modules, we can see that the idea behind Web Bundles is not that far off.

So what are Web Bundles exactly?

A Web Bundle is a CBOR file with a .wbn extension. CBOR is a binary format loosely based on the JSON data model; it sacrifices human readability for compactness and faster parsing. The bundle itself is an archive format with a parsable index, so individual files can be read directly by their offset in the bundle.
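To see what "binary, loosely based on JSON" means in practice, here is a minimal encoder for a small subset of CBOR (RFC 8949): unsigned integers, text strings, arrays and maps. It is a sketch for comparison only, not a complete implementation, and the sample object is made up; it does not reproduce the actual .wbn structure.

```javascript
// Builds the initial byte(s) of a CBOR item: the top 3 bits hold the major
// type, the low 5 bits hold the value or say how many bytes follow.
function head(majorType, n) {
  const t = majorType << 5;
  if (n < 24) return [t | n];
  if (n < 0x100) return [t | 24, n];
  if (n < 0x10000) return [t | 25, n >> 8, n & 0xff];
  if (n < 0x100000000) {
    return [t | 26, (n >>> 24) & 0xff, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff];
  }
  throw new RangeError("value too large for this sketch");
}

// Encodes a subset of JSON-like values into an array of CBOR bytes.
function encode(value) {
  if (Number.isInteger(value) && value >= 0) return head(0, value); // major type 0
  if (typeof value === "string") {
    const bytes = [...new TextEncoder().encode(value)];
    return [...head(3, bytes.length), ...bytes]; // major type 3: UTF-8 text
  }
  if (Array.isArray(value)) {
    return [...head(4, value.length), ...value.flatMap((v) => encode(v))]; // major type 4
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value);
    return [
      ...head(5, entries.length), // major type 5: map
      ...entries.flatMap(([k, v]) => [...encode(k), ...encode(v)]),
    ];
  }
  throw new TypeError(`unsupported value: ${value}`);
}

const sample = { url: "/index.html", status: 200 };
const cbor = Uint8Array.from(encode(sample));
const json = new TextEncoder().encode(JSON.stringify(sample));
console.log(cbor.length, json.length); // prints "26 34"
```

The saving comes from replacing quotes, colons and commas with length-prefixed headers, which also means a parser can skip over an item without scanning its contents.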

Per the draft spec:

A bundle is logically a set of HTTP exchanges, with a URL identifying the primary resource of the bundle. A bundle is parsed from a stream of bytes, which is assumed to have the attributes and operations described in Section 2.1. Bundle parsers support two operations, Load a bundle's metadata (Section 2.2) and Load a response from a bundle (Section 2.3) each of which can return an error instead of their normal result.

In other words, your browser (the aforementioned "parser") reads a byte stream from a bundle and uses the provided metadata to simulate HTTP exchanges and access the resources inside it. As a bonus, Web Bundles reportedly load almost instantly when served locally. Currently, you can only navigate to a local Web Bundle, but that restriction is meant to be temporary.
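The two parser operations from the spec, "load metadata" and "load a response", can be illustrated with a toy in-memory bundle: an index maps each URL to an offset and length inside one concatenated payload. This is NOT the real .wbn layout (which is CBOR-encoded and also stores response headers); `buildBundle`, `loadMetadata`, `loadResponse` and the example.com URLs are hypothetical names chosen for the sketch.

```javascript
// Toy bundle: primary URL + index {url: {offset, length}} + one byte payload.
function buildBundle(primaryURL, resources) {
  const encoder = new TextEncoder();
  const index = {};
  const chunks = [];
  let offset = 0;
  for (const [url, body] of Object.entries(resources)) {
    const bytes = encoder.encode(body);
    index[url] = { offset, length: bytes.length };
    chunks.push(bytes);
    offset += bytes.length;
  }
  const payload = new Uint8Array(offset);
  let pos = 0;
  for (const chunk of chunks) {
    payload.set(chunk, pos);
    pos += chunk.length;
  }
  return { metadata: { primaryURL, index }, payload };
}

// Operation 1: load the bundle's metadata, or return an error.
function loadMetadata(bundle) {
  if (!bundle.metadata || !bundle.metadata.index) return { error: "malformed bundle" };
  return { value: bundle.metadata };
}

// Operation 2: load one response by jumping straight to its offset;
// no other resource in the payload is read.
function loadResponse(bundle, url) {
  const entry = bundle.metadata.index[url];
  if (!entry) return { error: `not in bundle: ${url}` };
  const bytes = bundle.payload.subarray(entry.offset, entry.offset + entry.length);
  return { value: new TextDecoder().decode(bytes) };
}

const bundle = buildBundle("https://example.com/", {
  "https://example.com/": '<script src="/app.js"></script>',
  "https://example.com/app.js": "console.log('offline!');",
});
const meta = loadMetadata(bundle).value;
console.log(meta.primaryURL); // prints "https://example.com/"
console.log(loadResponse(bundle, "https://example.com/app.js").value);
```

Returning `{ error }` instead of throwing mirrors the spec's wording that each operation "can return an error instead of their normal result".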

Google has already provided a sample implementation on GitHub if you want to try creating your own bundle. It offers several ways to generate a bundle, depending on how you want to define the set of exchanges: from a HAR file, from a URL list, or from a local directory.
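The "local directory" option boils down to mapping every file under a folder to a URL under some base URL. The Node.js sketch below shows only that mapping step; it is not Google's tool, and the function name `exchangesFromDirectory` and the base URL are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Walks a local directory and pairs every file with a URL under baseURL,
// e.g. build/js/app.js -> https://example.com/js/app.js.
function exchangesFromDirectory(dir, baseURL) {
  const result = [];
  const walk = (current) => {
    for (const entry of fs.readdirSync(current, { withFileTypes: true })) {
      const full = path.join(current, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        // Use forward slashes in URLs regardless of the OS path separator.
        const rel = path.relative(dir, full).split(path.sep).join("/");
        result.push({ url: new URL(rel, baseURL).href, body: fs.readFileSync(full) });
      }
    }
  };
  walk(dir);
  // Sort by URL so the output does not depend on directory listing order.
  return result.sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0));
}
```

A real generator would also attach response headers (such as Content-Type) to each exchange before serializing the list into a bundle.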

There has been some controversy about the format in the past few weeks, mostly related to the binary nature of the emitted file and the proposed signing of bundles.

The ability to sign a bundle seems like a good thing, and if it is used as the spec intends, it would be great. Signing is also easy to do; the tool is very straightforward:

sign-bundle \
-i unsigned.wbn \
-certificate cert.cbor \
-privateKey priv.key \
-validityUrl https://example.org/resource.validity.msg \
-o signed.wbn

A signed bundle can be served by anyone while carrying the same trust as the origin server. Some, however, see this as a way to monopolize the Internet, with all content delivered by Google.
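Why a signature lets third parties serve the content can be shown with plain public-key signing in Node.js: the origin signs the bytes once, any mirror relays bytes and signature unchanged, and the client checks them against the origin's public key. This is a sketch under a big assumption: real signed bundles use certificates and their own signature format, whereas here Ed25519 is applied directly to placeholder bytes.

```javascript
const crypto = require("crypto");

// Origin side: sign the bundle bytes with the origin's private key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const bundleBytes = Buffer.from("hypothetical bundle bytes");
const signature = crypto.sign(null, bundleBytes, privateKey);

// Client side: accept the bytes, wherever they came from, only if the
// signature verifies against the origin's public key.
function verifyBundle(bytes, sig, originKey) {
  return crypto.verify(null, bytes, originKey, sig);
}

console.log(verifyBundle(bundleBytes, signature, publicKey)); // true

// A mirror that alters even one byte can no longer pass itself off as the origin.
const tampered = Buffer.concat([bundleBytes, Buffer.from("<script>evil()</script>")]);
console.log(verifyBundle(tampered, signature, publicKey)); // false
```

The trust comes from the key, not from the server that delivered the bytes, which is both the appeal of the feature and the root of the centralization worry.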

Concerns

The primary source of concern is Peter Snyder's blog post on brave.com, in which he described several ways Web Bundles could be harmful and make privacy-protecting tools ineffective. The post has since been discussed on Hacker News, with even Google employees giving their take on the matter.

The main concern about Web Bundles is that URLs no longer uniquely identify resources. A publisher can randomize, reuse, and hide unwanted URLs that would otherwise be blocked by ad blockers or other privacy tools. Of course, all of this can be done without Web Bundles, but bundles make it slightly more convenient. The common denominator of all these problems is that Web Bundles create a local, independent namespace. Consider the following example:

<script src="http://adnetwork.com/js/ads.js"></script>
<span class="ads">
...
</span>

You can easily block */ads.js in your ad blocker to stop the JavaScript from even downloading. This may prove difficult with a Web Bundle, since you download the whole bundle and the resource inside it can have an arbitrary name. A publisher could also reuse a URL and make it point to a different resource in each bundle, or change URLs to ones known not to be blocked.
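URL-based blocking works only as long as the name is predictable. The sketch below turns simple `*` wildcard filters into regular expressions; the filter list is hypothetical and far simpler than real ad-blocker syntax, and news.example stands in for a publisher that renamed the same script inside its bundle.

```javascript
// Hypothetical filter list with "*" wildcards.
const filters = ["*/ads.js", "*://adnetwork.com/*"];

// Escapes regex metacharacters, then turns each "*" into ".*".
function toRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

const rules = filters.map(toRegExp);
const isBlocked = (url) => rules.some((re) => re.test(url));

// On a plain page the ad script has a stable, recognizable URL.
console.log(isBlocked("http://adnetwork.com/js/ads.js")); // true
// Inside a bundle the same bytes can live under any name the publisher picks.
console.log(isBlocked("https://news.example/static/x7f3a.js")); // false
```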

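You can do all of this in a bundle while leaving the real web resources unmodified. Not only does this make building blacklists pretty much impossible, but it also provides a means to poison them, for example by placing an ad under a URL that elsewhere serves an essential resource, so that blocking the URL breaks legitimate pages.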

If bundles can be built on the fly for each request, almost any content can be made unblockable. Google's tool doesn't implement bundle editing, but it could be done very cheaply on an edge server.
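A per-request rewrite could be as simple as giving each resource a fresh random name and updating the page that references it. The sketch below is hypothetical (the function `rebundleForRequest`, the page and the resource map are all made up) and works on strings rather than real bundle bytes, but it shows why a blocklist entry learned from one response never matches the next.

```javascript
const crypto = require("crypto");

// Hypothetical edge-server step: rename every resource to a random alias
// and rewrite references to it in the page, once per request.
function rebundleForRequest(pageHtml, resources) {
  const renamed = {};
  let html = pageHtml;
  for (const [url, body] of Object.entries(resources)) {
    const alias = `/${crypto.randomBytes(8).toString("hex")}.js`;
    renamed[alias] = body;
    html = html.split(url).join(alias); // point the page at the new name
  }
  return { html, resources: renamed };
}

const page = '<script src="/js/ads.js"></script>';
const original = { "/js/ads.js": "showAd();" };

// Two requests for the same page get two unrelated script names
// (64 random bits each), while the script body stays identical.
const first = rebundleForRequest(page, original);
const second = rebundleForRequest(page, original);
console.log(first.html);
console.log(second.html);
```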

One more quote from Snyder that is frequently cited:

This threatens to change the Web from a hyperlinked collection of resources (that can be audited, selectively fetched, or even replaced), to opaque all-or-nothing “blobs” (like PDFs or SWFs).

This statement is, in my opinion, not entirely accurate. Web Bundles, as stated above, are archives, and all resources in them are parsable. You can even block individual resources quite easily, as long as the publisher doesn't choose to obfuscate URLs. There are already several tools for parsing CBOR files.
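Because the index is readable, a blocker could decode it and drop unwanted entries before any of their bytes are used. In this sketch the decoded index, its offsets and lengths, and the blocking rules are all hypothetical; it only works when the ad keeps a recognizable URL.

```javascript
// Hypothetical decoded bundle index: URL -> [offset, length].
const index = {
  "https://news.example/": [0, 512],
  "https://news.example/app.js": [512, 2048],
  "https://adnetwork.com/js/ads.js": [2560, 4096],
};

const blocked = [/\/ads\.js$/, /^https?:\/\/adnetwork\.com\//];

// Keeps only entries whose URL matches no rule; the bytes at the dropped
// offsets would then simply never be read.
function filterIndex(idx, rules) {
  return Object.fromEntries(
    Object.entries(idx).filter(([url]) => !rules.some((re) => re.test(url)))
  );
}

// Logs the two news.example URLs; the ad entry is gone.
console.log(Object.keys(filterIndex(index, blocked)));
```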

Do Web Bundles threaten the Web?

To be fair to Google's engineers, as of now, probably not. Web Bundles are still in the early stages of development, and it's entirely possible that many of the problems stated above will get resolved. There are also limits to what can be bundled: anything that relies on calls to a server has very little incentive to use bundles.

Another obstacle to wide adoption is the size of bundles. Since they contain ALL the necessary resources, they can get quite big, so users must be able to choose which bundles to download. If bundle caching were automated in any way, user devices would quickly run out of space. On the other hand, downloading a bundle every time you want to access the site would defeat their purpose entirely.

Used responsibly, Web Bundles can increase the accessibility of content in areas with a poor connection and save data costs.

We are excited to see exactly how these files are supposed to be cached. A Google employee stated on HN that "What's important for users is that files in a WebBundle are individually cached." At this point, it's hard to see how that would work. It remains to be seen until Google provides a more detailed implementation.