WebKitGTK+ and WebM

So you probably heard about WebM, right? It’s the awesome new media format being pushed by Google and a large number of partners, including Collabora, following the release of the VP8 video codec free of royalties and patents, along with a Free Software implementation.

It turns out that if you are a user or developer of applications that use the GStreamer framework, you can start taking advantage of all that freedom right away! Collabora Multimedia has developed, along with Entropy Wave, GStreamer support for the new format, and the code has already landed in the public repositories, and is already being packaged for some distributions.

I just couldn’t wait the few days it will take for the support to properly land in Debian unstable, so I went ahead and downloaded all of the current packages from the pkg-gstreamer svn repository, built everything after having the libvpx-dev package installed, and went straight to a rather unknown, small video site called YouTube with my GStreamer-powered WebKitGTK+-based browser, Epiphany!:

Youtube showing a webm video in Epiphany

If you’re running Debian unstable, or any of the other distributions lucky enough to get the new codecs and support packages soon, you should be able to get this working out of the box real soon now. Check the tips on WebM’s web site on how to find WebM videos on YouTube.

WebKit2 and WebKitGTK+

So you’ve seen people talking about WebKit2, perhaps have seen someone claiming it “drops support for Linux”, and you’ve been wondering what the hell that means for WebKitGTK+. Well, welcome to the preemptive Q&A section with WebKitGTK+ maintainers =D. Let’s first explore some history so we can better understand what exactly is going on.

What exactly is WebKit2?

Currently, when we say “WebKit” we really mean one of the ports that are built on top of WebCore using the WebKit layer. WebCore is the part that does all of the hard Web-related work; WebKit is an API layer that exposes WebCore functionality in a coherent way, so that the platform-specific ports can expose a public API layer for their applications to use, which is usually also called “WebKit”. This WebKit layer was designed by Apple to build the Mac and Windows ports it maintains, and was later released as Free Software so that other ports, such as the GTK+, Qt, and EFL ports, could be built on top of it, instead of having to do all the heavy lifting from WebCore directly.

Current WebKit model

WebKit2 is nothing more than the second version of that interface, with a whole lot of changes on what you can expect from it, and on how it interacts with WebCore, and the platform-specific API and UI. First of all, the first WebKit was not API stable, and that interface was usually not made public by the various ports – they only exposed their platform-specific APIs. WebKit2 is being designed to provide a stable, cross-platform, C-based, non-blocking public API. This is huge. It will allow cross-platform code to be written without having to consider language, and port differences for basic functionality.

The second big change is that the API will be made fully non-blocking. Currently most things you do are asynchronous already, but some of them may be completed in a synchronous way (like loading a string into WebKit instead of a URI). This is important for responsiveness, and is also a very important need for what comes next: process splitting.

WebKit2 will bring into WebKit proper the concept of splitting the UI process from the Web process, similar to what Chromium has. It is also much more awesome than what Chrome has for a large number of reasons, including, but not limited to:

  1. It’s being contributed directly to the WebKit project, in a cross-platform way that lets ports such as WebKitGTK+ take advantage of it, instead of being shipped directly into Safari, like Google does with Chrome;
  2. The process separation goes below the API layer, meaning that all complexity involved in managing the process separation is handled by the library, and hopefully none of it leaks to the application using it; that means that applications like Devhelp and Yelp will be able to take advantage of this without having to make their lives more complicated.

There’s a much better diagram in WebKit2’s wiki page, but here goes a simplified version that demonstrates what I’m talking about:

WebKit2 model

What WebKit2 is not?

WebKit2 is NOT a rewrite of the whole WebKit stack. WebCore will continue mostly unchanged, and all ports currently building on top of it will keep working. It is also not a fork – the code lives in the same tree as the current version of WebKit, which will allow us to progressively move towards using this new, improved layer. WebKit2 is not Apple-only, and it is not dropping Linux support. Initial builds of the code that is being landed will likely show up building on Linux in the near future (especially because we porters are already eager to play with it).

What happens to WebKitGTK+?

In the near future, nothing special. We will continue working towards making it feature-complete, more stable, faster, and rocking on it as always. We will, though, start working out how we can best take advantage of WebKit2 in order to provide an even more awesome library for the G world. What this means is you can expect us to have a library that will provide a nice GTK+ widget, just like we have today, with a GObject-based API, like we have today, but that is built on top of this new WebKit2 infrastructure, taking advantage of the process-splitting, and the bigger focus on not blocking the UI thread. This should give us a platform that is more stable, and faster and more responsive than what we already have today.

The API is bound to change, of course, but the WebKit2 version of WebKitGTK+ will be a separate, parallel-installable library, and we will keep supporting the WebKit1 version while we work on making the new one at least as good as the current one. This is long term we’re talking here. We’ll likely see WebKitGTK+ 1.4, and 1.6 come to life before we are satisfied enough with WebKitGTK+2.

We hope this clears some of the doubts up, and lightens your hearts!

The WebKitGTK+ maintainers.

Designing from the bottom up

Have you ever seen, while dealing in a support channel with a novice who has just discovered the power of UNIX, a conversation that goes like this?

<novice> How can I process the output of a command, so that any number of spaces gets turned into a newline?
<seasoned> What are you trying to do?
<novice> I want to list the contents of a directory, but I want one per line.
<seasoned> ls -1

I have seen this numerous times, even as one of the actors. At times I was the novice, and many times in #debian-br I was the seasoned person trying to get the novice to focus on the problem they were trying to solve, not on the solution they thought was right.

While reading Máirín Duffy’s awesome paper about contributing to Free Software as a designer I couldn’t help but have that image brought back to my memory again and again. Especially when I read this part:

This means the language and even the approach FLOSS projects take to solving problems tend to be focused on implementation and technology rather than starting with a real-life user problem to solve and determining appropriate implementation afterwards.

That does sound like us, and it does sound like many of the solutions we come up with. While I was reading her paper, there was a reference I got very interested in checking. It’s a PDF with no links in it, so I only had the number of the reference. I would have to scroll to the end of the paper, find the reference, and then somehow come back to the place I was looking at.

My most immediate thought was ‘you know, maybe evince should have tabs’. Why? Because I could open a new tab, go to the place the reference was at, and to ‘go back’ I just needed to close the new tab. Other options require much more effort – remembering the page I was at, or maybe the scroll offset more or less, and then scanning for the part of the text I was at. But those are not the only options! I could have the application set a marker on where I am, and have an easy command to go back to that marker, for instance, or evince could provide a way of ‘looking ahead’ without throwing away the current state at all. I’m pretty sure if I look around enough I will find tools that solve this problem in a fairly good way.

Now, I think that is exactly how we ended up with tabs in so many places they do not make sense in, and with so many ad-hoc solutions that solve our problems in half-assed ways. Even in browsers, we tend to use tabs as ad-hoc solutions to real problems we have no real solution to handle yet, such as “I want to check this other thing out real quick, but I do not want to lose any state of this page”, or “I want to check this out, but not right now, so let me open it, and then I’ll come back to it”, or maybe even “I want to look at this now, but since it is going to take a while to load, I might as well let it load in the background, and when I finish reading this I can go look at it”. These are the real problems we have, and I think we need better designs that solve them for real, instead of just patching them with the ad-hoc solution that tabs are.

The other extreme of the spectrum is, of course, not doing something, or even anything, for lack of the perfect solution. Using ‘this is not a real solution’ as an excuse to not implement something that could serve as a temporary solution to a problem may cause more frustration than having to deal with an ad-hoc solution that is tested, and has been applied in other applications for some time. After all, in many cases the ad-hoc solution can be later replaced with a proper one.

I guess this is another instance of the very difficult problem of balancing different realities: proper design is not always available to start something up, especially if the application is backed by individuals and not by a company or a bigger project that could bring in designers to work on it from the start. In this case having something up and running is usually a very important first step in a free software project – usually required to get enough interest to make it worth designing for.

WebKitGTK+ 1.1.90 is out!

We’re coming close to GNOME 2.30 release date, and we are getting ready to branch a stable release off of WebKit’s svn trunk in preparation for that. The idea of the stable branch is to try to maintain, and improve stability, with no additional features going in. Speaking of features, though, if you’ve been paying attention you will have noticed WebKitGTK+ has come a long way, now.

We came from not having basic features such as download support or opening links in new tabs, having a more-or-less working HTML5 media implementation, and having very few or missing-in-action developers, to a thriving project that gets more and more attention and contributors every day, with advanced features available, and rocking HTML5 media support that leaves little to be desired. It’s been just over one year since we started rolling mostly bi-weekly releases, each adding more awesome features.

There are still many issues, and we are not always equipped as a team to handle all the specifics of the engine ourselves, but I am really happy with the progress we’ve made, and really thankful for the support my employer Collabora has given all the way for this to happen, including the early work on plugins, and many other things before my time as a contributor. When I switched to using Epiphany with the WebKit backend as my default browser back in January 2009, that meant having to deal with a whole lot of misbehaviour, and work around a lot of painful brokenness. These days I enjoy a snappy, functional browser that makes me happy.

If you haven’t done so yet, go download and test the newest Epiphany with the latest WebKitGTK+, and help us make GNOME 2.30’s web browser rock even more!

WebKitGTK+ testing

Most times when I blog about WebKitGTK+ it’s to talk about new features, and their usage in Epiphany. This time I’d like to tell those who care more about our test infrastructure. Like I said in my previous post, testing is something we take very seriously in WebKit land. It would be hard to get such a complex project, with such diversity of platforms moving forward without automated testing.

Apple hosts a buildbot master that controls a whole lot of build slaves, for many platforms. Today we added the fourth WebKitGTK+ build slave to the family: 64-bit release. This makes 4 build slaves in total, 2 release bots sponsored by Collabora, and 2 debug bots sponsored by Igalia. These bots build WebKitGTK+, and then run what we call the “layout tests”. I use quotes there because the name is a bit misleading. Despite being HTML/JavaScript tests, they cover a whole lot of functionality, and test for regressions in many areas, including security, crashes, animations, media playing, DOM behaviour, and JavaScript API behaviour. WebKitGTK+ bots currently run 6397 tests, which represent about half the available tests.

Our bots are also, as of today, the first ones to run platform-specific API tests. Almost a year ago we started writing small tests based on glib’s GTestSuite, and they have been very valuable in helping us make sure our API expectations aren’t breaking (at least unknowingly), and to be able to test things that would be very hard to have Layout Tests for. So, yay! Thanks to everyone involved.

Back to layout tests, now: the other half is currently skipped because of one of three main reasons:

  • We suck, and the test fails for real, either because we are missing implementation of something it uses (say, JavaScript isolated worlds, which has been recently added), or because our implementation is wrong
  • The test is a render dump, and we did not generate results for it yet
  • We lack functionality to run the test in our DumpRenderTree implementation

The first one is the worst of all. It means we have broken functionality, or lack web compatibility. The second one is less bad: we can usually trust layout and rendering to be good because most of the rendering code is shared (though there are exceptions, of course). Render dumps are a special way of representing the render tree as text, and we need to generate our own results because of differences in things such as font sizes. The third one is also pretty bad – it means we cannot test some features; DumpRenderTree is an application that uses the port’s API to run the tests, and provides additional JavaScript API through JavaScriptCore.

If you feel like helping WebKitGTK+, choosing a bunch of these (especially non-render-dump) skipped tests and making them pass could be a good first step =).

WebKitGTK+, and the Page Cache

So, one of the things I get to do during work hours for Collabora is to contribute code, and do maintenance tasks for WebKitGTK+. I have been doing so since early last year, working on all kinds of things, from improving the network backend to handle the real-world web, to fixing scrolling problems, while reviewing patches from the many awesome developers who have been joining us (more on that later =D).

One of the big features I have worked on this past month or so, along with Xan Lopez, is the Page Cache. The page cache is a feature of web browsers that makes going back and forward between pages in the same view very fast. It’s better explained in this post, but to summarize, the idea is that instead of destroying all the work you have done since downloading the resources, and having to reparse/rebuild the structures the view uses to display the page from the cached resources, you hit pause on the page, and store the whole thing as is, and when coming back to it, you just hit play. You can see in the video two instances of Epiphany, one with the page cache enabled, one with it disabled. Easy to see which one has it enabled. Thanks to KiBi for the suggestion regarding a page that shows this easily =D.
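The pause/play idea can be sketched with a toy model. This is plain Python, not WebKit code: the `Page` and `View` classes, the `capacity` setting and the `builds` counter are all made up for illustration, and forward navigation is left out to keep it short.

```python
from collections import OrderedDict


class Page:
    """A loaded page; creating one stands in for parsing and building the view structures."""

    def __init__(self, uri):
        self.uri = uri
        self.paused = False


class View:
    """Toy view with back navigation and an optional page cache (hypothetical API)."""

    def __init__(self, capacity=0):
        self.capacity = capacity    # number of paused pages to keep; 0 disables the cache
        self.cache = OrderedDict()  # uri -> paused Page
        self.back_list = []         # URIs we can go back to
        self.current = None
        self.builds = 0             # how many times a page was built from scratch

    def _build(self, uri):
        self.builds += 1
        return Page(uri)

    def _leave_current(self):
        if self.current is None:
            return
        self.back_list.append(self.current.uri)
        if self.capacity > 0:
            self.current.paused = True          # hit pause, keep the whole thing as is
            self.cache[self.current.uri] = self.current
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the oldest paused page

    def load(self, uri):
        self._leave_current()
        self.current = self._build(uri)

    def go_back(self):
        uri = self.back_list.pop()
        page = self.cache.pop(uri, None)
        if page is not None:
            page.paused = False                 # hit play
        else:
            page = self._build(uri)             # no paused copy: rebuild everything
        self.current = page
```

With `capacity=0` every trip back costs a full rebuild; with a non-zero capacity, going back to a recently left page reuses the paused object and `builds` does not grow.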

We initially thought we had this feature enabled, since our initialization functions (which have existed since before the current maintainers were involved) did set up the number of desired pages in the cache, but during the hackfest we held in December we found out we had been fooled all this time. Enabling the page cache did make going back faster, but it also made lots of things become unstable and crash.

Since then, we have been working on figuring out all the problems, and fixing them, using help from adventurous users of in-development software ;D. I believe we’re now at a point in which I can happily declare the GTK+ port has a working page cache in trunk! If you’re interested in the nasty details, bear with me!

Let me go back in time a bit, and show you what problems we had. First, some background: the GTK+ port deviates a lot from the other ports when it comes to scrolling. This is because, when designing this part of the port, Holger Freyther had a very nice idea in mind: that the WebView should be a first-class citizen GTK+ scrollable widget. Meaning it would use GTK+’s adjustments for scrolling, and be able to interact with any parent scrolling widget, be it a GtkScrolledWindow, or a MokoFingerScroll.

We cannot just throw away all the rest of the scrolling code in WebCore, though, that deals with all the details related to interacting with the DOM, and JavaScript code. This means our WebView contains adjustments that need to be set, and unset on our port’s version of WebKit’s own representation of the view, called the FrameView, to interact with it, and to get updates on the bounds of content, and such. For every load, in the non-page-cache case, a new FrameView is created, the previous one is destroyed – this means we need to set the adjustments on every load.

The problem starts when you have the page cache enabled, because the code path used to do what is called “commit” the load of a cached page (that is, start replacing the content that is currently being displayed by the one that should now be displayed) is completely different, and we were not setting the adjustments on this new view, so we started with that.

But all was not well. We were still having weird behaviour with scrollbars disappearing, and becoming the wrong size, and worse, crashes when “back” was hit. We then started investigating in more detail how it is that the page cache does its magic, to try and figure out the source of all evil.

It turns out that when you leave a page that can be cached, the existing FrameView is no longer destroyed – it is stored as is in a CachedFrame to be restored if you go back, and a new one is created for the new page. This was having the undesired effect of having the adjustment be set in more than one FrameView at once, causing all kinds of (predictable, after we knew for real what was going on) unwanted effects. Thus, we reworked the code to make sure the adjustments are only ever set in one FrameView at once, making sure they are unset when the FrameView is being frozen, and reset when it’s being restored from the page cache.
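The ownership rule described above can be sketched roughly like this. It is a toy Python model under loud assumptions: `Adjustment`, `FrameView` and `WebView` here are stand-ins invented for the sketch, not the real GTK+ or WebCore classes, and a single adjustment is used where the real widget has a horizontal and a vertical one.

```python
class Adjustment:
    """Stand-in for a GtkAdjustment: just holds a scroll value."""

    def __init__(self):
        self.value = 0.0


class FrameView:
    """Toy FrameView: may hold the widget's adjustment only while it is the live view."""

    def __init__(self, name):
        self.name = name
        self.adjustment = None
        self.saved_offset = 0.0

    def set_adjustment(self, adjustment):
        self.adjustment = adjustment
        if adjustment is not None:
            adjustment.value = self.saved_offset  # restore this view's scroll position

    def freeze(self):
        """Entering the page cache: remember the offset and let go of the adjustment."""
        if self.adjustment is not None:
            self.saved_offset = self.adjustment.value
        self.adjustment = None


class WebView:
    """Toy widget owning the single adjustment shared by all of its FrameViews."""

    def __init__(self):
        self.adjustment = Adjustment()
        self.frame_view = None
        self.frozen = {}  # name -> FrameView kept in the page cache

    def commit_load(self, name):
        if self.frame_view is not None:
            old = self.frame_view
            old.freeze()                  # unset before anything else gets the adjustment
            self.frozen[old.name] = old
        # Restore a frozen view if there is one, otherwise create a fresh one.
        view = self.frozen.pop(name, None) or FrameView(name)
        view.set_adjustment(self.adjustment)
        self.frame_view = view
```

The point of the sketch is the invariant: after every `commit_load`, exactly one `FrameView` holds the adjustment, whether the new view is fresh or comes back from the cache.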

Last, but not least, it was discovered that going back to a page that contained resources with data: URIs (such as Google results pages which contain a small number of image hits) also caused a crash. This was because our network backend was not storing the data: URI in the ResourceResponse objects it fed into WebCore. The page cache relies on those responses to recreate the requests it uses to artificially replay the load when restoring the page from the page cache, so we fixed that as well.

What can be taken from all this? Building browsers is a lot of hard work! I can’t think how we could deal with this level of complexity without the awesome testing suite of WebKit. The good news is all of those issues I talked about in this post are now covered by the automated tests that run as part of the normal buildbot cycle in our bots, so we’re covered for the future, at least for these specific problems =D.

Cool hack – html5tube

Did I mention I hate flash? I do. It crashes a lot, and is overall a bad thing for the web, in my opinion. But I do enjoy watching videos on the web, and unfortunately, up to this day, flash is what most sites use to show videos. Months ago I read a couple of blog posts with nice hacks to make Firefox able to play youtube videos without using the flash player. Some recent discussions with colleagues at work got me itching to try my hand at something similar for Epiphany.

HTML5Tube working

So I went ahead, and wrote a new extension that does just that – in youtube video pages, it finds the flash player element, and replaces it with an HTML5 video tag pointing to the actual movie file. This causes the internal HTML5 media player built into WebKitGTK+, that is based on GStreamer, to play the movie. That means you only need to have the necessary GStreamer magic, and the extension enabled, to enjoy the movie.

There are some caveats – in-video text messages are gone (though I’m pretty sure we could get them added somehow); playlists, and other places which display videos other than the ‘normal’ video watching page, are not handled; and youtube needs to think you have the flash plugin installed, so the only way to make it work right now is to actually have a flash plugin installed. I think we could probably get around the last problem somehow, by looking at what the totem youtube plugin code does, for instance, and replicating it.

Content-Encoding in soup – all your gzip are belong to us

One thing everyone forgot to mention about the WebKitGTK+ hackfest was that master Dan Winship added basic Content-Encoding support to libsoup, and patched WebKitGTK+ to use it. If you are using a recent enough version of those you will finally be able to visit web sites that send gzipped content despite the browser saying it could not handle it, like the Internet Archive.

This was one of those cases in which the web shows all of its potential to behave weirdly. The HTTP/1.1 RFC says that if an Accept-Encoding header is not present, the server MAY assume the client accepts any encoding, so we were having many sites send us gzip content even though we did not support it. We then started sending a header saying “we support identity, and nothing else!”.

It turns out the web sucks, so many servers were not happy with a full header, and started giving us angry looks (slashdot, for instance, would not render correctly because it started sending encoded CSS files!). We then simplified the header we were sending, which made those servers happy again. Some sites, though, completely ignored our saying we didn’t support anything except identity, and sent us gzipped content anyway. Most of these were misbehaving caches (this was the case for Wikipedia), so they would work after you asked for a forced reload, which would ignore the cache, but some servers, such as the Internet Archive’s, didn’t really want to talk about encodings – they only wanted to send gzip-encoded content.

So, in the end, our only way out was implementing the damn encoding support, which finally happened during the hackfest. Take that, web!
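To give an idea of what handling Content-Encoding on the client side involves, here is a minimal sketch in Python. It is not libsoup’s implementation; `decode_body` is a hypothetical helper, and the headers are assumed to arrive as a plain dict with exact-case keys. The key point is that the decision is based on what the response says it contains, not on what the request asked for.

```python
import gzip
import zlib


def decode_body(headers, body):
    """Return the response body with its Content-Encoding removed.

    `headers` is assumed to be a dict of response headers, `body` the raw bytes.
    """
    encoding = headers.get("Content-Encoding", "identity").strip().lower()
    if encoding in ("identity", ""):
        return body                    # nothing to undo
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)   # what the stubborn servers send
    if encoding == "deflate":
        return zlib.decompress(body)   # zlib-wrapped deflate
    raise ValueError(f"unsupported Content-Encoding: {encoding}")
```

A quick usage example: `decode_body({"Content-Encoding": "gzip"}, gzip.compress(b"<html>hello</html>"))` gives back the original bytes, no matter what the Accept-Encoding header in the request claimed.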

WebKitGTK+ HackFest!

The WebKit hackfest is now over, and I think it was a very productive week. Thank you very much to all who attended, to Igalia for organizing the hackfest, and hosting us so well, to Collabora for having sponsored the event, and allowed me to spend the week working on it, and to the GNOME Foundation for having paid all of my costs!

Xan blogged about day 0, and also a summary of all that was done, so I’ll focus on the stuff he forgot to mention ;D. The hackfest, for me, started on day -1 with me not allowing Xan to go sleep before he had reviewed a couple patches of mine to fix DOM context menu handling. It always bothered me that Epiphany failed to open right-click menus in some pages, or let pages handle the right click. Well, this is fixed now, and Zimbra users can now have their right click menus, and WoW players can remove talent points from their calculators =P.

It turns out that many of the attendees don’t like pages messing with their context menus, though, and they had some good points to back up their positions (like pages making it hard to save images, for instance), so I implemented a way to force opening the custom menu: Ctrl-rightclick.

We wanted to use a GtkInfoBar to display questions regarding the form saving – our initial implementation always saved all credentials, but that didn’t sound good enough. Xan and I thought it would be very complicated to make this work, because there were assumptions in the code regarding which widget contains which, but it turned out to be quite trivial – making EphyEmbed a descendant of GtkVBox instead of GtkScrolledWindow, fixing a small number of assumptions, and that was it.

The passwords are saved in the GNOME Keyring. It’s interesting to point out that GNOME Keyring seems to be unhappy with the number of passwords a browser stores – Xan’s daemon was hanging, crashing, and spawning a large number of threads. My daemon decided to take up some 300MB of RAM at one point. It’s somewhat funny to see how much a browser pushes the limits of our platform. We are hoping this will improve with the new keyring APIs, and the rewrite that is ongoing. It’s nice to see my browser form passwords in seahorse, though, and be able to manage them like any other.

One more thing worthy of notice, although this post is already a bit too big: one of the main concerns people had during the Hackfest was making build time smaller. Touching a single file in WebCore causes a debug build of 10 minutes on my laptop. Evan Martin and Benjamin Otte made a push at removing unnecessary includes from WebCore and WebKitGTK+ files, which brought the build time down a bit. They ended up inspiring Aroben, from Apple, to go even further into this, and remove many includes from files all over WebKit.

Evan was also able to bring linking time down by making it possible to link libwebkit without having to build all the intermediate libraries, which brought build time down to 1 minute, when touching a single file in WebCore. Behdad and I also started looking into breaking WebCore up into lots of shared libraries for Debug builds, since we don’t care too much about speed penalties in those. None of these experiments got committed yet, but I am hoping we will be having a better time hacking on WebKitGTK+ in the near future.

It was awesome meeting everyone, by the way! See you around =).

Regressions, ah, regressions

There are few things I really hate. One of them is regressions. Regressions are bad because they usually take away things we are used to relying on, and leave us with the idea that perhaps the technical improvements didn’t really improve our lives as users, despite putting less burden on the developers. Software is made for users, after all.

As part of my work on WebKitGTK+, I always keep an eye on regressions, both from previous WebKitGTK+ releases, and those imposed on embedding applications on their migration away from Gecko, and try to focus some of my efforts into lowering their numbers, whenever I can.

In recent times I have worked on removing a few very user-visible regressions in Epiphany, which I see as the most demanding WebKitGTK+ user in GNOME, such as save page not working, missing favicon support, failing to perform server-pushed downloads (such as GMail attachments), and not being able to view source. An example of a regression from a previous version of WebKit also exists: in 1.1.17 we started advertising more than we should as supported by the HTML5 media player, causing download to be almost completely broken.

All of these are working if you are using WebKit and Epiphany from trunk/master, so should be on the next development versions of WebKitGTK+ and Epiphany. Other people have also fixed many other regressions; a few examples: Xan has reimplemented the Epiphany customization of the context menu, Frederic Peters provided a work-around for mailto: links while we don’t have SoupURILoader yet, and Joanmarie Diggs keeps rocking on the accessibility front!

If you find regressions, keep them coming! If you have a patch, even better! =)

Next week the WebKitGTK+ team gets together to work furiously on improving WebKitGTK+ in a hackfest sponsored by Collabora and Igalia, and hosted/organized by Igalia. While there I should also get my hands on one of these. Can’t wait! =)