Thursday, October 11, 2012

Phishing and malware protection arrives on mobile devices

I'm happy to announce that malware and phishing protection is now enabled in Firefox for Android, and available in the Aurora and Nightly builds. It will be released in Firefox 18. As far as I can tell, Firefox is the first browser to offer this feature to mobile users.

The phishing and malware protection works by regularly updating a local list of known bad sites, graciously provided by Google through their SafeBrowsing database. Whenever Firefox detects that you are navigating to such a site, or that a page you visit is trying to pull data from one, it will present you with a warning page and allow you to abort the operation.
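Conceptually, the local check is cheap: hash the URL and test a short prefix against the downloaded set; only a hit needs to be confirmed remotely. A minimal sketch in Python (the 4-byte prefix width matches the SafeBrowsing protocol, but the list contents and URL canonicalization here are made up for illustration):

```python
import hashlib

# Local set of 4-byte SHA-256 prefixes, as periodically downloaded from
# the SafeBrowsing servers. The entry below is invented for illustration.
local_prefixes = {hashlib.sha256(b"malware.example/").digest()[:4]}

def is_suspicious(canonical_url: str) -> bool:
    """Cheap local check; a hit still needs remote confirmation
    against the full hash before the warning page is shown."""
    prefix = hashlib.sha256(canonical_url.encode()).digest()[:4]
    return prefix in local_prefixes

print(is_suspicious("malware.example/"))  # True: confirm with server
print(is_suspicious("example.org/"))      # False: load page normally
```

The point of the prefix scheme is that the local list stays small while still filtering out virtually all clean URLs without any network traffic.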

I blogged previously about our intent to shrink the database to a size acceptable for mobile devices, and about the subsequent rewrite of the SafeBrowsing backend. The remainder of the work had to be postponed a little while we worked vigorously to finish the new Native Firefox for Android for phones and tablets.

One of the remaining tasks was verifying that the new backend processes all SafeBrowsing updates correctly. This was done by writing an external Python program that can read both the old and new format files, which turned out to be very useful for debugging the file format documentation as well.
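The cross-checking idea is simple: parse both stores into comparable sets of prefixes and diff them. The sketch below uses an invented flat layout of 4-byte big-endian prefixes; the real old (SQLite) and new (.sbstore) formats are considerably richer:

```python
import struct

def read_prefixes(data: bytes) -> set:
    """Parse a flat array of big-endian 4-byte prefixes.
    (Layout invented for illustration; the real formats differ.)"""
    return {struct.unpack_from(">I", data, off)[0]
            for off in range(0, len(data), 4)}

# Every prefix in the old store should survive the conversion.
old = read_prefixes(bytes.fromhex("0000000100000002"))
new = read_prefixes(bytes.fromhex("000000020000000100000003"))
assert old <= new, "prefixes lost in conversion: %r" % (old - new)
```

Having one independent parser for both formats also means a disagreement between them points at either a backend bug or a documentation bug, which is exactly what made it useful.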

Another issue that remained was the need for the updates to be very resilient on a mobile platform. For example on Android the application can be killed nearly at will by the Operating System, and care must be taken that the database not only doesn't get corrupted, but that as little data as possible is lost if this happens in the middle of an update. The old backend got this functionality mostly for free through the ACID features of SQLite, but for the new backend some extra work was necessary.

The final step was then updating the SafeBrowsing warning screens and support code in Firefox for Android, which we finished last week. Looking ahead, our UX team is currently busy with a new, nicer visual design for the warning pages that will be shared between Firefox for Android and Firefox OS. But in the meantime, enjoy the added protection. I sincerely hope you never have to encounter either of these pages!

Phishing warning

New UX Design


Wednesday, June 20, 2012

The effectiveness of practice - a small study of StarCraft 2 balance

Edit 2012-06-21: Corrected "games" to "won games". Sorry about that!

Win rates have little to do with "balance"

I had some musings about StarCraft 2 balance a while ago and decided to write a program to check them out.

Basically, most discussions about balance end up mentioning win rates a lot. However, because StarCraft 2's matchmaking tries to guarantee each player a win rate around 50% in the medium to long term, win rates are actually a fairly meaningless indicator of player strength, and hence of balance. Regardless of balance, we always expect them to end up around 50%.

Imagine you take a player of grandmaster level and handicap him by disallowing the use of his race's macro mechanic. The hit to his strength is similar in nature to what would happen if one of the races were weaker than the others. After playing for a while (and initially scoring way below 50%), he will likely drop down to, say, diamond league, until his opponents are such that he gets a 50% win rate again. We now know this player is actually of grandmaster caliber, but his race handicap holds him back from realizing his potential. Yet his win rate is the same as always.
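This convergence is easy to see in a toy simulation. The sketch below uses Elo-style win probabilities and a fixed K-factor - both my own assumptions, not Blizzard's actual matchmaking system - to handicap a 2200-rated player down to an effective strength of 1800 and let the matchmaker pair him by his current rating; his overall win rate still settles near 50%:

```python
import random

random.seed(1)

def beats(strength_a, strength_b):
    """Elo-style probability that A wins, sampled once."""
    return random.random() < 1 / (1 + 10 ** ((strength_b - strength_a) / 400))

true_strength = 1800.0   # actual playing strength after the handicap
mmr = 2200.0             # rating the matchmaker currently believes
wins = games = 0
for _ in range(5000):
    won = beats(true_strength, mmr)  # opponent is picked at current MMR
    mmr += 25 if won else -25        # simple fixed K-factor update
    wins += won
    games += 1
print(wins / games)  # settles near 0.5 once MMR drifts down to ~1800
```

The transient losing streak is short compared to a career of thousands of games, so the long-run win rate carries almost no information about the handicap.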

Win rates also suffer from temporal fluctuations: if an effective strategy for one race is developed, it takes a while for the other races to find a counter, and win rates can suddenly shift quite far away from 50%. The actual ability of the players did not change, nor did the game itself suddenly become imbalanced. That would only be the case if it eventually turned out that there is no counter to the strategy.

There is one particular case in which win rates can tell us something about balance issues: Random players. For Random players who play all races equally, we'd expect them to score the same with all races on average. If we take the data of all Random players and a certain race appears to outscore the others, that's a good sign of a balance issue.

Effort needed for reaching a league

Unfortunately, gathering statistics on the race performance of Random players is something that seems hard to spider off of Battle.Net. So while Blizzard can and probably does use this information in their balancing discussions, it is not available to us.

Another way to look at race strength is to consider that if we could pick players of similar "innate" skill, give them a race and let them loose on the SC2 ladder, we could check if they end up in the same leagues.

So how do we find these players of similar "real" skill? We can make an assumption: a player improves the more StarCraft 2 games he plays. This isn't a very far-fetched assumption, as it's known that most people improve at skilled tasks through practice. We can then rephrase our balance question: for a given amount of practice, will a player of a certain race end up ranked higher than players of another race?

The data

I spidered data on about 26676 players from the EU StarCraft 2 servers, or about 10% of the active player population in that region. I rejected all players with no games in Season 8 yet, and all Random players: they're not pertinent to our base question, and they require extra effort to distinguish from each other. I also threw away all players with fewer than 5 games, who hence have no league yet.

The table below shows the average number of won games for players in a certain league, both overall and broken down per race. After the number of games comes the 95% standard error on the mean, which essentially gives some idea of how accurate the numbers are. (They're mostly accurate to about 5-10%, except for the grandmasters, due to the extremely low sample size there.) The next number is the number of won games below which 95% of the players in that league fall. This gives some idea of the number of games at which a player is extremely likely to get promoted to the next league soon, as well as of the actual variance in the number of games players in a league have. Note that the variance is actually quite big - there are quite a few players who have played twice the average number of games yet are still stuck in a league.

The average number of won games the players in a league have is not the same as the number of games you'd expect to need, on average, to be promoted into that league. That point is (very approximately) somewhere halfway between the averages of the lower and the target league. The total number of games is about twice the number of won games, because the matchmaking should keep you around 50%.
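For the curious, the three statistics quoted per league can be reproduced along these lines. This is my reconstruction (sample mean, a 1.96-sigma half-width on the mean, and an empirical 95th percentile); the original analysis may have computed them slightly differently:

```python
import math

def league_stats(wins):
    """Mean, approximate 95% half-width on the mean, and the win count
    below which 95% of the sampled players fall."""
    n = len(wins)
    mean = sum(wins) / n
    var = sum((w - mean) ** 2 for w in wins) / (n - 1)   # sample variance
    half_width = 1.96 * math.sqrt(var / n)               # ~95% on the mean
    upper95 = sorted(wins)[int(0.95 * (n - 1))]          # empirical cutoff
    return mean, half_width, upper95

# Toy sample; the real study had hundreds to thousands of players per league.
sample = [120, 150, 250, 260, 300, 310, 330, 400, 980, 1100]
mean, hw, upper = league_stats(sample)
print("Avg wins = %4.0f (+- %4.0f), 95%% upper (%5d)" % (mean, hw, upper))
```

Note how the few heavy grinders in the sample pull the 95% cutoff far above the mean, which is exactly the large variance the table shows.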

So with this introduction, here is the actual data:

    EU Server, 26676 players sampled, 16243 valid samples
    
    6010 players in Protoss (37%)
    5248 players in Terran  (32%)
    4985 players in Zerg    (31%)
    
    Avg wins =   290 (+-    9), 95% upper ( 1021) for  5754 players in bronze     (35%)
    Avg wins =   533 (+-   17), 95% upper ( 1547) for  3468 players in silver     (21%)
    Avg wins =   708 (+-   22), 95% upper ( 1902) for  2715 players in gold       (17%)
    Avg wins =   939 (+-   33), 95% upper ( 2485) for  2133 players in platinum   (13%)
    Avg wins =  1239 (+-   48), 95% upper ( 3047) for  1386 players in diamond     (9%)
    Avg wins =  1680 (+-   86), 95% upper ( 4101) for   783 players in master      (5%)
    Avg wins =  2880 (+- 1544), 95% upper ( 5968) for     4 players in grandmaster (0%)
    
    Avg wins =   293 (+-   15), 95% upper ( 1022) for  2248 players in bronze_Protoss
    Avg wins =   305 (+-   16), 95% upper ( 1079) for  2163 players in bronze_Terran
    Avg wins =   262 (+-   17), 95% upper (  917) for  1343 players in bronze_Zerg
    
    Avg wins =   511 (+-   26), 95% upper ( 1467) for  1269 players in silver_Protoss
    Avg wins =   553 (+-   33), 95% upper ( 1681) for  1162 players in silver_Terran
    Avg wins =   537 (+-   29), 95% upper ( 1483) for  1037 players in silver_Zerg
    
    Avg wins =   687 (+-   34), 95% upper ( 1767) for   977 players in gold_Protoss
    Avg wins =   750 (+-   48), 95% upper ( 2115) for   777 players in gold_Terran
    Avg wins =   695 (+-   37), 95% upper ( 1848) for   961 players in gold_Zerg
    
    Avg wins =   949 (+-   55), 95% upper ( 2480) for   762 players in platinum_Protoss
    Avg wins =   886 (+-   59), 95% upper ( 2292) for   550 players in platinum_Terran
    Avg wins =   966 (+-   57), 95% upper ( 2608) for   821 players in platinum_Zerg
    
    Avg wins =  1217 (+-   77), 95% upper ( 2902) for   476 players in diamond_Protoss
    Avg wins =  1188 (+-   97), 95% upper ( 3065) for   371 players in diamond_Terran
    Avg wins =  1293 (+-   80), 95% upper ( 3156) for   539 players in diamond_Zerg
    
    Avg wins =  1719 (+-  135), 95% upper ( 3974) for   276 players in master_Protoss
    Avg wins =  1687 (+-  177), 95% upper ( 4349) for   225 players in master_Terran
    Avg wins =  1635 (+-  141), 95% upper ( 4017) for   282 players in master_Zerg
    
    Avg wins =  2867 (+- 3537), 95% upper ( 7870) for     2 players in grandmaster_Protoss
    Avg wins =  2893 (+- 1336), 95% upper ( 4782) for     2 players in grandmaster_Zerg

Conclusions

There is a clear correlation between having played more games and being in a higher league. This validates our earlier assumption: more practice makes you a better player. (Or, if you're pedantic about correlation versus causation, at the very least better players practice more.) I know this sounds extremely obvious, but it is nice to validate it. It also indicates that smurfing and the like aren't prevalent enough to mess with our results.

There are no significant differences between the races per league [1]. This means that you cannot get higher ranked faster by picking a specific race. You still need to put in the same amount of practice. This is the main thing we wanted to investigate and it's good news: StarCraft 2 is *really* well balanced, or at least it has been over the last 8 seasons, which this data aggregates.

If you want to get to masters, expect to put in about 3000 practice games, and maybe as much as 6000. If we assume an average game (including postmortem analysis etc) takes 20 real-life minutes, expect to put in about 1000 hours of practice.

If you played 2000 games and are still in bronze league, you're doing it wrong.

[1] This conclusion assumes that the majority of players play one race most of the time, i.e. that offracing only happens occasionally. Without this assumption we can't put the players in race groups to begin with. Note that if offracing were the norm, and the races not balanced, the offracing players would be able to detect this quickly, which in turn would make it less likely that they'd continue to offrace instead of sticking with the strongest race.

Thursday, May 24, 2012

History and bookmarks in Firefox for Android

The feature I have been working on now for quite some time is one I'd jokingly refer to as "a feature I hope most users will never need". It's the system that translates your browser history from XUL Fennec to Native Fennec. New users obviously don't need this, and I hope we get a lot of new users with the switch to the native UI, so there.

More seriously, for something that I initially described in a meeting as "doing a query in one DB and some inserts in another, how hard can it be?", this turned out to be a project that ended up sinking easily over a man-month of engineering time. Of course, putting it like that in the meeting was just asking for problems. I'm going to use this post to give some background on what happened, and to explain some more about our current history/bookmarks architecture and how databases work on Android.

The need for a new History & bookmarks database

First of all, you might ask why we needed to change the format of browser history in the first place. In XUL Fennec, history and bookmarks are managed by the Places system, the exact same one powering desktop Firefox. It's a combination of SQLite with a C++ and JavaScript control layer.

The reason for rewriting the UI parts of Fennec in Java was to make it start up faster on Android, so that the user has something he can interact with instantly. Having bookmarks and history available early is quite important as that's what the user will most likely be interacting with, either through the AwesomeBar or the home screen that Native Fennec has. So we cannot wait for Gecko to load to access its Places system - history and bookmarks must be available immediately.

Secondly, Native Fennec now also includes Sync as a true Android service that works with the Android Accounts panel, background synchronization, etc. Having the history and bookmarks available natively in Java removes the need for Sync to load Gecko, which isn't something that you'd want a background service to do.

Lastly, storage on Android devices is often fairly limited. Places databases can get quite large if you've been using Firefox for a long time and browse a lot; for example, on my current desktop system the places.sqlite file is over 70MB. This isn't desirable on Android, so we probably want to tune history expiration a bit differently.

Evolution of the new database

Our initial implementation of an Android history & bookmarks system was to simply use the built-in default database. This has some nice advantages, such as automatic integration with the default Browser: if the user is transitioning from that to Firefox, he will immediately have his history & bookmarks available to him. Feedback from the community came in that they weren't entirely comfortable with this. There was a feeling that although users trust Firefox with their private data, they don't necessarily trust Android or other applications that may access this data in the same way.

We started adding our own implementation, a local database (LocalDB) using Android's SQLite library, our own ContentProvider, and a wrapper layer (BrowserDB) that allowed most of the code to seamlessly switch between using the local database and Android's database.

As we added more features to our bookmarks system, and with the Sync implementation underway, it became clear at some point that we couldn't reasonably support them well with Android's limited bookmark support. For example, by default the Android implementation only remembers the last 250 URLs you visited, which makes the AwesomeBar's search sort of useless. So eventually, support for Android's history & bookmarks database was phased out.

Remapping Places & Database constraints

Importing the Places database into our LocalDB format is mostly a straightforward remapping, although there are some minor differences. The Places database is very normalized; for example, each bookmark and history entry refers to an entry in another table which contains the actual URL. LocalDB is a bit simpler in this regard, as the URLs are contained within the entries themselves, saving some database work - which can help on slow devices. We made a small design mistake here in the favicons table: favicons are big enough that we do not want them duplicated in the database, but the same favicon can be shared by multiple URLs. We caught this issue a bit too late, and as a result it will not be fixed in the first release - see bug 717428. You can occasionally see constraint violation warnings in the Android logs when this problem occurs.

Another small complication in transferring bookmarks is that they form a folder (tree) structure. We added some (SQL key) constraints in the new LocalDB to ensure no folders get orphaned even if we have bugs in our first versions, but this means that we must reconstruct the tree structure in the correct order when remapping the bookmarks data. A simple breadth-first algorithm makes multiple passes over the bookmarks data until we reach the leaves. Any bookmarks left over were orphaned and are not imported.
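The multi-pass idea can be sketched in a few lines of Python (the record layout is simplified for illustration; the real code works against SQLite tables and inserts rows as it goes):

```python
def import_bookmarks(records, root_id=0):
    """Breadth-first remapping of the bookmark tree: repeatedly import
    any bookmark whose parent already exists in the new database, until
    a pass makes no progress. Leftovers are orphans and are skipped."""
    imported = {root_id}           # ids already present in the new DB
    pending = list(records)        # (id, parent_id, title) tuples
    while pending:
        ready = [r for r in pending if r[1] in imported]
        if not ready:
            break                  # remaining records are orphaned
        for rec in ready:
            imported.add(rec[0])   # the real code would INSERT here
            pending.remove(rec)
    return imported - {root_id}, pending

ok, orphans = import_bookmarks([
    (2, 1, "child"), (1, 0, "toolbar"), (9, 42, "orphan"),
])
print(ok, orphans)  # {1, 2} imported; (9, 42, 'orphan') left over
```

Note that the child (id 2) arrives before its parent (id 1), so the first pass only imports the parent and a second pass picks up the child - which is why multiple passes are needed at all.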

Tags are stored in a fairly complicated manner in Places (using a multitude of bookmark folders), and because of this they are not supported in Native Fennec, and likely won't be for a while, so there is no need to migrate them.

Some fragmentation pains

During development, bug reports started coming in from users who were migrating to Native Fennec, didn't get any bookmarks & history imported, and got a "database is corrupted" error. After investigation, this turned out to be a problem of combined SQLite and Android fragmentation. XUL Firefox (and Desktop) ships with the most recent SQLite revision, which means it currently uses Write-Ahead Logging (WAL) for database operations.

Various Android devices come with various versions of SQLite, and only since (approximately) Honeycomb are the shipped SQLite libraries new enough to understand the database format that WAL requires. So older Android devices couldn't correctly read Firefox's SQLite databases with their built-in SQLite support.
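The incompatibility is easy to demonstrate with any modern SQLite binding: switching a database to WAL changes its on-disk format, and a library older than SQLite 3.7.0 (roughly what pre-Honeycomb Android ships) will refuse to open the file. A small illustration using Python's sqlite3 module:

```python
import os
import sqlite3
import tempfile

# A stand-in for places.sqlite; in-memory databases can't use WAL.
path = os.path.join(tempfile.mkdtemp(), "places.sqlite")
con = sqlite3.connect(path)

# Desktop Firefox switched Places to write-ahead logging. Once this
# pragma succeeds, the file can only be opened by SQLite >= 3.7.0;
# older libraries report it as corrupted.
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode, "- linked SQLite version:", sqlite3.sqlite_version)
con.close()
```

The "database is corrupted" wording in the bug reports comes straight from SQLite: to an old library, the WAL header in the file looks like garbage.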

We ended up writing a bridge driver that uses JNI to talk to Firefox's SQLite library and provides an interface compatible with Android's SQLite one. This turned out to be useful for a number of other things too, such as accessing the original passwords and form history database from Gecko. In general, this kind of experience made the work spent on this feature worthwhile - quite a bit of it was usable in other parts of the project as well.

Another small fragmentation issue we ran into is that foreign key constraints are not supported on some older Android devices. Because we basically only use these for error checking, it was not a large problem.

Performance aspects

The migration initially ran in the background on the first run, but it turned out this leads to horrible performance as the user tries to interact with the browser for the first time while heavy disk I/O and processing are happening at the same time. Because that kind of nullifies the message we were trying to send - that Native Fennec is a lot faster - we opted to show a modal screen instead that informs the user of what is happening. Of course, popping up a loading screen doesn't exactly convey a message of speediness either, but we should be fine as long as the user only sees it once and it doesn't take too long.

The latter was a bit of a problem, though. Initial implementations ran at a speed of a few bookmarks or history entries per second, and it's not unusual for Firefox users to have hundreds of bookmarks and hundreds of thousands of history entries. Android provides an API that allows ContentProviders to support database transactions, and implementing support for those on both sides sped up our migration to about 50-100 records per second. This was another example of the work on this little-seen feature providing benefits elsewhere, as Sync is now also using the faster API.
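The effect of batching is easy to reproduce with any SQLite binding: committing per record pays the full journal/fsync cost every time, while one transaction around the whole batch amortizes it. A rough illustration (absolute timings will vary wildly by device and storage):

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "history.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE history (url TEXT, visits INTEGER)")
rows = [("http://example.com/%d" % i, 1) for i in range(500)]

t0 = time.time()
for row in rows:                 # commit after every single record
    con.execute("INSERT INTO history VALUES (?, ?)", row)
    con.commit()
per_row = time.time() - t0

t0 = time.time()
with con:                        # one transaction for the whole batch
    con.executemany("INSERT INTO history VALUES (?, ?)", rows)
batched = time.time() - t0

print("per-row: %.3fs, batched: %.3fs" % (per_row, batched))
con.close()
```

On slow flash storage the gap is far bigger than on a desktop, which is why the per-record approach was so painful on actual phones.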

There are some interesting interactions with the batch API in Android if part of a transaction fails. Basically, for some operations (applyBatch) it is implementation-defined what happens when an operation fails, and the API seems designed to allow both options (continue or abort) to work. For our purposes, doing as much as possible was the better option, but this turned out to be problematic because the API to tell SQLite what to do in such a case isn't available on Android <= 2.2, which we wanted to support. This led to some "interesting" code that may make your eyes bleed.

Sync integration 

Because that's still not fast enough to migrate a user's entire history on the first run, we also made an API that allows Sync to interact with the migration feature. The first run migrates about 2000 history entries and the entire bookmarks table. For users without Sync, this is all they'll get, but it is enough to make the AwesomeBar quite usable, and it covers several weeks of browsing even for moderately heavy users.

If the user has Sync set up, the idea was that Sync would continue the migration from the database before starting to use the network. Feedback from the beta made us reconsider this approach: the problem with delaying Sync until the Places database has been entirely migrated is that the user will not get his most recent history updates until his updates from potentially weeks or months ago have been migrated. This leads to a suboptimal experience, and it's something we're going to try to address for the release.

The XUL to Native migration includes a feature that tries to migrate your Sync credentials automatically if you had them set up in XUL Fennec. Unfortunately this had a bug in the very first beta release, so if you were one of the earliest adopters this will not have worked for you. The issue was fixed in the second beta release.

Future work 

Despite the first beta being out, work on this crucial feature is far from finished. Aside from the points already mentioned above, there are a few more notable areas that we plan to get addressed before the release ships:

  • After support for the Android bookmarks & history database was dropped, users who were previously using the built-in browser have no way to import their bookmarks and history into Fennec right now. We're currently working on adding this feature as a first priority. Eventually we will add export functionality as well (though we'd rather make Fennec so awesome that nobody ever wants to use the built-in browser ever again!).
  • There is currently no expiration in the history database. So although the format is a bit more compact than Places, it could theoretically grow unbounded if the user visits a lot of sites on his phone and keeps on using Firefox. The algorithms Places uses for this are fairly complicated, so we need to study which parts are meaningful for a mobile device, and whether we want to tune them differently for typical phone or tablet use scenarios.

Saturday, April 14, 2012

Java wat

I'm sure quite a few of you have seen the "wat" talk exposing some funny behaviors of Ruby and JavaScript. Working in the industrial/enterprise/boring Java language most of the time for Native Fennec's development, we've been missing out on such interesting moments. Nevertheless, I present you this gem, from bug 745384:
  • (null instanceof <anything>)  = false
    This is guaranteed by the language spec:
     "At run time, the result of the instanceof operator is true if the value of the RelationalExpression is not null and the reference could be cast (§15.16) to the ReferenceType without raising a ClassCastException. Otherwise the result is false."
  • IsInstanceOf(NULL, <anything>) = JNI_TRUE
    This too is guaranteed, by the JNI specification:
     "Returns JNI_TRUE if obj can be cast to clazz; otherwise, returns JNI_FALSE. A NULL object can be cast to any class."
I don't know what the people at Sun were thinking when they designed it this way, but right now I'm all:


For those wondering, these are the relevant Java bugs:
Note the "Closed, Not a Defect, Fixed in spec." It's not a bug, it's a feature. :-)

Monday, February 6, 2012

New SafeBrowsing backend

Today, the SafeBrowsing rewrite Dave Camp and I have been working on for several months finally landed in the Mozilla Nightlies; it should be part of the Firefox 13 release, having narrowly missed Firefox 12. It reduces the disk footprint of our anti-phishing and malware protection from about 40-50MB to 5-6MB, changes all related I/O from pure random access to a single serial read, and refactors a single 4000+ line C++ file into a bunch of modules. An earlier part of this work landed in Firefox 9; it reduced the memory footprint from potentially 40-100MB to 1.2MB, and removed the need to do random I/O on every page load.
Aside from the performance gains, the reduced footprint is an essential step toward extending our SafeBrowsing protection to mobile devices, which is why we undertook this in the first place.

It was an interesting assignment, and being my first real project for Mozilla, a bit more involved than we thought at first. I blogged in July last year about our plans for this feature. Some of the optimizations we had in mind didn't work out, while others did end up being implemented.

Eliminating the host keys

One of the things touched upon in the previous blog post was that we used to store two 32-bit hash prefixes for every blocked entry: one encoding only the host, and the other encoding the full URL. Strictly speaking, we only need the full URL part. The old SQLite-based code used the host key to SELECT and load all keys for a certain domain at once, but our new in-memory, prefix-trie-based approach has no such needs. However, as Justin Lebar already touched upon in the previous blog post, dropping the host key significantly increases the likelihood of a false positive: we now expect one for every 7000 URLs visited. This will not cause us to block any legitimate sites, as any positive hit in the local database is checked against the full, 256-bit hash at a remote server (hosted by Google, who provides the SafeBrowsing data).
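The 1-in-7000 figure falls out of the prefix width and the list size. Assuming a database of roughly 600,000 32-bit prefixes (my estimate; the actual list size varies over time):

```python
n_prefixes = 600_000      # assumed number of entries in the local list
prefix_space = 2 ** 32    # possible values of a 32-bit prefix

# A random clean URL hashes to a stored prefix with probability n/2^32.
p_collision = n_prefixes / prefix_space
print("one false positive per ~%.0f clean URLs" % (1 / p_collision))
```

With those inputs this works out to roughly one collision per 7000 clean URLs, each of which costs one remote full-hash lookup but never a wrongly blocked page.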

This does mean we will increase the traffic to this remote server by a large factor. Scary as that may sound, some back-of-the-envelope estimates show it's not really that bad: say there are about 420M Firefox users, browsing 8h/day and loading on average 1 URL per second. That's about 140M URL loads per second, causing about 20000 hash requests per second to Google. Google confirmed they can handle this with ease.
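The same back-of-the-envelope numbers, written out (all inputs are the rough estimates from the paragraph above):

```python
users = 420e6            # estimated Firefox users
hours_online = 8         # browsing hours per user per day
urls_per_second = 1.0    # average URL loads while browsing
fp_rate = 1 / 7000       # local-database false positive rate

concurrent_users = users * hours_online / 24    # online at any moment
url_loads = concurrent_users * urls_per_second  # ~140M URL loads/s
hash_requests = url_loads * fp_rate             # ~20,000 requests/s
print("%.0fM URL loads/s -> %.0f hash requests/s"
      % (url_loads / 1e6, hash_requests))
```

Twenty thousand requests per second is well within what a large CDN-backed service handles routinely, which is why Google wasn't worried.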

Now, there is still a problem with doing this: any collision will appear for all users on exactly the same URLs. This means that if you're unlucky enough to be a site owner with a URL that happens to collide, every visitor to your site will have a slightly slower browsing experience. Even worse, should you get linked on a popular forum, or be in the news, there will be a storm of false positives hitting the server all at once. We thought this problematic enough that we implemented a workaround: every user generates a unique randomization key and re-randomizes all his hashes with it. Collisions then happen on a different URL for every user, and are consequently much better spread out over time.
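A sketch of the re-randomization idea (the hashing scheme below is my own illustration; the actual implementation in the Firefox source may differ):

```python
import hashlib

def rerandomize(prefix: bytes, client_key: bytes) -> bytes:
    """Map a shared 4-byte prefix to a client-specific 4-byte value,
    so that a colliding URL differs from user to user."""
    return hashlib.sha256(client_key + prefix).digest()[:4]

shared = bytes.fromhex("deadbeef")   # prefix as shipped to everyone
a = rerandomize(shared, b"key-of-user-A")
b = rerandomize(shared, b"key-of-user-B")
print(a.hex(), b.hex())  # same shared entry, different per client
```

Because each client stores only its re-randomized values, the false positive rate per user is unchanged, but no single URL collides for everyone at once.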

Eliminating chunk numbers

After some discussion, it turned out that eliminating the chunk numbers isn't as easy as hoped. First of all, the observation in the previous blog post that chunk expiries only seem to happen when the chunks are in fact old doesn't hold after observing the protocol for a longer time. It also happens quite regularly that a chunk is added, deleted, and added back again, particularly in the malware list. In those cases, it is important to know which add chunk a delete refers to, so it won't delete the later add. It would still be possible to deal with this if the server recalculated the prefixes to send for every client, but that is less efficient on the server side compared to the bigger, more static downloads the server can point to now, which are easily mirrored on Google's network.

Sub prefixes compress harder

In line with the previous paragraph, it happens that we receive sub prefixes for add prefixes we never received. These must be kept around until we receive an expire for them, as we can't know whether the add they belong to is outdated or just not downloaded yet. Note also that we usually receive updates backwards, i.e. the most recent data is sent to clients first, as it's believed to be the most relevant. Because sub prefixes contain both an add and a sub chunk number, they are also bigger than add prefixes. This makes the eventual size of the database a bit larger than the minimum guessed in the previous blog post, which more or less ignored sub prefixes entirely. If you peek in your profile, you can see that goog-malware-shavar.sbstore tends to be the biggest file: this is exactly due to the many sub prefixes in the malware list.

Detection performance

It should be noted that these improvements are purely focused on the footprint of the feature. They will improve the resource usage of the browser, but they do not change the detection performance in any way.

NSS Labs Report

In a somewhat funny coincidence, the same day I am writing this blog post, NSS Labs published a report, "Did Google pull a fast one on Firefox and Safari users?". The main points of this report shouldn't be much news, as I pointed out the discrepancy between Chrome on the one hand and Firefox and Safari on the other, as well as the reason for it, over half a year ago in my previous blog post ("Performance differences between Firefox, Chrome and Safari").

I have two remarks on the report. One, as I've already pointed out in the past, false positive control is an important part of effective malware detection: Internet Explorer flags many malware sites, but it also flags legitimate sites, undermining its true effectiveness.

Secondly, the problem isn't so much that the "new" SafeBrowsing protocol is proprietary or non-documented; it's implemented in Chrome and Chromium is open source, so at the very worst we can go study that code to see how to implement it. The problem is that permission is required to use Google's SafeBrowsing servers, and Firefox does NOT have permission to use the download protection list. Edit: Please see the statement from Ian Fette below.