Category: web
Migration to Webgen
I just moved the site to webgen. For the occasion, I forked the project on GitHub and improved a lot of things. As webgen doesn't seem to be actively developed, I think I'll be the one maintaining it in the future.
I took the occasion to move my website over to my own web server, hosted at home. If I ever find it unsuitable, I can very easily host my website elsewhere as it is just a bunch of static pages.
More to come ...
Blueprints for a Privacy Respecting Browser
What is this all about: web privacy. We are tracked everywhere, and I'd like to help if possible. So, let's design a web browser that for once respects your privacy.
Main features:
- Each website has its own cookie jar, its own cache and its own HTML5 local storage
- History-related CSS attributes are disabled
- External plugins are only enabled on demand
- Support for Tor/I2P is enabled by default
- You have complete control over who receives what information
- Lets you control in the settings whether you want to allow referrers or not
- The contextual menu lets you open links anonymously (no referrer, anonymous session)
The browser is bundled with a particular UI that lets you control everything during your browsing session. It is non-intrusive and makes the best choice by default. I am thinking of a notification bar that shows at the bottom. I noticed that this spot is not intrusive when I realized that the search bar in Firefox was open most of the time, even though I mostly didn't use it.
First, let's define a session:
- A session can be used by more than one domain at the same time.
- A session is associated with a specific cache storage
- A session is associated with a specific HTML5 storage
- A session is associated with a specific cookie jar
- A session can be closed. When it is closed, session cookies are deleted from the session
- A session can be reopened. All long-lasting cookies, cache, HTML5 storage and such are then used again.
- A session can be anonymous. In such a case, the session is deleted completely when it is closed.
- A session is associated with zero, one or more domains. These domains are the domains the end user can see in the address bar, not the sub-items in the page.
Sessions are like Firefox profiles: opening a new session is like opening a Firefox profile you just created. Sessions exist because people will never create a different Firefox profile for each site.
If we want to protect privacy, a new session should be created each time a link is opened. To keep web sites usable, sessions can be shared in specific cases. Let's define the cases where it might be sensible to share a session:
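To make the definition above a bit more concrete, here is a minimal sketch of what a session could look like as a data structure. It is only an illustration: the names Session, Cookie, close and reopen are made up for this example, and the cache and storage are reduced to simple in-memory maps.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a browsing session; all names are illustrative.
struct Cookie {
    std::string value;
    bool sessionOnly;  // true: dropped when the session is closed
};

struct Session {
    bool anonymous = false;
    bool open = true;
    std::set<std::string> domains;                    // address-bar domains only
    std::map<std::string, Cookie> cookieJar;          // cookie name -> cookie
    std::map<std::string, std::string> cache;         // URL -> cached body
    std::map<std::string, std::string> localStorage;  // key -> value

    // Closing drops session cookies; an anonymous session drops everything.
    void close() {
        open = false;
        if (anonymous) {
            domains.clear();
            cookieJar.clear();
            cache.clear();
            localStorage.clear();
            return;
        }
        for (auto it = cookieJar.begin(); it != cookieJar.end();) {
            if (it->second.sessionOnly) it = cookieJar.erase(it);
            else ++it;
        }
    }

    // Reopening makes the surviving cookies, cache and storage usable again.
    void reopen() {
        assert(!open);  // only a closed session can be reopened
        open = true;
    }
};
```

Closing a normal session that holds one session cookie and one long-lasting cookie leaves only the long-lasting one; closing an anonymous session leaves nothing behind.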
You click a link or submit a form and expect to still be logged-in in the site you are viewing. You don't care if you follow a link to an external page.
User Interface: If the link matches one of the domains of the session, then keep the session. No UI. If the user wants a new session, the "Open anonymously" entry in the context menu is available. A button on the toolbar might be available to enter a state where links are always opened anonymously.
If the link points to another domain, then open the link in a new session unless "Open in the same session" was specified in the context menu. The UI contains:
We protected your privacy by separating <domain of the new site> from the site you were visiting previously (<domain of the previous site>).
Choices: [ (1) Create a new anonymous session            | ▼ ]
         | (2) Continue session from <previous domain>   |
         | (3) Use a previous session for <new domain>   |
         | (4) Use session from bookmark "<name>"        |
The first choice is considered the default and the page is loaded with it. If the user chooses another option, then the page is reloaded.
If the user chooses (2), the page is reloaded with the previous session and the user is asked: "Do you want <previous domain> and <new domain> to have access to the same private information about you?". Answers are Yes, No and Always. If the answer is Always, then the two domains are considered one and the same in the configuration.
The choice (3) will use the most recent session for the new domain. It might be a session currently in use or a session in the history.
There are as many (4) options as there are bookmarks for the new domain. If different bookmarks share a single session, only one bookmark is shown. This choice will load the session from the bookmark.
If (3) and (4) are the same sessions, and there is only one bookmark (4), then the (4) option is left out.
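The rules for building this menu can be sketched as a small function. This is just an illustration under a few assumptions of mine: sessions are identified by an integer id (0 meaning "no session"), option (3) is hidden when there is no earlier session for the new domain, and the names Bookmark, Choice and buildChoices are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types: a session is identified by an integer id, 0 = none.
struct Bookmark {
    std::string name;
    int sessionId;
};

struct Choice {
    int kind;           // 1..4, matching the numbered options in the text
    std::string label;
    int sessionId;      // 0 for a brand-new anonymous session
};

std::vector<Choice> buildChoices(const std::string& previousDomain,
                                 int previousSessionId,
                                 const std::string& newDomain,
                                 int recentSessionId,  // 0 if no earlier session
                                 const std::vector<Bookmark>& bookmarks) {
    assert(previousSessionId != 0);  // the user comes from an existing session
    std::vector<Choice> out;
    out.push_back({1, "Create a new anonymous session", 0});
    out.push_back({2, "Continue session from " + previousDomain, previousSessionId});
    if (recentSessionId != 0)
        out.push_back({3, "Use a previous session for " + newDomain, recentSessionId});

    // Bookmarks that share one session are shown only once.
    std::vector<Bookmark> unique;
    for (const Bookmark& b : bookmarks) {
        bool seen = false;
        for (const Bookmark& u : unique)
            if (u.sessionId == b.sessionId) seen = true;
        if (!seen) unique.push_back(b);
    }
    // A single bookmark whose session is already option (3) is left out.
    if (unique.size() == 1 && recentSessionId != 0 &&
        unique[0].sessionId == recentSessionId)
        unique.clear();
    for (const Bookmark& b : unique)
        out.push_back({4, "Use session from bookmark \"" + b.name + "\"", b.sessionId});
    return out;
}
```

With one bookmark that points at the same session as (3), the menu has three entries; with bookmarks pointing at two different sessions, both show up as (4) entries.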
You use a bookmark and expect to continue the session you had for this bookmark (webmails)
The session is simply stored in the bookmark. When saving a bookmark, there is an option to store the session with it or not.
[X] Do not save any personal information with this bookmark
You open a new URL and you might want to reuse a session that was opened for this URL.
The User Interface allows you to restore the session:
We protected your privacy by not sending any personal information to <domain>. If you want <domain> to receive private information, please select:
Choices: [ Do not send private information        | ▼ ]
         | Use a previous session for <domain>    |
         | Use session from bookmark "<name>"     |
If you can see other use cases, please comment on that.
From these use cases, I can infer three kinds of sessions:
- Live sessions, currently in use
- Saved sessions, associated to a bookmark
- Sessions closed in the past, accessible through the history. They are garbage collected once they get too old.
Now, how to implement that? I was thinking of QtWebKit as I already worked with Qt and it's easy to work with.
- We have a main widget: QWebView. We want to change the session when a new page is loaded, so we hook up to the signal loadStarted.
- We prevent history-related CSS rules by implementing QWebHistoryInterface. More specifically, we store the history necessary to implement QWebHistoryInterface in the session.
- We change the cache by implementing QAbstractNetworkCache and setting it using view->page()->networkAccessManager()->setCache(...)
- We change the cookie jar by implementing QNetworkCookieJar and setting it using view->page()->networkAccessManager()->setCookieJar(...)
- We change the local storage path using a directory dedicated to the session and using view->page()->settings()->setLocalStoragePath(QString)
After all that, we'll have to inspect the resulting browser to determine if there are still areas where we fail at protecting privacy.
HTML 2.0: Why I think HTML is broken
First postulate: HTTP was designed as a stateless protocol
Context: web sites need to maintain a context (or state) to track the client. This is required by the log-in procedures the various websites have. It is also useful to track the user in a web store, to know which items the user wants to buy. In fact, it is required almost everywhere.
The first solution thrown at this problem was cookies. People didn't like cookies at first, but now everyone accepts them: nothing works without cookies. Why did people dislike cookies back then? They liked their privacy, and cookies make it possible to track the user. Through advertisement networks, the advertiser knows exactly which websites the user visited. And it is still the case now. What changed is that users got tired of fighting cookies and having every website break, and they got used to them.
People got used to being tracked, just as people are used to being watched by video cameras in the street and to being tracked by the government, big companies and banks.
Cookies are a great way to track people, all because HTTP didn't include session management. The way Google tracks you is very simple. Google Analytics puts a cookie on your computer, and each time you access a Google server, they know it's the same person. Google is everywhere:
- Many web sites are using Google APIs, or the jQuery library at Google.
- Many web sites ask Google to track their users to know how many people visit their pages.
- Google runs advertising networks.
- YouTube, Blogger, Picasa and others are owned by Google
With this alone, Google is found on almost every page. If you have an account at Google (YouTube, Picasa, Gmail, Blogger, Android or other), they can even attach a name or an e-mail address to all of this information.
Google's motto is "Don't be evil". They are perhaps not evil, but can they become evil? Yes.
Anyway, my dream HTTP 2.0 protocol would of course include push support like WebSockets, but more importantly: session management. How should this be done?
HTTP and Session Management
When the server needs a session, it initiates the session by giving a session token to the client. The client needs to protect this token from being stolen and should display that a session is in progress for this website. It could appear in the URL bar, for example. The client could close the session at any moment.
With the token, the server provides its validity scope: domains, subdomains, path. Only the resources in the session scope will receive the token back. If, for example, http://example.com starts a session at example.com but has an <iframe> that includes Facebook, Facebook won't receive the session token. If Facebook wants to start a session (because the user wants to log in), it will start a second session.
A session cannot escape the page. If you have two tabs open with Facebook in each tab (either full page or embedded), the two Facebook instances don't share the same session, unless the user explicitly allowed this. For instance, when Facebook starts a session, the browser could tell the user that Facebook already has an existing session, and the user would be free to choose between the new session and the existing one.
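The scope check described above could look something like the following sketch. Nothing here is part of any real protocol: SessionScope, hostInScope and shouldSendToken are names I made up, and I assume the scope is just a domain, a flag for subdomains and a path prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a session scope as announced by the server.
struct SessionScope {
    std::string domain;      // e.g. "example.com"
    bool includeSubdomains;  // whether "www.example.com" is also covered
    std::string pathPrefix;  // e.g. "/" or "/shop/" (keep the trailing slash)
};

bool hostInScope(const SessionScope& scope, const std::string& host) {
    if (host == scope.domain) return true;
    if (!scope.includeSubdomains) return false;
    // A subdomain must end with ".<domain>", so "badexample.com" is rejected.
    const std::string suffix = "." + scope.domain;
    return host.size() > suffix.size() &&
           host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// The client sends the token back only for resources inside the scope.
bool shouldSendToken(const SessionScope& scope,
                     const std::string& host, const std::string& path) {
    assert(!scope.pathPrefix.empty() && scope.pathPrefix[0] == '/');
    return hostInScope(scope, host) &&
           path.compare(0, scope.pathPrefix.size(), scope.pathPrefix) == 0;
}
```

With a scope of example.com (subdomains allowed, path "/"), a request to www.example.com gets the token, while a request to facebook.com embedded in the same page does not.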
How does this solve XSS
XSS is when a website you don't trust accesses the session of a website you trust and steals it. At least I think so.
With this kind of session management, the session couldn't possibly be stolen. Suppose that the untrusted site makes an XMLHttpRequest to gmail.com. If cross-domain requests weren't forbidden, any website could read your mail.
With the new session management, if the untrusted site makes a request to gmail.com, the gmail.com session wouldn't be available and the login page would be returned instead of the list of e-mails. If the untrusted website tries to log in, you would be prompted to associate the Gmail session with the site you don't trust. Unless you are a complete idiot, you wouldn't allow the online pharmacy to connect to Gmail.
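Here is a rough sketch of how the client side could keep sessions apart so that the scenario above plays out as described. It is a simplification: I key tokens by the site in the address bar and the site receiving the request (a real design would probably key by tab or page), and SessionStore with its methods is a hypothetical class, not an existing API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical client-side store: a token is keyed by the site shown in the
// address bar AND the site the request goes to.
class SessionStore {
public:
    void startSession(const std::string& topLevelSite,
                      const std::string& targetSite,
                      const std::string& token) {
        assert(!token.empty());
        tokens_[std::make_pair(topLevelSite, targetSite)] = token;
    }

    // Called only after the user explicitly agreed to let `topLevelSite`
    // reuse the session that `targetSite` started on its own page.
    bool associate(const std::string& topLevelSite,
                   const std::string& targetSite) {
        auto own = tokens_.find(std::make_pair(targetSite, targetSite));
        if (own == tokens_.end()) return false;
        tokens_[std::make_pair(topLevelSite, targetSite)] = own->second;
        return true;
    }

    // Returns the token to attach to a request, or "" when none applies.
    std::string tokenFor(const std::string& topLevelSite,
                         const std::string& targetSite) const {
        auto it = tokens_.find(std::make_pair(topLevelSite, targetSite));
        return it == tokens_.end() ? std::string() : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> tokens_;
};
```

A request from the pharmacy page to gmail.com finds no token in the store, so Gmail only sees an anonymous visitor, until the user explicitly associates the two.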
Extra
What is known about you? Let's take an average person who uses her credit card, has an Android phone with Gmail and uses Facebook:
- All your relationships are known by Google (Gmail, Google+) and Facebook
- All your interests are known by Google and Facebook (AdSense tracks which websites you visit and Facebook has a huge profile on you)
- All your possessions are known to your bank
- Your photograph is known by Google and Facebook (people probably took a photo of you and placed it on their Android phone synchronized with Google)
- Your location is known (through your Android phone, your credit card, or the RFID card you use for public transportation)
- ...
If you want to keep anything private, it is becoming very difficult.