Mildred's Website

Category: en

Articles from 41 to 50

Wed 28 Nov 2012, 02:49 PM by Mildred Ki'Lya comp dev en lisaac lysaac

The overlooked problem

Let me explain the reason why I think Lisaac is broken and why I stopped working on it. The fundamental concept behind it is completely broken, and I never could solve the problem. First, let me explain a few things about Lisaac:

  • It is statically typed (like Eiffel)
  • It is prototype based (like Self)

It may not seem like it but those two things are mutually exclusive, unless something is done to reconcile the two. Being statically typed means you know the type (or a parent of the real type) at compile-time. Being prototype based means that the type can change completely.

Imagine you have a CUCUMBER that inherits from VEGETABLE. If at some point during run time the cucumber inherits from SOFTWARE instead (yes, Cucumber is the name of a piece of software as well), you have a problem. Yet, this is not the problem I want to talk about.

Lisaac solves this problem because a parent slot only accepts a child of its declared type. In the following object:

 Section Header
   + name := CUCUMBER;
 Section Inherit
   + parent :VEGETABLE := ...;

So, you're forbidden to assign a SOFTWARE to the parent slot because SOFTWARE is not a child of VEGETABLE. But you're allowed to assign a GREEN_VEGETABLE. So now it becomes:

 Section Header
   + name := CUCUMBER;
 Section Inherit

Now, let's say you send a message to your cucumber, and through the magic of inheritance, you end up in a slot of the parent. That is, you are executing code that is located in GREEN_VEGETABLE. Then, you have a problem.

 Section Header
   + name := GREEN_VEGETABLE;
 Section Public
   // inherited from VEGETABLE
   - do_something <- paint_it_green;
 Section Private
   // specific to GREEN_VEGETABLE
   - paint_it_green <- ( "green".println; );

In the code above, if you are executing the do_something slot and have SELF = CUCUMBER, you can't possibly call the paint_it_green slot because it does not exist in CUCUMBER, and the Lisaac compiler will refuse to compile your code. To put it another way, the type of Self is not compatible with GREEN_VEGETABLE, and this is a problem because GREEN_VEGETABLE is the code we are executing. You can't easily solve this problem.

In the Self language, if you had a similar situation, it would work because it is not statically typed. It would have failed to find paint_it_green in the cucumber and called the parent, which would have been a green vegetable. In effect, the type of the cucumber would have changed at run time, which is incompatible with a type system that can only compute types at compile time (and keep them immutable at run time).
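To make the Self-style behaviour concrete, here is a minimal Python sketch of run-time delegation. The `Proto` class and its `lookup` and `send` methods are hypothetical names made up for this illustration; this is neither Lisaac nor Self code, only a model of the lookup rule described above.

```python
class Proto:
    """A prototype object: a bag of slots plus a reassignable parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def lookup(self, slot):
        # Walk the parent chain at run time until the slot is found.
        obj = self
        while obj is not None:
            if slot in obj.slots:
                return obj.slots[slot]
            obj = obj.parent
        raise AttributeError(f"{self.name} does not understand {slot}")

    def send(self, slot):
        # The receiver stays `self`, even when the code comes from a parent.
        return self.lookup(slot)(self)


vegetable = Proto("VEGETABLE", do_something=lambda self: "nothing to do")
green_vegetable = Proto(
    "GREEN_VEGETABLE",
    parent=vegetable,
    do_something=lambda self: self.send("paint_it_green"),
    paint_it_green=lambda self: "green",
)
cucumber = Proto("CUCUMBER", parent=vegetable)

before = cucumber.send("do_something")   # code found in VEGETABLE
cucumber.parent = green_vegetable        # the parent changes at run time
after = cucumber.send("do_something")    # code found in GREEN_VEGETABLE
```

The second call only works because paint_it_green is resolved through the receiver's current parent at the moment of the call, which is exactly the information a compile-time type for CUCUMBER does not have.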

So, I stopped working on Lisaac

I tried to start up a new project, Lysaac, but was confronted with the same problem once the compiler was mature enough to start getting into inheritance. So I stopped again. Life getting in the way didn't help.

A new hope

I tried to look into virtual machines, convincing myself that compiled languages couldn't possibly answer such a need for dynamic features in the language. I looked at Self for the first time. I tried to build a virtual machine that was fully concurrent. I looked at the Go language because I knew that it has nice concurrency primitives ... and I found illumination.

The Go language feels like a dynamic language, but it is compiled. I always wondered how it could implement duck typing, and I found out. I also realized that the difference between a virtual machine with a JIT compiler, and a runtime for a compiled language that included a compiler was very narrow. Basically, nothing prevents any language from being compiled instead of being JIT compiled. You just have to include a compiler in the standard library.

Ironically, that is what Common Lisp is doing.

A new language

I like how Go implements interfaces, and I want to do the same. In this new language, an object is composed of :

  • an opaque word (pointer sized)
  • a pointer to an interface

An interface is a collection of pointers to functions. There is always a pointer for the fallback implementation and a pointer for the cast operation (to cast an object to a new interface).

I want to use a smalltalk/Io type syntax. So the following will represent an object with one variable and one slot:

 obj := {
   var := obj slot;
   slot <- { ... };
 }

The data word of obj would be a pointer to a struct containing the variable "var". The interface of obj would be:

  • a function pointer for the fallback implementation that throws an error
  • a function pointer for the "cast" operation
  • a function pointer for the "slot" slot
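The layout above can be modelled in a few lines of Python. This is only a sketch of the idea, not an implementation of the language: `Interface`, `Obj`, `send`, `OBJ_INTERFACE` and `EMPTY_INTERFACE` are hypothetical names, and a Python dict stands in for the struct behind the data word.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Interface:
    """A table of function pointers, with a mandatory fallback."""
    slots: Dict[str, Callable[..., Any]]
    fallback: Callable[..., Any]


@dataclass
class Obj:
    data: Any              # the opaque, pointer-sized data word
    interface: Interface   # pointer to the interface

    def send(self, name, *args):
        fn = self.interface.slots.get(name)
        if fn is None:
            # Unknown slot: use the fallback implementation.
            return self.interface.fallback(self, name, *args)
        return fn(self, *args)


def fallback_error(obj, name, *args):
    raise AttributeError(f"slot {name!r} not understood")


def cast(obj, target):
    # Same data word, viewed through another interface.
    return Obj(obj.data, target)


def slot(obj):
    return obj.data["var"]


OBJ_INTERFACE = Interface({"cast": cast, "slot": slot}, fallback_error)
EMPTY_INTERFACE = Interface({"cast": cast}, fallback_error)

obj = Obj({"var": 42}, OBJ_INTERFACE)
narrowed = obj.send("cast", EMPTY_INTERFACE)
```

Here `narrowed` shares the data word of `obj` but only understands "cast"; sending it "slot" ends up in the fallback, which throws an error as described.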

I haven't figured out yet how to pass parameters to functions and return objects. This kind of thing will require complying with predefined interfaces (we need a way to tell which object type is returned by a slot, for example). Interfaces would probably be defined at compile time and available as constants in code at run time (to be given as a parameter to the cast slot).

If I want to be able to compile the code on the fly, I need to include the LLVM library. So I need to implement the language in a language that is compiled using LLVM (so bootstrapping will be easier).

Thu 13 Sep 2012, 03:30 PM by Mildred Ki'Lya comp dev en idea wwwgen

I started writing wwwgen: a website generator that uses redo for its dependency tracking. Unfortunately, I might not have taken the correct approach to the problem. I reuse the webgen concepts, and that might be a bad idea.

Specifically, webgen (and my first version of wwwgen) are bottom-up systems (you take the sources and build everything they can generate). The problem is that redo itself is top-down (you take the target and build it, building sources as they are needed), and I tried to make the two match. It's very difficult, and impossible to do with a clean design.

What I'd like to have is a simple wwwgen binary that generates an HTML file from a source file. Let's imagine how it could work:

  • If the source file is a simple page with no templates, just generate the HTML and that is it

  • If the source file is a simple page with a template, redo-ifchange the template source and build the HTML combining the two

  • If the source file is an index, we have a problem because multiple outputs are generated. Redo doesn't support this case and we must find a way to make it work.

So, we have a problem here ...

Now, we have another problem. Specifically, my source file is called and I want my output file to be title/index.html. In webgen, this is implemented by a configuration in telling it to build in title/index.html.

There is a solution that solves both problems at once. The wwwgen command creates an archive (the format needs to be defined; it could be tar, or different YAML documents in the same file, for example). Then, the build process would be:

 find src -name "*.src" \
   | sed 's/src$/gen/' \
   | xargs -d '\n' redo-ifchange
 find src -name "*.gen" \
   | xargs -d '\n' wwwgen unpack

 redo-ifchange "$2.src"
 wwwgen --redo-dependencies -o "$3" generate "$2.src"

wwwgen generate would parse the source file and generate an archive, that will be unpacked later by wwwgen unpack. Let's see how it can work:

  • The source file can choose where it unpacks, relative to the directory where the source file is

  • If the source file is an index, it will redo-ifchange the other source files for the index and redo-ifchange the template, then generate multiple pages packed together.

  • If the source file is a tag tree (a special source that doesn't output anything on its own but creates index files dynamically), then it parses every child to find a canonical list of tags and the paths they refer to. Then, it creates the index files. Unfortunately, those index files will never be compiled until the next build.

How can we improve the design to be able to create source files dynamically?

There are different views of the problem:

  • pages, index and tags should all generate all the output files they are related to. It means that index files should be able to generate pages, and tags should be able to generate indexes and pages.

  • pages should generate the output file, index should generate pages and feeds and tags should generate index.

  • mixed solution (the one described): pages generate the output file, index should generate the output files as well and tags generate index.

How can we generate source files on the fly:

  • have a predefined compilation order: first tags, then index and lastly feeds and pages.

  • rebuild everything until no more source files are generated. We might build unnecessary things.

I prefer the second solution which is more flexible, but we need a way to avoid building things twice. For example, it's not necessary to build a page if on the next phase the page source is going to be regenerated.
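A minimal Python sketch of that second solution, written as a worklist so that each source is built exactly once. The function `build_until_stable`, the `generate` callback and the file names in `RULES` are hypothetical; they only illustrate the "repeat until no new source appears" loop, not how wwwgen or redo actually work.

```python
def build_until_stable(sources, generate):
    """Run `generate` on every source until no new source appears.

    `generate(name)` returns the set of source names it creates.
    Returns the sources in the order they were built.
    """
    known = set(sources)
    pending = list(sources)
    built = []
    while pending:
        name = pending.pop(0)
        built.append(name)
        for new in sorted(generate(name)):
            # Only schedule sources that have never been seen before.
            if new not in known:
                known.add(new)
                pending.append(new)
    return built


# Hypothetical project: a tag tree creates indexes, indexes create pages.
RULES = {
    "blog.tags": {"tag-en.index", "tag-dev.index"},
    "tag-en.index": {"tag-en.1.page"},
    "tag-dev.index": {"tag-dev.1.page"},
}
order = build_until_stable(
    ["blog.tags", "about.page"], lambda name: RULES.get(name, set())
)
```

The loop stops as soon as a full pass discovers nothing new, and no fixed order between tags, indexes and pages has to be hard-coded.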

Very simply, the generated page can contain a link to the index source file that generated it, and when generating the page, redo-ifchange is run on the index file.

Next question: what if a tag is deleted? The corresponding index page is going to stay around until the next clean. The tag page should keep around a list of index files it generated and delete them when a tag is no longer detected. And deleting the index should not be done using rm because the index will need to delete the pages it generated. The best solution would be to integrate with redo to detect these files.

The build scripts now are:

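 # Rebuild until a pass generates no new source file
 srclist="$(find src -name "*.src")"
 oldsrclist=""
 while [ "$oldsrclist" != "$srclist" ]; do
   echo "$srclist" \
     | sed 's/src$/gen/' \
     | xargs -d '\n' redo-ifchange
   # Remember the list we just built, then look for new sources
   oldsrclist="$srclist"
   srclist="$(find src -name "*.src")"
 done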

 find src -name "*.gen" \
   | xargs -d '\n' wwwgen unpack

 redo-ifchange "$2.src"
 wwwgen --redo-dependencies -o "$3" generate "$2.src"

Thu 02 Aug 2012, 11:15 AM by Mildred Ki'Lya comp dev en git

I was looking at Git, and which features may land in the next few releases, and I found the following things:

Git-SVN will be completely redesigned

If you worked with git-svn, you probably know that the git-svn workflow has nothing to do with git. Basically, you just have the svn history and have to use git-svn to push the changes back to the subversion repository. You can't use git-push and that's really annoying.

Recently, the git-remote-helpers feature was added. It allows git to interact with any kind of remote url, using a specific git-remote-* command. For example, you can already use mercurial this way (according to git-remote-hg):

Git allows pluggable remote repository protocols via helper scripts. If you have a script named "git-remote-XXX" then git will use it to interact with remote repositories whose URLs are of the form XXX::some-url-here. So you can imagine what a script named git-remote-hg will do.

Yes, this script provides a remote repository implementation that communicates with mercurial. Install it and you can do:

$ git clone hg::
$ cd some-mercurial-repo
$ # hackety hackety hack
$ git commit -a
$ git push
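The naming convention quoted above can be sketched in Python. This is only an illustration of the XXX::some-url-here rule, not git's actual resolution code; `remote_helper_for` is a hypothetical name and the example.org URLs are placeholders.

```python
def remote_helper_for(url):
    """Return (helper_command, address) for an `XXX::address` URL.

    Returns (None, url) when the URL has no `XXX::` prefix.
    """
    prefix, sep, address = url.partition("::")
    # Only treat the prefix as a helper name if it looks like one.
    if sep and prefix and prefix.replace("-", "").isalnum():
        return f"git-remote-{prefix}", address
    return None, url


hg = remote_helper_for("hg::https://example.org/some-mercurial-repo")
svn = remote_helper_for("svn::https://example.org/svn/trunk")
plain = remote_helper_for("https://example.org/repo.git")
```

With this convention, supporting a new kind of remote only requires dropping a new git-remote-XXX script in the PATH.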

The plan is to do the same with subversion. You could just do:

 $ git clone svn::

Branches might be tricky to implement, they might not be there. But you will get what's already in git-svn but with a way better UI. And far more possibilities for the future.

Here is the summary from the GSoC page:

The submodule system of git is very powerful, yet not that easy to work with. This proposed work will strengthen the submodule system even more and improve the user experience when working with submodules.

Git repository:

Midterm evaluation: passed

Progress report / status:

  • [GSoC 11] submodule improvements at git mailing list
  • [GSoC 11 submodule] Status update at git mailing list
  • [RFC PATCH] Move git-dir for submodules at git mailing list

Submodules will be improved a lot

I wish it was already there. From the wiki page, the improvements will be:

As Dscho put it, submodules are the “neglected ugly duckling” of git. Time to change that …

Issues still to be tackled in this repo:

  • Let am, bisect, checkout, checkout-index, cherry-pick, merge, pull, read-tree, rebase, reset & stash work recursively on submodules (in progress)
  • Teach grep the --recursive option
  • Add means to specify which submodules shall be populated on clone
  • Showing that a submodule has a HEAD not on any branch in “git status”
  • gitk: Add popup menu for submodules to see the detailed history of changes
  • Teach “git prune” the “--recurse-submodules” option (and maybe honour the same default and options “git fetch” uses)
  • Better support for displaying merge conflicts of submodules
  • git gui: Add submodule menu for adding and fetching submodules
  • git status should call “git diff --submodule --ignore-submodules=dirty” instead of “git submodule summary” for providing a submodule summary when configured to do so.
  • Add an “always-tip” mode
  • Other commands that could benefit from a “--recurse-submodules” option: archive, branch, clean, commit, revert, tag.
  • In the long run should be converted to a rather simple wrapper script around core git functionality as more and more of that is implemented in the git core.

Submodule related bugs to fix

  • Cherry picking across submodule creation fails even if the cherry pick doesn’t touch any file in the submodules path
  • git submodule add doesn’t record the url in .git/config when the submodule path doesn’t exist.
  • git rebase --continue won’t work if the commit only contains submodule changes.

Issues already solved and merged into Junio’s Repo:

  • Since git 1.6.6:
    New --submodule option to “git diff” (many thanks to Dscho for writing the core part!)
    Display of submodule summaries instead of plain hashes in git gui and gitk
  • Since git 1.7.0:
    “git status” and “git diff*” show submodules with untracked or modified files in their work tree as “dirty”
    git gui: New popup menu for submodule diffs
  • Since git 1.7.1:
    Show the reason why working directories of submodules are dirty (untracked content and/or modified content) in superproject
  • Since git 1.7.2:
    Add parameters to the “--ignore-submodules” option for “git diff” and “git status” to control when a submodule is considered dirty
  • Since git 1.7.3:
    Add the “ignore” config option for the default behaviour of “git diff” and “git status”. Both .git/config and .gitmodules are parsed for this option, the value set in .git/config will override that from .gitmodules
    Add a global config option to control when a submodule is considered dirty (written by Dscho)
    Better support for merging of submodules (thanks to Heiko Voigt for writing that)
  • Since git 1.7.4:
    Recursive fetching of submodules can be enabled via command line option or configuration.
  • Since git 1.7.5:
    fetch runs recursively on submodules by default when new commits have been recorded for them in the superproject
  • Since git 1.7.7:
    git push learned the --recurse-submodules=check option which errors out when trying to push a superproject commit where the submodule changes are not pushed (part of Frederik Gustafsson’s 2011 GSoC project)
  • Since git 1.7.8:
    The “update” option learned the value “none” which disables “submodule init” and “submodule update”
    The git directory of a newly cloned submodule is stored in the .git directory of the superproject, the submodules work tree contains only a gitfile. This is the first step towards recursive checkout, as it enables us to remove a submodule directory (part of Frederik Gustafsson’s 2011 GSoC project)

Thu 26 Jul 2012, 12:37 PM by Mildred Ki'Lya en idea web

I want a new architecture for the web. But which web?

The Traditional Web

A Web Site is a collection of Web Pages. A Web Page is a document, a static document I might add. The document contains text, images, sound and videos to present some information to the public. This is the traditional version of the web that I respect very much.

There is no need to change this as it works very well. The web was designed to allow a collection of documents to be accessed and does the job very well.

The one thing that is sensible to do, however, is not to use any scripting language to render the content of the pages. By definition, these pages are static: using a scripting language like PHP is not only going to be inefficient, but is going to be a security risk.

Web Applications

This is an entirely different thing, and the way we do this is completely wrong. We think that a web application is a collection of dynamic web pages. This should be erased from your mind.

A Web Application is an application in its own right.

Most unfortunately, applications are written in HTML which is not suited for this purpose. The widgets are very few and the framework is not suited to create complex User Interfaces.

The very bad thing to do

You should not do this. Unfortunately, this is how it works for most.

A very bad programmer is going to create a web page for each of the different aspects of their program. The web page would be dynamically generated on the server using a template engine. Some JavaScript would be included in this mess to avoid refreshing a page when in fact you should.

This is bad because you'll have a ton of page refreshes, and that's bad for the user. If you're not using JavaScript, this is the old way of web applications and is acceptable, but if you're using JavaScript for more than a few things, it's bad.

You shouldn't use JavaScript to change the content of a page to match what the server would generate just to avoid a refresh. That is just a way to recreate on the client a template engine that is redundant with the one on the server.

And if you are using JavaScript to change the content of a page to get the content that you should generally see in another page, you're a moron. This breaks the back/forward mechanism and is very bad.

The only sensible thing to do in this configuration with JavaScript is to script the user interface. Show hidden sections, enable a button. Don't contact the server using AJAX, you are already getting information from the server using normal page reload.

The sensible thing to do

Please do this!!!

An application is an application. It's a bundle and can't be separated into pages. It can be separated into sub-sections if you want, but those will necessarily trigger a page refresh.

The application should be the equal of a desktop application using a native toolkit, except that web technologies are used instead. Contacting the server should be limited to what is necessary only (fetching resources, exchanging data, ...). In particular, the templates should be on the client side.

On the server side, you'd have just a classic Web API, and static resources. The Web API should be designed with care and security in mind. It should be easily shared with third parties that want to integrate within your application.

It is as simple as that:

  • Server = Web API + Resources
  • Client = UI (incl templates)

I heard that SproutCore is a solution to work this way, but it was horrible to use as well.

Perhaps we need to get away from frameworks. There are many solutions that can integrate well together and that don't need a framework. Read this article for example: Grab UI by the Handlebar

An Example

I have a very simple example to show you what I mean: the comment mechanism on this website. Each page contains a JavaScript script (just look at the sources) that will add the comment features. The page delivered by the server contains no information about the comments, except a <noscript> tag to tell the people without JavaScript that they miss the comments.

The script does an AJAX query on a third party server (with PHP and a database). The arguments are the URL of the page and the answer is the JSON formatted list of comments for the page. These comments are then presented on the page. (The refresh link does this again)

The script also creates a form to add a comment. When the form is submitted, an AJAX query is made with the URL of the page and the content of the comment as arguments.

This is how all web applications should work.

Going further: static databases on the server

I already wrote an article on this, but I'll summarize the idea here.

With this new idea for web applications, the application logic is moved from the server to the client. Perhaps not all the way, but for simple applications like the comments above, the server API is nothing more than a fancy database API.

Why not design a database on this model? But it already exists: CouchDB. This is a database whose API is a web API. I want to take the principles of this database and mix them with the ideas of a static page generator.

The idea is that for read-only access, when the resource URL is known, a static page should correspond to the URL and a classic web server should answer the query.

Only update and search queries would be forwarded by the web server to a FastCGI application that is going to update the static files, or look at them to answer the search.
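As a rough Python sketch of that split: reads are plain files that the web server can serve directly, while updates and searches go through a small piece of code. `StaticStore`, the `.json` layout and the example keys are assumptions made for this illustration, the FastCGI plumbing is left out, and keys are not validated here although a real service would have to reject paths escaping the root.

```python
import json
import tempfile
from pathlib import Path


class StaticStore:
    """Documents stored as plain JSON files under a web root."""

    def __init__(self, root):
        self.root = Path(root)

    def path_for(self, key):
        # A plain web server can answer GET /<key>.json from this
        # file without running any code.
        return self.root / f"{key}.json"

    def update(self, key, document):
        # Write to a temporary file, then rename, so that a reader
        # never sees a half-written document.
        target = self.path_for(key)
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps(document))
        tmp.replace(target)

    def search(self, predicate):
        # A search scans the static files themselves.
        found = []
        for path in sorted(self.root.rglob("*.json")):
            doc = json.loads(path.read_text())
            if predicate(doc):
                found.append(doc)
        return found


root = tempfile.mkdtemp()
store = StaticStore(root)
store.update("comments/1", {"page": "/blog/a", "text": "first"})
store.update("comments/2", {"page": "/blog/b", "text": "second"})
hits = store.search(lambda doc: doc["page"] == "/blog/a")
```

Even if the update code stops working one day, everything that was ever written stays readable as ordinary files.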

I find it difficult to find advantages of this compared to CouchDB. There is one thing: your data will always be accessible read-only. No obscure database format that requires a database server that might not work on new hardware, might not be maintained any more...


With this approach, most web applications could be composed of static assets that are accessing a database with a Web API. The static assets and the database could be on the same server or on a different one. No limit is posed here, except the same origin restriction of web browsers.

Fortunately, a new standard (that I'm using for the comments of this website) lets you specify if the same origin policy should apply or not: Cross-Origin Resource Sharing

What about Cookies Then?

They should not exist in their current form.

This was a late addition to the HTTP standard, because it was already implemented. It was never really thought through, and is a breach of the original HTTP model.

The cookies are what allows XSS vulnerabilities, NOT JavaScript

The current web community takes things the wrong way. Instead of damning cookies forever, they blame JavaScript and try to build walls all over the place to avoid XSS while there is a much simpler solution.

What are XSS?

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in Web applications, such as web browsers through breaches of browser security, that enables attackers to inject client-side script into Web pages viewed by other users. A cross-site scripting vulnerability may be used by attackers to bypass access controls such as the same origin policy. Cross-site scripting carried out on websites accounted for roughly 84% of all security vulnerabilities documented by Symantec as of 2007.[1] Their effect may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigation implemented by the site's owner.

(Source: Wikipedia)

This is the wrong way to look at the problem. Inclusion of other items in web pages dates from the very beginning. It is the foundation of the web: it is what allows rich documents, with links between them. The same origin policy is just one of these walls that I talked about, that assumes that all pages on the current domain can be trusted. Is that true?

The real story is that if the design of the web was correct, the script injected by attackers in web pages would be completely harmless. The same origin policy shouldn't exist and bypassing it should by no means be a security risk.

We have come to a situation where WebGL can be a security risk because it can determine the colours of an image included in the web page. Just because an image can, from the very beginning of the web, bypass the same origin policy. Some people say that images should respect the same origin policy, but what they don't imagine is that we could just forget all of this mess.

The Real Problem.

The real problem is that when loading foreign content, through AJAX, the use of images or whatever, a page can load a foreign page where you are logged in using cookies. And access your foreign account. For example your bank account.

The problem is that the page containing the XSS script was granted the access to the cookie protecting your bank account.

Why the hell is your bank account protected by just a cookie that any page can have access to???

The solution

Unless you have in your address bar the address of your bank, the cookie protecting your bank account should be locked away. In fact, instead of having everything in a page respect the same origin policy, cookies should be the single thing respecting the same origin policy.

This is just a change in the handling of the cookie by the browser. Not a very big change at that. Only cookies for the domain in the address bar should be exploitable. But it would break a number of assumptions:

Third party cookies would no longer exist, this would be a big shock for advertisers and tracking programs. A loss for these companies, but a win for personal privacy.

Allow third party cookies nonetheless

We could nonetheless allow third party cookies, but only in the context of a specific page:

  • each tab/window of the browser share the same cookie jar for the requests
  • for each tab/window, a separate cookie jar would be assigned that will be emptied when a link is followed to a different origin
  • the sub-elements of a page that are not of the same origin of the page itself, should have their cookies in the sub cookie jar only
  • if there are sub-sub elements, they should be assigned to a sub-sub cookie jar recursively.
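One possible reading of these rules, as a small Python sketch. The `Tab` class and the example origins are hypothetical, and the sketch stops at one level of nesting: the recursive sub-sub cookie jars of the last rule are not modelled.

```python
class Tab:
    """Cookie selection for one tab, under the proposed rule."""

    def __init__(self, global_jar):
        self.global_jar = global_jar   # shared by every tab: {origin: cookies}
        self.sub_jar = {}              # private to this tab
        self.origin = None             # origin shown in the address bar

    def navigate(self, origin):
        # Following a link to a different origin empties the sub jar.
        if origin != self.origin:
            self.sub_jar = {}
        self.origin = origin

    def cookies_for(self, request_origin):
        # Only the origin shown in the address bar sees the global jar.
        if request_origin == self.origin:
            return self.global_jar.setdefault(request_origin, {})
        # Everything else is confined to this tab's sub jar.
        return self.sub_jar.setdefault(request_origin, {})


global_jar = {"bank.example": {"session": "secret"}}
tab = Tab(global_jar)

tab.navigate("blog.example")
seen_by_blog = dict(tab.cookies_for("bank.example"))   # bank embedded in blog
tab.cookies_for("ads.example")["id"] = "1"              # third-party cookie
ads_on_blog = dict(tab.cookies_for("ads.example"))

tab.navigate("bank.example")
seen_by_bank = dict(tab.cookies_for("bank.example"))    # address bar matches
ads_on_bank = dict(tab.cookies_for("ads.example"))      # sub jar was emptied
```

A request to the bank made from inside the blog page gets an empty jar, so an injected script has no session to abuse, while the advertiser's cookie survives only for as long as the tab stays on the blog.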

Allow third party cookies to access the main cookie jar on demand

This would nonetheless break a number of things. Third party scripts expect to see the same cookies they set a while ago, even if it was not on the same page.

A solution to this problem would be to add an option in the browser UI to specifically allow a third party script in a page to access the global cookie jar. There would be a button saying for example "Some cookies have been blocked". When clicked it would open a menu with:

  • Allow access to
  • Allow access to
  • Allow access to

A sensible person would not allow a blog with an XSS script to access his bank website, but would allow facebook if he wants to use a facebook button.

Browser plugins, anyone?

I'd love a browser plugin for that!

Unfortunately, on Firefox, creating a sub cookie jar would be difficult due to its architecture. Webkit is better on that.


Cookies aren't necessarily bad, they should just respect the same origin policy they introduced instead of imposing it on all of the other elements of a page. This can already be done now in web browsers, it just needs a consensus. Or at least, an extension with the right UI to bypass the same origin policy on demand when a script needs to.

Thu 26 Jul 2012, 10:57 AM by Mildred Ki'Lya en idea web wwwgen wwwsupport

I want to create a new static webpage generator. There are tons of them, you'll tell me, why do I want to create a new one?

Static Website Generator

Let's see what we have:

  • webby: my first love
  • nanoc: the second one I used
  • webgen: the one I'm using currently (with tons of local modifications)
  • jekyll: powering GitHub pages
  • and many more most probably

Most of these systems assume that you are a developer and that you are comfortable writing code to get your website running. In all of these systems, I was forced to code some Ruby here and there to get things working the way I wanted.

The case of nanoc3 is very special because the main configuration uses a special Ruby DSL to manage the path locations. If you have non technical users, they won't be willing to write code in this file anyway. And for technical users, it might not be powerful enough.

Jekyll is probably the simplest system and can probably be used by non technical users, but it is far too simple and not powerful enough. That's why I didn't use it.

In the end, I modified Webgen a lot to include the following features:

  • ability to write ruby code in haml instead of having to rely on the template engine included in webgen
  • special .index nodes that will generate a paginated index of articles found dynamically. The index would contain the last articles in reverse order while each page will contain N articles in the natural order. This way, an article that ends up on page 3 is always going to be on page 3.
  • special .tags nodes that will generate .index pages dynamically to create an index of articles for each tag.
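The stable pagination of the .index nodes can be sketched as follows in Python. The `paginate` function is a hypothetical illustration, not code from webgen, and it assumes that the front index shows the last N articles, where N is also the page size.

```python
def paginate(articles, per_page):
    """Split articles (oldest first) into fixed pages plus a front index.

    Page k always holds the same slice of the list, so adding a new
    article never moves an older one to another page.
    """
    pages = [
        articles[i:i + per_page] for i in range(0, len(articles), per_page)
    ]
    # The index shows the most recent articles, newest first.
    index = list(reversed(articles[-per_page:]))
    return index, pages


articles = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
index, pages = paginate(articles, 3)
new_index, new_pages = paginate(articles + ["a8"], 3)
```

After a8 is published, the first two pages are unchanged and only the last page and the index differ, so links to older pages stay valid.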

If you look around, there are not many static web page generators that permit that. At first, I decided I would maintain my own version of webgen with these modifications, but now I think the code base is so horrible that I prefer to rewrite the same features from scratch.

Rewriting From Scratch

As I said, I'm not satisfied with the current state of the webgen code. There is a complex cache system, and a blackboard object that is used to dispatch method calls to the correct objects around the system. The problem is that this extra level of indirection makes it difficult to follow the code path. It would be useful if the hooks in the blackboard were highly dynamic, but they are mostly static, so it serves no purpose whatsoever.

Moreover, I don't like big programs that do everything. And all of these static website generators have a component that is used to determine which pages are out of date, and only update them. This is traditionally what make(1) should do. And recently, I found that redo does the job very well. So, I want it to be an integral part of my new system.


Recently, I wrote a piece of code: WWWSupport. It's a very simple git repository that is supposed to be included as a submodule of the git repository of a website. It contains a daemontools daemon that receives e-mail from a special mailbox and converts it into blog posts on the fly (that's how I'm currently writing this blog post).

I want my WWWGen project to integrate the same way into my website.


WWWGen is the name of my website generator. The architecture is very simple:

  • A src directory containing the pages, in the same format as webgen
  • A nodes directory containing the files WWWGen is working on to generate output
  • An output directory where the result files in nodes are copied to, and where some static assets are copied as well (using rsync)
  • A redo script that contains the configuration and invokes the wwwgen script.

The wwwgen script will create the nodes directory and the redo scripts necessary to its working in it. Then, it will do three things:

  1. For each source file, the script will create:

    • A .node file that contains a relative path to the source file, and represents it.
    • As many .outdep files as the source file will generate output files. The .outdep file is not managed by redo (because redo doesn't support multiple targets yet). It references the .node file using a relative path.

    Note that during this process, new sources can be generated to allow new nodes to be created. This step will be executed until no new sources are generated.

  2. Once this is done, the main script will look for all of the .outdep files and will build the corresponding .out file. The .out file will contain the final processed result of the page.

  3. Copy all .out files in the output directory (after removing the .out extension) and all the static files in the static directory.

Note that steps 1 and 2 recursively call redo to generate the .node and .out files. This two-step design is necessary to account for multiple pages generated from a single source.
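The three steps can be sketched as follows. This is only an illustration in Python, not the actual wwwgen script. The function names, the `outputs_for` and `render` callbacks and the dictionaries standing in for the nodes directory are all assumptions, redo is not involved, and the repetition of step 1 when new sources appear is left out.

```python
from typing import Callable, Dict, List


def plan_nodes(sources: List[str],
               outputs_for: Callable[[str], List[str]]) -> Dict[str, str]:
    """Step 1: one .node entry per source, one .outdep entry per output file."""
    nodes: Dict[str, str] = {}
    for src in sources:
        node = src + ".node"
        nodes[node] = "../src/" + src        # the .node refers to its source
        for out in outputs_for(src):
            nodes[out + ".outdep"] = node    # the .outdep refers to its .node
    return nodes


def build_outputs(nodes: Dict[str, str],
                  render: Callable[[str, str], str]) -> Dict[str, str]:
    """Step 2: build one .out entry for every .outdep entry."""
    outs: Dict[str, str] = {}
    for name, node in nodes.items():
        if name.endswith(".outdep"):
            target = name[:-len(".outdep")]
            outs[target + ".out"] = render(node, target)
    return outs


def publish(outs: Dict[str, str]) -> Dict[str, str]:
    """Step 3: strip the .out extension to get the final output paths."""
    return {name[:-len(".out")]: body for name, body in outs.items()}


def outputs_for(src: str) -> List[str]:
    # Hypothetical rule: a blog index produces two pages, any other source one.
    if src == "blog.index":
        return ["index.html", "index.2.html"]
    return [src.replace(".page", ".html")]


nodes = plan_nodes(["about.page", "blog.index"], outputs_for)
site = publish(build_outputs(nodes, lambda node, target: target + " from " + node))
```

The single `blog.index` source ends up as two entries in `site`, which is the case the separate .outdep files are there to handle.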


In all my projects, I always want to focus on the usability of what I create. I always think that non-programmers should be able to do the same as I did, up to a point. For example, my personal e-mail server at home is scripted all the way. Reinstalling it should be a matter of:

  • installing a debian system with a basic configuration
  • clone my git repositories
  • Set some basic configuration (hostname, ...)
  • run redo install

I agree that even then, not everybody can install a mail server at home, but with a process that simple, it's possible to create user interfaces for it. So even if it's not there yet, it's a possibility.

I want the same for WWWGen. It leaves a possibility for creating a user interface. Nothing prevents anyone from creating a web application, or even a native application, that will create markdown pages with a WYSIWYG editor (à la WordPress). The page files thus created could be checked into a git repository and pushed to a server. There, a hook will run WWWGen to update the website with the modifications.

This could seriously be a very good alternative to WordPress, and I'd prefer working with such a system rather than WordPress.

What already exists

I am not very good at creating desktop applications. So I thought I would reuse an existing one: my mailer. It's like a Content Management System where everything must be configured by hand, and only articles can be submitted, using an e-mail.

This post is being sent by e-mail to my personal web server. Currently, I'm still using plain text with a markdown syntax, but we could reuse the HTML markup in a mail. This e-mail is then processed by a special alias in /etc/aliases:

 http.www: "|/srv/http/www/wwwsupport/"

This line was automatically generated by a script. The mail is read by a ruby script, which creates a file in my website git repository with the content of the mail. Then webgen is run and the content of the repository is pushed to origin.

As a consequence, the new article appears on the website. This is a very simple form of user interface, but it permits anybody to write blog posts.
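The conversion from a mail to a page file could look roughly like this. It is a sketch in Python rather than the ruby script actually used. The file naming scheme and the header keys are assumptions, and it only handles a plain-text, non-multipart mail that has both a Subject and a Date header.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Tuple


def mail_to_page(raw_mail: str) -> Tuple[str, str]:
    """Turn a plain-text e-mail into (file name, page content)."""
    msg = message_from_string(raw_mail)
    title = msg["Subject"] or "untitled"
    date = parsedate_to_datetime(msg["Date"])   # fails if there is no Date header
    body = msg.get_payload()                    # a str for a non-multipart mail

    # Build a file-name friendly slug from the title.
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    slug = "-".join(cleaned.split())
    name = f"{date:%Y%m%d}-{slug}.page"

    # Assumed page layout: a small metadata header followed by the markdown body.
    header = f"---\ntitle: {title}\ncreated_at: {date.isoformat()}\n---\n"
    return name, header + body


raw = (
    "From: someone@example.org\n"
    "Subject: Hello, World!\n"
    "Date: Thu, 21 Jun 2012 15:08:00 +0200\n"
    "\n"
    "Some *markdown* text.\n"
)
name, page = mail_to_page(raw)
```

The resulting file would then be committed to the website repository before the generator runs.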

What features I would like to see

  • Better parsing of HTML e-mail
  • Using wwwgen instead of webgen
  • Support for image galleries using git-annex in wwwgen
  • Support for taking the attached images in an e-mail to create a gallery on my website?

What features a customer would want

A web application that can :

  • modify the website configuration in the file
  • modify any source files, using a WYSIWYG editor when applicable
  • add static assets (possibly using git-annex)
  • run the website compilation and preview it
  • push to a server to update the production website

This is entirely possible.

Fri 06 Jul 2012, 11:08 AM by Mildred Ki'Lya en idea

Websites should be static. Any dynamic thingy (scripting using php, ruby, python and the like) should be limited to a set of features that absolutely cannot be implemented otherwise. Most GET requests should lead to a static resolution (there can be exceptions such as GET queries of search engines for instance).

Respecting this principle is quite simple: just generate the static pages when they are updated, and do not worry about them afterwards. This is what I try to do for my website, and it works quite well.

Advantages of this technique :

  • your data will always be readable in the future, even if you may not write it any more
  • improved security: normal operations only involve static files. You get to spend more time with update actions (POST and PUT) and design them better.

Now, I had the idea to extend this to databases. Do you know about CouchDB? It's a database which has a web interface only. I really like its design but again, I'd like it to use the same principle as above.

The idea of such a database came with the feature I developed for my blog: the user comments. In this blog, the user comments are completely managed with JavaScript. If you don't have JavaScript, you don't have comments at all. How does that work?


To get the comments for an article, the JavaScript will contact a simple PHP application on another server (a free hosting service). This simple application is able to store and get JSON data using REST requests. The JavaScript will then use XMLHttpRequest to contact the server and give it the canonical URL (<link rel=canonical>). The server will answer with a JSON object containing the comments.

Storing a comment is done the same way, using a POST request instead of a GET request.

To a Database Server

This is very simple yet powerful. Why not extend this design to:

  • allow any kind of data, not just comments
  • allow simple GET requests to bypass any script and just fetch the raw data

We can imagine the data store to be publicly accessible using URLs that end with the .json suffix. There would be a similar URL with .json.meta to access the metadata about an object (its current version, access rights, ...). We can imagine the web applications of the future being completely implemented on the client side. The server side would be just a shared database.

We would obviously need a security layer to prevent anyone from reading data they should not be allowed to read. We can imagine three levels of permissions:

  • read and write by everyone
  • read by everyone, write only by authorized user
  • read and write only by authorized user

We could imagine many different authentication mechanisms. For most data, the mechanism could be a shared secret. The metadata of a json file would contain :

 "auth": "shared-secret",
 "secret": "path/to/another/file"

To get access to the file, the client would have to provide the exact content of the file "path/to/another/file", which would obviously be a protected file, readable only by authorized access. It could be a login/password or anything else.
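A permission check along these lines could be sketched as follows, in Python. The "read" and "write" metadata keys, the `allowed` function and the in-memory `FILES` table are assumptions made for the example. Only "auth" and "secret" come from the metadata shown above.

```python
import hmac

# Hypothetical metadata, as it could be stored in "notes.json.meta".
# This is the "read by everyone, write only by authorized user" level.
META = {
    "read": "everyone",
    "write": "authorized",
    "auth": "shared-secret",
    "secret": "secrets/notes",
}

# Hypothetical protected files, keyed by path.
FILES = {"secrets/notes": "login:password"}


def allowed(meta: dict, action: str, provided_secret=None) -> bool:
    """Return True if `action` ("read" or "write") is permitted."""
    if meta.get(action) == "everyone":
        return True
    if meta.get("auth") != "shared-secret" or provided_secret is None:
        return False
    expected = FILES[meta["secret"]]
    # Constant-time comparison, so that timing does not leak the secret.
    return hmac.compare_digest(expected.encode(), provided_secret.encode())
```

With this metadata, `allowed(META, "read")` succeeds without any secret, while a write needs the exact content of the protected file.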

Update operations would be :

  • PUT: to update the entire content of the file
  • POST: append to the existing data (the data should be a JSON array)

The data file will have an associated version which will be in the form of "sha1:<sha1 of the file>". To successfully update a data file, the existing version of the file must be given. If it is not the same, the client should retry. This is the same concept as in CouchDB.
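Here is a minimal in-memory sketch of these update rules in Python. The `DataFile` class and the `VersionConflict` exception are names invented for the example, and HTTP is left out entirely: `put` and `post` only model what the two request types would do to a file.

```python
import hashlib
import json


class VersionConflict(Exception):
    """The client's known version is stale; it should fetch and retry."""


def version_of(content: str) -> str:
    return "sha1:" + hashlib.sha1(content.encode("utf-8")).hexdigest()


class DataFile:
    def __init__(self, content: str = "[]"):
        self.content = content

    @property
    def version(self) -> str:
        return version_of(self.content)

    def _check(self, known_version: str) -> None:
        if known_version != self.version:
            raise VersionConflict(self.version)

    def put(self, known_version: str, new_content: str) -> str:
        """PUT: replace the entire content of the file."""
        self._check(known_version)
        json.loads(new_content)          # refuse anything that is not JSON
        self.content = new_content
        return self.version

    def post(self, known_version: str, item) -> str:
        """POST: append an item to the existing JSON array."""
        self._check(known_version)
        data = json.loads(self.content)
        if not isinstance(data, list):
            raise TypeError("POST requires the data to be a JSON array")
        data.append(item)
        self.content = json.dumps(data)
        return self.version
```

A client that receives `VersionConflict` would read the file again, take the new version and retry its update.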

Thu 21 Jun 2012, 03:08 PM by Mildred Ki'Lya dev en website

Hi everyone, I just developed a new way to update my blog. I am just using my mailer (which is currently Thunderbird for a lack of a better alternative). As you may know, I already manage my own server. I have a nginx web server and my website is just plain old static pages generated from markdown using my own version of webgen.

My code is available at two locations:

I just developed a few scripts which do the following:

  • hooked in /etc/aliases, receive e-mail from a special e-mail address and put this mail in an inbox directory on the server
  • another daemon (actually a shell script run by daemontools), running as the www-data user, monitors the inbox using inotifywait. When a mail comes, or every 60 seconds, it processes the mail using a ruby script that converts the mail to a markdown page, and runs webgen on the website. It also commits and pushes the changes.

And that's all.

The one thing I couldn't automate is the creation of the private key to push the changes back to the git server.

All the code is there. I just don't know if I have a public URL for these repos :)

Tue 28 Feb 2012, 11:07 AM by Mildred Ki'Lya comp en web

I just moved the site to webgen. For the occasion, I forked the project on github and improved a lot of things. As webgen doesn't seem to be alive, I think I'll be the one maintaining it in the future.

I took the occasion to move my website over to my own web server, hosted at home. If I ever find it unsuitable, I can very easily host my website elsewhere as it is just a bunch of static pages.

More to come ...


Wed 19 Oct 2011, 10:21 AM by Mildred Ki'Lya comp dev en privacy web

What is this all about? Web privacy. We are tracked everywhere, and I'd like to help if possible. So, let's design a web browser that for once respects your privacy.

Main features:

  • Each website has its own cookie jar, its own cache and its own HTML5 local storage
  • History related css attributes are disabled
  • External plugins are only enabled on demand
  • Support for Tor/I2P is enabled by default
  • You have complete control over who receives what information
  • Lets you control in the settings if you want to allow referrers or not.
  • The contextual menu lets you open links anonymously (no referrer, anonymous session)

The browser is bundled with a particular UI that lets you control everything during your browsing session. It is non-intrusive and makes the best choice by default. I am thinking of a notification bar that shows at the bottom. I noticed that this place is not intrusive when I realized that the search bar in Firefox was most of the time open, even if most of the time I didn't use it.

First, let's define a session:

  • A session can be used by more than one domain at the same time.
  • A session is associated to a specific cache storage
  • A session is associated to a specific HTML5 storage
  • A session is associated to a specific cookie jar
  • A session can be closed. When it is closed, session cookies are deleted from the session
  • A session can be reopened. All long lasting cookies, cache, HTML5 storage and such are then used again.
  • A session can be anonymous. In such a case, the session is deleted completely when it is closed.
  • A session is associated to none, one or more domains. These domains are the domains the end user can see in the address bar, not the sub items in the page.
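To make this definition concrete, here is a small Python sketch of a session as a data structure. The `Session` class and its field names are assumptions made for the example. In particular, cookies are stored as (value, long lasting) pairs so that closing a session can drop the session cookies only.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    domains: set = field(default_factory=set)    # domains seen in the address bar
    anonymous: bool = False
    cookies: dict = field(default_factory=dict)  # name -> (value, long_lasting)
    cache: dict = field(default_factory=dict)
    storage: dict = field(default_factory=dict)  # HTML5 local storage
    is_open: bool = True

    def close(self) -> None:
        self.is_open = False
        if self.anonymous:
            # An anonymous session is deleted completely when closed.
            self.domains.clear()
            self.cookies.clear()
            self.cache.clear()
            self.storage.clear()
        else:
            # Only session cookies are dropped, long lasting ones are kept.
            self.cookies = {n: c for n, c in self.cookies.items() if c[1]}

    def reopen(self) -> None:
        if self.anonymous:
            raise ValueError("an anonymous session cannot be reopened")
        self.is_open = True
```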

Sessions are like Firefox profiles: opening a new session is like opening a Firefox profile you just created. They are needed because people will never create a different Firefox profile for each site.

If we want to protect privacy, when a link is opened, a new session should be created each time. To make it usable to browse web sites, it is made possible to share sessions in specific cases. Let's define the cases where it might be intelligent to share a profile:

  • You click a link or submit a form and expect to still be logged-in in the site you are viewing. You don't care if you follow a link to an external page.

    User Interface: If the link matches one of the domains of the session, then keep the session. No UI. If the user wanted a new session, the "Open anonymously" entry in the context menu exists. A button on the toolbar might be available to enter a state where we always want to open links anonymously.

    If the link points to another domain, then open the link in a new session unless "Open in the same session" was specified in the context menu. The UI contains:

    We protected your privacy by separating <domain of the new site> from
    the site you were visiting previously (<domain of the previous site>).
    Choices: [ (1) Create a new anonymous session          | ▼ ]
             | (2) Continue session from <previous domain> |
             | (3) Use a previous session for <new domain> |
             | (4) Use session from bookmark "<name>"      |

    The first choice is considered the default and the page is loaded with it. If the user chooses a new option, then the page is reloaded.

    If the user chooses (2), the page is reloaded with the previous session and the user will be asked: "Do you want <old domain> and <new domain> to have access to the same private information about you?". Answers are Yes, No and Always. If the answer is Always, then in the configuration, the two domains are considered one and the same.

    The choice (3) will use the most recent session for the new domain. It might be a session currently in use or a session in the history.

    There are as many (4) options as there are bookmarks for the new domain. If different bookmarks share a single session, only one bookmark is shown. This choice will load the session from the bookmark.

    If (3) and (4) are the same sessions, and there is only one bookmark (4), then the (4) option is left out.

  • You use a bookmark and expect to continue the session you had for this bookmark (webmails)

    The session is simply stored in the bookmark. When saving a bookmark, there is an option to store the session with it or not.

    [X] Do not save any personal information with this bookmark
  • You open a new URL and you might want to reuse a session that was opened for this URL.

    The User Interface allows you to restore the session:

    We protected your privacy by not sending any personal information to
    <domain>. If you want <domain> to receive private information, please
    Choices: [ Do not send private information     | ▼ ]
             | Use a previous session for <domain> |
             | Use session from bookmark "<name>"  |

If you can see other use cases, please comment on that.
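The rules for the first use case (following a link) can be summed up in a short Python sketch. The `link_choices` function, the session identifiers and the "keep:", "continue:", "previous:" and "bookmark:" labels are all made up for the example, and the "Open in the same session" and "Always" settings are left out.

```python
from typing import Dict, List, Set, Tuple


def link_choices(current: str,
                 target_domain: str,
                 domains_of: Dict[str, Set[str]],
                 recent: List[str],
                 bookmarks: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (default action, other choices offered in the notification bar).

    current    -- id of the session the link was clicked in
    domains_of -- session id -> domains associated with that session
    recent     -- session ids, most recently used first
    bookmarks  -- bookmark name -> session id, for bookmarks of target_domain
    """
    if target_domain in domains_of[current]:
        return "keep:" + current, []          # same site: keep the session, no UI

    choices = ["continue:" + current]         # choice (2)

    # Choice (3): the most recent session known for the new domain.
    previous = next((s for s in recent if target_domain in domains_of[s]), None)
    if previous is not None:
        choices.append("previous:" + previous)

    # Choice (4): bookmarks sharing a single session are shown only once.
    distinct: Dict[str, str] = {}
    for name, session in bookmarks.items():
        distinct.setdefault(session, name)
    # A single bookmark pointing at the session of (3) adds nothing.
    if not (len(distinct) == 1 and previous in distinct):
        choices += ["bookmark:" + name for name in distinct.values()]

    return "new-anonymous", choices           # choice (1) is the default
```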

From these use cases, I can infer three kinds of sessions:

  • Live sessions, currently in use
  • Saved sessions, associated to a bookmark
  • Closed sessions in the past, accessible using history. Garbage-collected after too long a time.

Now, how to implement that? I was thinking of QtWebKit as I already worked with Qt and it's easy to work with.

  • We have a main widget: QWebView. We want to change the session when a new page is loaded. So we hook up with the signal loadStarted.
  • We prevent history related CSS rules by implementing QWebHistoryInterface, more specifically, we store the history necessary to implement QWebHistoryInterface in the session.
  • We change the cache by implementing QAbstractNetworkCache and setting it using view->page()->networkAccessManager()->setCache(...)
  • We change the cookie jar by implementing QNetworkCookieJar and setting it using view->page()->networkAccessManager()->setCookieJar(...)
  • Change the local storage path using a directory dedicated for the session and using view->page()->settings()->setLocalStoragePath(QString)

After all that, we'll have to inspect the resulting browser to determine if there are still areas where we fail at protecting privacy.

Wed 28 Sep 2011, 10:42 AM by Mildred comp en privacy

Here is my Bug 660340. I created it after looking at the recent facebook 'enhancements' that make privacy even more precious (article in French).

We need to quickly find a way to preserve our privacy on the Internet.


With the recent threats of the various big internet companies on our privacy, it would be a great enhancement if epiphany allowed to have separate navigation contexts (cookies, HTML5 storage, cache) at will, and easily.

Some companies, especially facebook, and I suppose Google could do that as well, can use all kinds of methods to track a user uniquely, using cookies or HTML5 storage. I wonder if they can use the cache as well, but I heard it could be prevented. Firefox does.

One solution to counter these privacy threats is to have a different browser, or different browser profile, for each of the web sites we load. This is however very inconvenient, and it should be made easy.

First, let's define the concept of session. A session is almost like a separate instance of the browser. It shares bookmarks and preferences with other sessions, but has a separate cache, a separate set of cookies and separate HTML5 DOM storage.

I imagine the following behaviour, based on the document.domain of the toplevel document:

  • If a page is loaded without referrer, and the domain is not associated with an existing session, start a new session for that domain

  • If a page is loaded without referrer, and the domain is a domain that is already associated with an existing session, then prompt non intrusively:

    "You already opened a session on"
    Choices: [  Start a new session    ▼ ] [Use this session]
             [[ New anonymous session   ]]
             [  Replace existing session ]

    If the session is started anonymously, it would not be considered for reuse

  • If a page is loaded using a link from an existing window/tab, and the domain is the same, then share the session

  • If a page is loaded using a link from an existing window/tab, and the domain is NOT the same and, then a non intrusive message is displayed:

    "You are now visiting Do you want to continue your session
    Choices: [  Start a new session    ▼ ] [Share previous session]

    The "Start a new session" dropdown menu changes if the is already associated with a session or not. If is associated with a session:

     [[ New anonymous session            ]]
     [  Replace session      ]
     [  Use existing session ]

    If is not associated with a session:

     [[ New anonymous session    ]]
     [  New session  ]

The choice in [[xxx]] (as opposed to [xxx]) is the most privacy-enhancing one, and would be the default if the user chooses in the preferences

[x] allow me not to be tracked

The messages are non intrusive, they can be displayed as a banner on top of the page. The page is first loaded with the default choice, and if the user decides to use the other choice, the page will be reloaded accordingly (or the session will be reassigned).
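The behaviour described above boils down to a small decision table, sketched here in Python. The `session_prompt` function and its three boolean parameters are assumptions made for the example. In each list of choices, the first entry is the privacy-enhancing default.

```python
from typing import List, Optional, Tuple


def session_prompt(has_referrer: bool,
                   same_domain: bool,
                   domain_has_session: bool) -> Tuple[Optional[str], List[str]]:
    """Return (action taken silently, choices shown in the banner)."""
    if not has_referrer:
        if not domain_has_session:
            return "start new session", []
        return None, ["New anonymous session",
                      "Replace existing session",
                      "Use this session"]
    if same_domain:
        return "share session", []
    if domain_has_session:
        return None, ["New anonymous session",
                      "Replace session",
                      "Use existing session",
                      "Share previous session"]
    return None, ["New anonymous session",
                  "New session",
                  "Share previous session"]
```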

This setting can traditionally be used to set the do not track header

Every prompt should have a "do not prompt me again" setting that could be reset at some point.

About embedded content: Because toplevel pages do not share the same session (toplevel page opened at have a different session than toplevel page opened at, if a page from have embedded content from, the embedded content would not be able to track the user, except within the website.

It is possible to imagine global settings that would hide some complexity:

[ ] When I load a page, always associate it with its existing session.
[ ] When I switch website, always reuse the existing session of the new

This user interface might seem complex at first, but it is far less complex than letting the user deal with different browser profiles by hand. Unfortunately, I don't think we can abstract privacy that easily. Keep in mind that all of these settings would be enabled only if the user chooses to enable privacy settings.

I am ready to contribute to the implementation of this highly important feature (important for our future and our privacy). Do you think an extension might be able to do all of that, or do you think the browser code should be modified?

Further possible enhancements:

  • Add another choice for anonymous sessions using Tor (or I2P)

  • Add the possibility to have multiple sessions registered with the same domain. This would enable the user to have different profiles for the same website.