Tuesday, February 26, 2013

LibJIT news.

I have finally come out of winter hibernation and started performing my LibJIT maintainer duties. I have created the new official LibJIT home page, http://www.gnu.org/software/libjit/, a dedicated LibJIT mailing list, and a new Savannah git repository for LibJIT. The links can be found on the new home page. My own LibJIT page has also been updated.

Friday, December 21, 2012

MainMemory project

As mentioned in the previous post, the amount of work needed to finalize the next libjit feature is really small. The reason it is still not done is that I currently spend more time on another project:


I estimate that I will be mostly busy with this project until summer 2013. After that I plan to work on libjit more actively.

libjit project

Libjit is to become a separate GNU project. The rest of DotGNU, after being orphaned for some time, has been decommissioned. I am going to continue maintaining libjit, but I still cannot allocate enough time for it. I hope this might change in a few months. The new libjit pluggable memory manager is almost done. I would need just a few full days of hacking to finish it completely.

Linux msync braindamage, Part 2

The first post discussed how msync() could be useful for implementing a transactional system and how it is not so on Linux. Curiously, the comment from the Linux kernel source reproduced in that post says that msync(MS_ASYNC) used to behave differently.

The history of msync() is revealed in this LKML thread:

msync() behaviour broken for MS_ASYNC, revert patch?

After I went through this thread, my initial reaction was great surprise. The Linux kernel maintainers just plain refuse to implement msync() the way I and, as can be seen from the thread itself, some other people see it. This thread took place before the MS_ASYNC flag became a no-op in the Linux kernel. Initially I wanted to write a lengthy analysis of the arguments in the thread in the light of this later fact.

However, partly out of laziness and partly because I recognize the pointlessness of arguing with Linus Torvalds on a personal blog that nobody reads, I stopped short of it. If somebody happens to come across this post, then please read the original thread and draw your own conclusion about how valid the Linux approach to MS_ASYNC is.

Thursday, January 26, 2012

Tsuna's blog: How long does it take to make a context switch?

Tsuna's blog: How long does it take to make a context switch?: That's a interesting question I'm willing to waste some of my time on. Someone at StumbleUpon emitted the hypothesis that with all the impr...

Thursday, August 4, 2011

Linux msync braindamage, Part 1

About msync() system call

When contemplating a possible implementation of a log-based transactional system and looking at the UNIX API, it seems natural to employ the msync() function for memory-mapped log files.
     msync - synchronize memory with physical storage

     int msync(void *addr, size_t len, int flags); 
Indeed msync specification states the following:
The msync() function should be used by programs that require a memory object to be in a known state; for example, in building transaction facilities.
The idea is that the log file is mmapped into the address space of the process, and the log data is written to memory and synchronized with the disk by the OS virtual memory mechanism. This way there is no need to allocate an in-memory buffer for log data and call write() when the buffer is full. Instead, when a transaction is to be committed, exactly the portion of the mmapped log that contains the transaction data is msynced, and that's it. Concurrently, the data for other transactions can be written further down the log and stay cached in memory, avoiding unnecessary I/O. The existence of two modes, MS_ASYNC and MS_SYNC, adds to the appeal of msync():
When MS_ASYNC is specified, msync() shall return immediately once all the write operations are initiated or queued for servicing; when MS_SYNC is specified, msync() shall not return until all write operations are completed as defined for synchronized I/O data integrity completion.
I can't help but think that msync() was introduced to UNIX specifically to cater to DBMS people. This cannot be a coincidence. This is just what one would want when developing a DBMS engine.
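The scheme described above can be sketched roughly as follows. This is only an illustration of the POSIX-style usage, not code from any real DBMS: the struct txn_log type and the txn_log_* function names are made up for this example, the log has a fixed size, and there is no record framing or recovery logic.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log descriptor; names and layout are illustrative only. */
struct txn_log {
    int fd;       /* backing file */
    char *base;   /* start of the shared mapping */
    size_t size;  /* length of the mapping in bytes */
    size_t tail;  /* offset of the first free byte */
};

/* Create (or open) the log file, size it, and map it. Returns 0 or -1. */
int txn_log_open(struct txn_log *log, const char *path, size_t size)
{
    void *p;

    log->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (log->fd < 0)
        return -1;
    if (ftruncate(log->fd, (off_t)size) != 0) {
        close(log->fd);
        return -1;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, log->fd, 0);
    if (p == MAP_FAILED) {
        close(log->fd);
        return -1;
    }
    log->base = p;
    log->size = size;
    log->tail = 0;
    return 0;
}

/* Append a record with a plain memory copy, no write() call involved.
   Returns the record offset, or (size_t)-1 if the log is full. */
size_t txn_log_append(struct txn_log *log, const void *data, size_t len)
{
    size_t off = log->tail;

    if (len > log->size - log->tail)
        return (size_t)-1;
    memcpy(log->base + off, data, len);
    log->tail += len;
    return off;
}

/* Commit: msync only the pages that cover [off, off + len). */
int txn_log_commit(struct txn_log *log, size_t off, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off & ~(page - 1);  /* msync() wants a page-aligned address */

    assert(off + len <= log->size);
    return msync(log->base + start, (off - start) + len, MS_SYNC);
}

/* Unmap the log and close the backing file. */
void txn_log_close(struct txn_log *log)
{
    munmap(log->base, log->size);
    close(log->fd);
}
```

A transaction would call txn_log_append() for each of its records and then txn_log_commit() once over the range they occupy, while other transactions keep appending further down the log.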

Okay, so far I have referred to POSIX and UNIX. However, one particular implementation of the POSIX API probably deserves the most attention these days, namely Linux. Judging by the Linux msync() man page, everything seems to be good. It pretty much conforms to the POSIX specification. Or so it says.

Once you start wondering what the situation on the ground is, the picture becomes more complicated. One interesting tidbit can be found in the FreeBSD man page:
The msync() system call is obsolete since BSD implements a coherent file system buffer cache. However, it may be used to associate dirty VM pages with file system buffers and thus cause them to be flushed to physical media sooner rather than later.
This is confusing. The purpose of msync() is to ensure data integrity. I understand that if a process crashes, its modified mmapped data still remains in the system cache and at some point will be synchronized with physical storage. So far so good. But what if the whole system crashes? Without msync() this will result in data loss. Or are they saying that their msync() merely causes the page flush to happen somewhat earlier but not right away? So on FreeBSD there is no big difference whether you msync() or not, as it provides no integrity guarantee anyway? Well, I don't have answers to these questions, as for now I don't want to spend much time on FreeBSD research; I'm more focused on Linux.

About msync() on Linux

So what exactly does msync() do on Linux? In a fairly recent Linux release the following comment could be found in the file linux/mm/msync.c:
 * MS_SYNC syncs the entire file - including mappings.
 * MS_ASYNC does not start I/O (it used to, up to 2.5.67).
 * Nor does it marks the relevant pages dirty (it used to up to 2.6.17).
 * Now it doesn't do anything, since dirty pages are properly tracked.
 * The application may now run fsync() to
 * write out the dirty pages and wait on the writeout and check the result.
 * Or the application may run fadvise(FADV_DONTNEED) against the fd to start
 * async writeout immediately.
 * So by _not_ starting I/O in MS_ASYNC we provide complete flexibility to
 * applications.
So let me summarize the current status of msync() on Linux:
  • msync(..., MS_ASYNC) is effectively a no-op
  • msync(..., MS_SYNC) is effectively equivalent to fsync()
The bottom line is that msync() is completely useless on Linux. It cannot help with the transaction log idea described above, and for that matter it cannot help with anything else. The comment in the source code suggests using other system calls. At the same time the Linux man page for msync() is absolutely misleading. It makes it appear that everything is fine and that Linux fully implements the UNIX specification.
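For reference, the two replacements that the kernel comment points to might be wrapped like this. The helper names log_flush_start and log_flush_wait are made up for this sketch. Both calls operate on the file descriptor rather than on the mapping, fsync() covers the whole file rather than a range, and POSIX_FADV_DONTNEED also tells the kernel that the cached pages are no longer needed, so neither is a drop-in equivalent of what POSIX describes for msync().

```c
#define _POSIX_C_SOURCE 200809L  /* expose posix_fadvise() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for msync(MS_ASYNC): ask the kernel to start
   writing out the given file range without waiting for completion.
   Per the kernel comment quoted above, fadvise(FADV_DONTNEED) does this.
   Returns 0 on success or an error number (posix_fadvise() returns the
   error directly instead of setting errno). A len of 0 means "to the
   end of the file". */
int log_flush_start(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    return posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

/* Hypothetical stand-in for msync(MS_SYNC): write out the dirty pages of
   the whole file and wait for the result. Returns 0 or -1 like fsync(). */
int log_flush_wait(int fd)
{
    return fsync(fd);
}
```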

Okay, should we stop here? Or is there more to learn yet? Sure, there is.

[To be continued]

Monday, August 1, 2011

Innodb is full of crap

Innodb code is full of crap. All across the source base it pretends that it can do multiple log groups. But it always initializes only one. And sometimes, at random places, it acknowledges this fact. There is a case of this schizophrenia within a single function. In the file log0log.c, in the function log_write_up_to(), we see code with this comment:

        group = UT_LIST_GET_FIRST(log_sys->log_groups);
        group->n_pending_writes++;      /*!< We assume here that we have only
                                        one log group! */
Then a few lines below we see iteration over the list of groups.
        group = UT_LIST_GET_FIRST(log_sys->log_groups);
        /* Do the write to the log files */
        while (group) {
                group = UT_LIST_GET_NEXT(log_groups, group);
Why iterate over a list that always has only one member? Why have this list at all, and why is it not a direct pointer to the log group? Who needs multiple log groups that never exist? Even more interesting is that a large part of the log-related code stays there but is disabled with #ifdefs. This is the code for something called log archives. The related configuration options are documented like this:
  • innodb_log_arch_dir
    This variable is unused, and is deprecated as of MySQL 5.0.24. It is removed in MySQL 5.1
  • innodb_log_archive
    Whether to log InnoDB archive files. This variable is present for historical reasons, but is unused. Recovery from a backup is done by MySQL using its own log files, so there is no need to archive InnoDB log files. The default for this variable is 0.

Almost half (albeit disabled) of the 3500+ lines of the log0log.c file from MySQL 5.1 still deal with these log archives. And this code is still there in MySQL 5.5. Is there anybody who needs it?

The same goes for the support of some ancient checksum algorithms. I guess that if Innodb files with these checksums can still be found, then it is only on certain spacious 40MB hard drives that collect dust in some abandoned garage in Finland.

And let's not forget that wonderful uber-portable 64-bit arithmetic done with the help of macros. Yes, it makes the code so readable. I would just like to learn which current target platforms have no compiler available with "long long" or "int64_t" arithmetic. I believe that in the year 2011 we could already think of 32-bit arithmetic as something special and 64-bit as the default.

All in all, with all the effort going into scaling MySQL up to high-end servers, why not start cleaning up the mess already?

Friday, July 29, 2011

Tinkering with web design

I made a few changes to my web pages.

- Finally updated the info on the libjit page about getting libjit sources from the git repository
- Removed the wordpress blog that I never really used and that was only a target for spammers, and linked this blog to my static pages
- Modified the style sheet for my static pages to use an analogous color scheme
- Tinkered with the blog template to make it similar to my static pages

My initial style sheet used colors that I chose semi-randomly. I just put in some hex value with digits that looked good to me, then looked at the page in the browser and tried again until I more or less liked it. Time passed and I realized that my color choice was awful. I bothered to read about color schemes and went on to create my new scheme with the free online tool Adobe Kuler. Immediately I liked the look of my site much better. Perhaps even with the new colors I will still be a laughing stock for people more sensitive to design. I never was one, sorry. But right now I'm happy with my colors.

On the other hand, I'm not completely happy with the blogspot page template that I got. There are still some glitches. However, I fixed the thing that irritated me most. I like to maximize my browser window to see as much of a page's content at once as possible. But the content area in the default Blogger.com theme is too narrow; it occupies but a narrow stripe in the middle of the window. I converted the template to the elastic design that I use for my static pages, so the text now utilizes much more of the window space. For me this is a big win.

Thursday, July 28, 2011

Back to libjit hacking

Right now I have more free time than I had during the last 2 years, so perhaps I will be able to contribute something new to libjit.

Currently I'm trying to improve libjit memory management. There is a proposed patch for a pluggable memory allocator from Patrick van Beem (http://savannah.gnu.org/patch/?7237). I fully recognize the need for some applications to perform custom memory allocation. However, I would like to have a more elaborate solution to this problem than the one found in the provided patch. First of all, libjit's own memory manager (jit/jit-cache.[hc]) is not so good. For instance, the way it allocates function redirectors may result in memory leaks. The patch supposedly resolves this problem, but only if the pluggable memory manager supports an extra feature not available in the default libjit manager. This is clearly not how it should be done. The leak should be fixed in a way that does not depend on which memory manager is used.

So I am trying to find an appropriate solution that would fix this problem for all libjit users, whereas Patrick's patch keeps the libjit logic mostly intact and "solves" the problem by letting a third-party allocator do something that normal libjit users will not have the ability to do.

Another thing to consider is that libjit allocates code space in relatively small chunks. If at compile time libjit figures out that the code for a function doesn't fit into the allocated chunk, then a bigger chunk is allocated and the compilation is restarted. However, on systems with virtual memory (pretty much any modern system where libjit is ever likely to be used) a program can reserve a very large amount of memory in the first place. The system allocates a physical memory page for a virtual memory page only when it is really accessed, not when it is reserved. Hence there should be no need for code space reallocation and recompilation. Normally, the way to go is to reserve a memory block with a size of, say, 0.5 GB, and all the code generated during the lifetime of a libjit application should go there. Initially we can commit only a few pages of this amount and commit more on demand. If this block ever becomes full, then we report an error and just quit. Of course, the size of the allocated block should be configurable.
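A minimal sketch of the reserve-then-commit idea on a POSIX system with MAP_ANONYMOUS might look like this. It is not libjit code: struct code_arena and the code_arena_* functions are hypothetical names, "commit" is modeled here as an mprotect() call on a PROT_NONE reservation, and the pages are made readable and writable only, so a real JIT would still have to make them executable before running the code.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes (glibc, musl) */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical code arena: one big reservation, committed page by page. */
struct code_arena {
    unsigned char *base;  /* start of the reserved address range */
    size_t reserved;      /* total reserved bytes (page multiple) */
    size_t committed;     /* bytes made accessible so far (page multiple) */
    size_t used;          /* bytes handed out so far */
};

static size_t round_up_to_page(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) & ~(page - 1);
}

/* Reserve address space only: PROT_NONE pages cannot be touched yet. */
int code_arena_init(struct code_arena *a, size_t reserve_size)
{
    void *p;

    reserve_size = round_up_to_page(reserve_size);
    p = mmap(NULL, reserve_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->reserved = reserve_size;
    a->committed = 0;
    a->used = 0;
    return 0;
}

/* Hand out len bytes, committing more pages on demand.
   Returns NULL when the reservation is exhausted or the commit fails. */
void *code_arena_alloc(struct code_arena *a, size_t len)
{
    size_t need;
    void *result;

    assert(a->used <= a->committed && a->committed <= a->reserved);
    if (len > a->reserved - a->used)
        return NULL;  /* block is full: the caller reports the error */
    need = round_up_to_page(a->used + len);
    if (need > a->committed) {
        if (mprotect(a->base + a->committed, need - a->committed,
                     PROT_READ | PROT_WRITE) != 0)
            return NULL;
        a->committed = need;
    }
    result = a->base + a->used;
    a->used += len;
    return result;
}

/* Release the whole reservation. */
void code_arena_destroy(struct code_arena *a)
{
    munmap(a->base, a->reserved);
}
```

Because the block never moves, addresses handed out earlier stay valid, which is what removes the need to restart compilation when a function outgrows its chunk.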

It might be that some applications will not be happy with such an allocation scheme. For instance, an application might target an embedded system where the old allocation scheme works better. Or an application might have tight control over the lifetime of JITed functions and be able to tell when a particular function is no longer needed, so that the space occupied by the function's code can be reclaimed and used for something else. And this is exactly what the pluggable memory manager interface is for.

So for me the first goal is to provide a better internal memory manager for libjit, and the second goal is to provide an interface for plugging in custom managers. Along the way I should figure out the most flexible interface, one that will allow an application to do whatever it wants.
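To make the second goal a bit more concrete, a pluggable manager could be as simple as a table of function pointers that the library calls instead of its built-in allocator. The sketch below is purely hypothetical and is neither libjit's actual interface nor the interface from the patch: struct code_manager, the counting manager and the place_code/discard_code helpers are invented for illustration, and malloc() stands in for a real executable-memory allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interface, not libjit's actual API. */
struct code_manager {
    void *ctx;                                    /* manager-private state */
    void *(*alloc_code)(void *ctx, size_t size);  /* space for a function's code */
    void (*free_code)(void *ctx, void *block);    /* reclaim a dead function */
};

/* A trivial application-supplied manager: malloc-backed, counts live blocks.
   (malloc memory is not executable; this only illustrates the plumbing.) */
struct counting_ctx {
    size_t live_blocks;
};

static void *counting_alloc(void *ctx, size_t size)
{
    void *block = malloc(size);

    if (block != NULL)
        ((struct counting_ctx *)ctx)->live_blocks++;
    return block;
}

static void counting_free(void *ctx, void *block)
{
    if (block != NULL) {
        ((struct counting_ctx *)ctx)->live_blocks--;
        free(block);
    }
}

struct code_manager counting_manager(struct counting_ctx *ctx)
{
    struct code_manager m = { ctx, counting_alloc, counting_free };

    ctx->live_blocks = 0;
    return m;
}

/* Library side: code placement only ever goes through the interface. */
void *place_code(const struct code_manager *m, const void *code, size_t len)
{
    void *dst;

    assert(m != NULL && m->alloc_code != NULL);
    dst = m->alloc_code(m->ctx, len);
    if (dst != NULL)
        memcpy(dst, code, len);
    return dst;
}

void discard_code(const struct code_manager *m, void *block)
{
    assert(m != NULL && m->free_code != NULL);
    m->free_code(m->ctx, block);
}
```

With a shape like this, the default manager and an application's custom one go through the same calls, so a fix such as the redirector leak would not depend on which manager is plugged in.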