Adventures of a DTrace Addict: Part 0

From ls to mmap

Back in my day, I would probe by hand. Now you can get software
that does the job for you.
-Kevin Mitnick

My name is Nick and I am a DTrace addict.

Hi Nick

I’m not sure when it started (it’s hard to remember when you’re having fun),
but it was either near the end of high school or the beginning of college, when
I transitioned from Haskell to C, having realized that I had something of a
fetish for systems programming. I had OpenSolaris installed and was aching to
rip into the intricate, gritty, and perversely beautiful mechanisms that make it
sane to use a computer for pretty much everything important.

Initially, I set out to read the source code for various utilities that I use,
ranging from the simple (ls, cp, mv, etc) to the complex (X11, firefox, jvm,
etc). Then I planned on transitioning to infrastructure (libc, libumem, the
kernel, etc).

Needless to say, I figured that just reading the source wasn’t enough to gather
a true understanding of the system in a timely manner. I was also afraid that I
might misunderstand something, and have a completely false intuition of how the
system works, which I find to be a far worse affliction than not knowing at
all.

So, having already chanced upon Bryan Cantrill’s DTrace Review
(Google Tech Talk), I did some extensive research, and decided that DTrace was
going to be an indispensable companion, offering guidance and illuminating the
darkest of tunnels and infrastructural perplexities.

After reading the manual, following along with examples on the blogs of the
creators of DTrace and other DTrace addicts, and field testing the
software on some class projects, I took time away to understand the operation
of the utilities that I use, keeping the source code nearby.

And so, curious reader, the journey began, as I descended into the dark depths
of a gaping abyss, not unlike a potholing version of Indiana Jones. Stopping,
frequently, to turn over rocks and poke anything that looks strange or alive.

So, I started out light, deciding to check out the ls utility. How
complicated could it be?

I had multiple vectors from which I could have started jabbing ls.

I chose to use the syscall provider to get a high-level view of what ls was
asking the system to do.

Here is the one-liner I used on a directory full of source files (in
particular, the root directory of the source for the ls command in Illumos rev 147):

pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {@[probefunc] = count();}'

Here are the results:

  fcntl                                                             1
  getpid                                                            1
  getrlimit                                                         1
  openat                                                            1
  rexit                                                             1
  sysi86                                                            1
  write                                                             1
  getdents64                                                        2
  memcntl                                                           2
  mmap                                                              2
  mmapobj                                                           2
  open                                                              2
  resolvepath                                                       2
  setcontext                                                        2
  ioctl                                                             3
  stat64                                                            3
  fstat64                                                           4
  close                                                             5
  brk                                                               6

So, one can see that brk was run most frequently, followed by close, and so on.

But what parts of the code were calling on the system to do its bidding? The
following one-liner gives the stack-trace of all the userland functions that
were invoking the system calls:

pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {@[probefunc,ustack()] = count();}'

The output of which looks like this:

  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`main+0x35e
          ls`_start+0x7d
        1
  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          libc.so.1`strdup+0x26
          libc.so.1`expand_locale_name+0x469
          libc.so.1`setlocale+0x8ab
          ls`main+0x28
          ls`_start+0x7d
        1
  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`xmemdup+0x16
          ls`xstrdup+0x21
          ls`gobble_file+0xa3c
          ls`print_dir+0x2d0
          ls`main+0x526
          ls`_start+0x7d
        1
  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`main+0x35e
          ls`_start+0x7d
        1
  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          libc.so.1`strdup+0x26
          libc.so.1`expand_locale_name+0x469
          libc.so.1`setlocale+0x8ab
          ls`main+0x28
          ls`_start+0x7d
        1
  brk                                               
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`xmemdup+0x16
          ls`xstrdup+0x21
          ls`gobble_file+0xa3c
          ls`print_dir+0x2d0
          ls`main+0x526
          ls`_start+0x7d
        1
  close                                             
          ld.so.1`__close+0x7
          ld.so.1`file_open+0x3f9
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  close                                             
          ld.so.1`__close+0x7
          ld.so.1`file_open+0x3f9
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  close                                             
          libc.so.1`__close+0x15
          libc.so.1`fclose+0xaa
          ls`close_stream+0x2d
          ls`close_stdout+0xb0
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  close                                             
          libc.so.1`__close+0x15
          libc.so.1`fclose+0xaa
          ls`close_stream+0x2d
          ls`close_stdout+0x15
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  close                                             
          libc.so.1`__close+0x15
          libc.so.1`closedir+0x4d
          ls`print_dir+0x327
          ls`main+0x526
          ls`_start+0x7d
        1
  fcntl                                             
          libc.so.1`syscall+0x13
          libc.so.1`fcntl+0x104
          libc.so.1`fdopendir+0x5c
          libc.so.1`opendir+0x3f
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64                                           
          libc.so.1`syscall+0x13
          libc.so.1`_findbuf+0x9c
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64                                           
          libc.so.1`syscall+0x13
          libc.so.1`fdopendir+0x78
          libc.so.1`opendir+0x3f
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64                                           
          libc.so.1`syscall+0x13
          libc.so.1`isseekable+0x4b
          libc.so.1`_setbufend+0x36
          libc.so.1`_findbuf+0x148
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64                                           
          libc.so.1`syscall+0x13
          libc.so.1`isptsfd+0x40
          libc.so.1`xpg4_fixup+0x2e
          libc.so.1`__openat+0x32
          libc.so.1`openat+0x83
          libc.so.1`opendir+0x29
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  getpid                                            
          libc.so.1`getpid+0x15
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  getrlimit                                         
          libc.so.1`getrlimit+0x15
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  ioctl                                             
          libc.so.1`ioctl+0x15
          libc.so.1`_findbuf+0x61
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  ioctl                                             
          libc.so.1`ioctl+0x15
          ls`main+0x97
          ls`_start+0x7d
        1
  ioctl                                             
          libc.so.1`ioctl+0x15
          ls`decode_switches+0x29
          ls`main+0x97
          ls`_start+0x7d
        1
  memcntl                                           
          ld.so.1`memcntl+0x7
          ld.so.1`elf_new_lmp+0x10bb
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  memcntl                                           
          ld.so.1`memcntl+0x7
          ld.so.1`elf_new_lmp+0x10bb
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmap                                              
          libc.so.1`__systemcall+0x6
          libc.so.1`lmalloc+0xd8
          libc.so.1`__tls_static_mods+0xb1
          ld.so.1`tls_statmod+0x1ca
          ld.so.1`setup+0x15a0
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  mmap                                              
          ld.so.1`mmap+0x7
          ld.so.1`malloc+0x7e
          ld.so.1`calloc+0x25
          ld.so.1`elf_new_lmp+0x284
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmapobj                                           
          ld.so.1`mmapobj+0x7
          ld.so.1`file_open+0x3e1
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmapobj                                           
          ld.so.1`mmapobj+0x7
          ld.so.1`file_open+0x3e1
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  open                                              
          ld.so.1`syscall+0x5
          ld.so.1`open+0x25
          ld.so.1`file_open+0x3be
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  open                                              
          ld.so.1`syscall+0x5
          ld.so.1`open+0x25
          ld.so.1`file_open+0x3be
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  openat                                            
          libc.so.1`syscall+0x13
          libc.so.1`openat+0x83
          libc.so.1`opendir+0x29
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  resolvepath                                       
          ld.so.1`resolvepath+0x7
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  resolvepath                                       
          ld.so.1`resolvepath+0x7
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  rexit                                             
          libc.so.1`0xfee334e8
          ls`_start+0x7d
        1
  setcontext                                        
          libc.so.1`syscall+0x13
          libc.so.1`libc_init+0x3c8
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  setcontext                                        
          libc.so.1`__getcontext+0x19
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  stat64                                            
          ld.so.1`syscall+0x5
          ld.so.1`rtld_stat+0x29
          ld.so.1`file_open+0x100
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  sysi86                                            
          libc.so.1`sysi86+0x15
          ls`_start+0x6e
        1
  write                                             
          libc.so.1`__write+0x15
          libc.so.1`_xflsbuf+0xb4
          libc.so.1`_fflush_u+0x37
          libc.so.1`fclose+0x80
          ls`close_stream+0x2d
          ls`close_stdout+0x15
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  getdents64                                        
          libc.so.1`getdents64+0x15
          ls`print_dir+0x2a2
          ls`main+0x526
          ls`_start+0x7d
        2
  stat64                                            
          ld.so.1`syscall+0x5
          ld.so.1`rtld_stat+0x29
          ld.so.1`file_open+0x100
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        2

This output is pretty cool. Actually I take it back. This is downright awesome!
Within the time frame of a few minutes I was able to see the code-paths taken
to execute all the system calls needed to list a directory. This would have
taken weeks to fully understand, assuming I didn’t give up before then. It is
this level of introspection that makes DTrace a programmer’s wet dream.

Of course it would help to know in what order these system calls are happening.
And it is only a single line of code away:

pfexec dtrace -F -c 'ls' -n 'syscall::: /pid == $target/ {}'

The “=>” indicates that the system entered the system call and “<=” indicated
that the system exited it.

CPU FUNCTION                                 
  0  => mmap                                  
  0  <= mmap                                  
  0  => setcontext                            
  0  <= setcontext                            
  0  => getrlimit                             
  0  <= getrlimit                             
  0  => getpid                                
  0  <= getpid                                
  0  => setcontext                            
  0  <= setcontext                            
  0  => sysi86                                
  0  <= sysi86                                
  0  => brk                                   
  0  <= brk                                   
  0  => brk                                   
  0  <= brk                                   
  0  => stat64                                
  0  <= stat64                                
  0  => resolvepath                           
  0  <= resolvepath                           
  0  => open                                  
  0  <= open                                  
  0  => mmapobj                               
  0  <= mmapobj                               
  0  => close                                 
  0  <= close                                 
  0  => mmap                                  
  0  <= mmap                                  
  0  => memcntl                               
  0  <= memcntl                               
  0  => stat64                                
  0  <= stat64                                
  0  => stat64                                
  0  <= stat64                                
  0  => resolvepath                           
  0  <= resolvepath                           
  0  => open                                  
  0  <= open                                  
  0  => mmapobj                               
  0  <= mmapobj                               
  0  => close                                 
  0  <= close                                 
  0  => memcntl                               
  0  <= memcntl                               
  0  => ioctl                                 
  0  <= ioctl                                 
  0  => ioctl                                 
  0  <= ioctl                                 
  0  => brk                                   
  0  <= brk                                   
  0  => brk                                   
  0  <= brk                                   
  0  => openat                                
  0  <= openat                                
  0  => fstat64                               
  0  <= fstat64                               
  0  => fcntl                                 
  0  <= fcntl                                 
  0  => fstat64                               
  0  <= fstat64                               
  0  => getdents64                            
  0  <= getdents64                            
  0  => brk                                   
  0  <= brk                                   
  0  => brk                                   
  0  <= brk                                   
  0  => getdents64                            
  0  <= getdents64                            
  0  => close                                 
  0  <= close                                 
  0  => ioctl                                 
  0  <= ioctl                                 
  0  => fstat64                               
  0  <= fstat64                               
  0  => fstat64                               
  0  <= fstat64                               
  0  => write                                 
  0  <= write                                 
  0  => close                                 
  0  <= close                                 
  0  => close                                 
  0  <= close                                 
  0  => rexit

So, there are multiple occurrences of some system calls (mmap, brk, etc), and
it would be nice to see the stack traces in chronological order.

Another one-liner to the rescue:

pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {trace(probefunc); ustack();}'

So the first thing ls does is call mmap. Likely the one in the second stack
trace from the top, which is a result of calling setlocale. Which begs the
question, what the heck is setlocale? Luckily it has its own manpage. As much
as I’d hate to admit it, if I hadn’t done this trace, it would have likely been
a long, long time before I even heard of setlocale, let alone use it.
Needless to say I studied the manpage to see if setlocale was of any interest
to me.

So my shiny new toy now makes certain system facilities more discoverable than
they were previously: merely a few command lines away, as opposed to involved
studying of the source, and having to create the stack traces mentally (which
isn’t difficult, if one has a brain, but is time consuming). Not that having
the source isn’t just as cool…

Looking at that stack trace also reveals something else of interest: ld.so.1,
which is used to load an ELF file has its own implementation of malloc and
calloc, based on mmap. It doesn’t use the libc version.

So, I asked my self, now what? Well, it may be a good idea to keep going system
call by system call, and take note of other implementation details of ls. Or
I can go deeper, rappel into the rabit hole, and see what’s up with libc.so
and ld.so.

After all, ls isn’t that interesting: it merely lists directory contents.
The ld shared object and the standard C library are where I saw the first
glimmer of magic. I should dig into mmap and see the kernel magic in all of its
coruscating brilliance.

So, ls patches into the kernel via mmap, after it calls setlocale. It would
be cool to see what functions the kernel executes to carry out an mmap. The
command to carry this out is slightly more involved, and is thus rolled up into
a script, mmap_fbt.d. We trace all the kernel functions triggered by
setlocale‘s call to mmap.

file mmap_fbt.d:

pid$target::setlocale:entry
{
    self->follow = 1;
}

syscall::mmap:entry
/self->follow == 1/
{
    self->follow = 2;
}

syscall::mmap:return
/self->follow == 2/
{
    self->follow = 0;
    exit(0);
}

fbt:::entry,
fbt:::return
/self->follow == 2/
{
    trace(probemod);
}

executing said file:

pfexec dtrace -c 'ls' -s mmap_fbt.d

The output of the above command is a rather detailed map of the kernel code
taken to execute mmap in this particular instance. At this point it may be
useful to pull up a copy of the Illumos source code. I did this on Illumos 147.

CPU FUNCTION                                 
  2  -> smmap32                                 genunix                          
  2    -> smmap_common                          genunix                          
  2      -> as_rangelock                        genunix                          
  2      <- as_rangelock                        genunix                          
  2      -> zmap                                genunix                          
  2        -> choose_addr                       genunix                          
  2          -> map_addr                        unix                             
  2            -> map_addr_proc                 unix                             
  2              -> as_gap_aligned              genunix                          
  2                -> avl_first                 genunix                          
  2                <- avl_first                 genunix                          
  2                -> as_findseg                genunix                          
  2                  -> avl_find                genunix                          
  2                    -> as_segcompar          genunix                          
  2                    <- as_segcompar          genunix                          
  2                  <- avl_find                genunix                          
  2                <- as_findseg                genunix                          
  2                -> avl_walk                  genunix                          
  2                <- avl_walk                  genunix                          
  2                -> avl_walk                  genunix                          
  2                <- avl_walk                  genunix                          
  2                -> valid_va_range_aligned    unix                             
  2                <- valid_va_range_aligned    unix                             
  2              <- as_gap_aligned              genunix                          
  2            <- map_addr_proc                 unix                             
  2          <- map_addr                        unix                             
  2        <- choose_addr                       genunix                          
  2        -> as_map                            genunix                          
  2          -> as_map_locked                   genunix                          
  2            -> gethrestime                   genunix                          
  2              -> pc_gethrestime              unix                             
  2                -> gethrtime                 genunix                          
  2                  -> tsc_gethrtime           unix                             
  2                  <- tsc_gethrtime           unix                             
  2                <- gethrtime                 genunix                          
  2              <- pc_gethrestime              unix                             
  2            <- gethrestime                   genunix                          
  2            -> as_map_ansegs                 genunix                          
  2              -> map_pgszcvec                unix                             
  2                -> map_szcvec                unix                             
  2                <- map_szcvec                unix                             
  2              <- map_pgszcvec                unix                             
  2              -> as_map_segvn_segs           genunix                          
  2                -> seg_alloc                 genunix                          
  2                  -> valid_va_range          unix                             
  2                    -> valid_va_range_aligned   unix                             
  2                    <- valid_va_range_aligned   unix                             
  2                  <- valid_va_range          unix                             
  2                  -> valid_usr_range         unix                             
  2                  <- valid_usr_range         unix                             
  2                  -> kmem_cache_alloc        genunix                          
  2                  <- kmem_cache_alloc        genunix                          
  2                  -> mutex_init              unix                             
  2                  <- mutex_init              unix                             
  2                  -> seg_attach              genunix                          
  2                    -> as_addseg             genunix                          
  2                      -> gethrestime         genunix                          
  2                        -> pc_gethrestime    unix                             
  2                          -> gethrtime       genunix                          
  2                            -> tsc_gethrtime   unix                             
  2                            <- tsc_gethrtime   unix                             
  2                          <- gethrtime       genunix                          
  2                        <- pc_gethrestime    unix                             
  2                      <- gethrestime         genunix                          
  2                      -> avl_walk            genunix                          
  2                      <- avl_walk            genunix                          
  2                      -> avl_insert_here     genunix                          
  2                        -> avl_insert        genunix                          
  2                        <- avl_insert        genunix                          
  2                      <- avl_insert_here     genunix                          
  2                    <- as_addseg             genunix                          
  2                  <- seg_attach              genunix                          
  2                <- seg_alloc                 genunix                          
  2                -> segvn_create              genunix                          
  2                  -> anon_resvmem            genunix                          
  2                    -> rctl_incr_swap        genunix                          
  2                    <- rctl_incr_swap        genunix                          
  2                  <- anon_resvmem            genunix                          
  2                  -> hat_map                 unix                             
  2                  <- hat_map                 unix                             
  2                  -> crhold                  genunix                          
  2                  <- crhold                  genunix                          
  2                  -> avl_walk                genunix                          
  2                  <- avl_walk                genunix                          
  2                  -> avl_walk                genunix                          
  2                  <- avl_walk                genunix                          
  2                  -> kmem_cache_alloc        genunix                          
  2                  <- kmem_cache_alloc        genunix                          
  2                  -> lgrp_privm_policy_set   unix                             
  2                    -> lgrp_mem_policy_default   unix                             
  2                    <- lgrp_mem_policy_default   unix                             
  2                  <- lgrp_privm_policy_set   unix                             
  2                <- segvn_create              genunix                          
  2              <- as_map_segvn_segs           genunix                          
  2            <- as_map_ansegs                 genunix                          
  2            -> as_setwatch                   genunix                          
  2              -> avl_numnodes                genunix                          
  2              <- avl_numnodes                genunix                          
  2            <- as_setwatch                   genunix                          
  2          <- as_map_locked                   genunix                          
  2        <- as_map                            genunix                          
  2      <- zmap                                genunix                          
  2      -> as_rangeunlock                      genunix                          
  2        -> cv_signal                         genunix                          
  2        <- cv_signal                         genunix                          
  2      <- as_rangeunlock                      genunix                          
  2    <- smmap_common                          genunix                          
  2  <- smmap32                                 genunix

It’s also a very good idea to generate an index of all the source files like so:

ls -R illumos_gate > ig-ix

In the above output, the kernel is alternating between two modules: unix and
genunix. Genunix contains all the functions that are platform independent,
while unix contains functions that are platform specific. For example,
pc_gethrestime is probably unique to the x86 version of the kernel. As can be
seen, generic code calls non-generic code, and vice versa, all the time. I
think this is an important point. Most undergrad curicula present the design of
software as the layering of one body of code on top of another, as opposed to a
web of functions or modules calling each other at various points of execution.

Tangent 1: Dogmatically layering abstractions on top of each other, because
it’s The Right Way(tm), is a very bad habit to get into, and is an example of
defensive programming. Layering often results in needlessly deep call-stacks,
and code that is not very cache coherent. The kernel needs to be portable, but
it also needs to be fast. The kernel engineers accomplished this by using
conditional compilation, instead of aggressive layering. In fact, in the last
decade or so, the kernel team has been collapsing layers to both improve
performance and relialibility; witness ZFS. Also, Zones are a method of improving the perfomance of virtualization by
removing unneccessary layers (like the CPU emulator).

Tangent 2: To go on a tanget to this tangent, I did some stack traces on
Mercurial, a python application. In this particular instance, I ran
hg log and saved the stack traces to a file. You’ll notice that the default
stack depth of dtrace’s ustack action is too small, and I had to expand it to
150. These stacks are very, very deep. We’re monitoring an application that is
layered on top of the Python VM. Which goes to show that layering can lead to
the problems mentioned above. Not that the Mercurial code is bad; in fact, it
is a very high-quality and one of my favourite revision control systems. It already
performs quite well. I’m just demonstrating that layering results in deep call
stacks which are potentially problematic for some programs.

Here’s the one-liner used on hg log:

pfexec dtrace -c 'hg log' -n 'syscall:::entry /pid == $target/
    {@[ustack(150)] = count();}'

Here’s one of the smaller stack-traces from hg log:

          libc.so.1`__read+0x15
          libc.so.1`_filbuf+0xd3
          libc.so.1`fread+0x118
          libpython2.6.so.1.0`Py_UniversalNewlineFread+0x160
          libpython2.6.so.1.0`file_read+0xeb
          libpython2.6.so.1.0`PyCFunction_Call+0x19d
          libpython2.6.so.1.0`call_function+0x3e6
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x190
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`PyObject_CallFunctionObjArgs+0x3d
          libpython2.6.so.1.0`slot_tp_descr_get+0x93
          libpython2.6.so.1.0`PyObject_GenericGetAttr+0x1e7
          libpython2.6.so.1.0`PyObject_GetAttr+0x96
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x27a4
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`gen_send_ex+0xbf
          libpython2.6.so.1.0`gen_iternext+0x18
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x1fec
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x13d
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`ext_do_call+0x156
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2d88
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x13d
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`ext_do_call+0x156
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2d88
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`PyEval_EvalCode+0x35
          libpython2.6.so.1.0`run_mod+0x3d
          libpython2.6.so.1.0`PyRun_FileExFlags+0x6e
          libpython2.6.so.1.0`PyRun_SimpleFileExFlags+0x18a
          libpython2.6.so.1.0`PyRun_AnyFileExFlags+0x71
          libpython2.6.so.1.0`Py_Main+0xa9a
          isapython2.6`main+0x66
          isapython2.6`_start+0x7d
           77

Anyway, apologies for the tangents, but it’s a topic I feel strongly about.
Back to the main plot.

Where does one go from here? Study the AVL tree implementation of the kernel?
Read the assembly responsible for memory mapping? Dive into gethrestime,
which would probably lead us into the cyclics subsystem? Perhaps, one should
dig into the kernel memory allocator, entered through kmem_cache_alloc? There
are many starting points, for the budding Illumos kernel engineers out there.

The main contention was whether or not I ought to study the mechanics of mmap
via synthesis or decomposition. Seeing as how I tend to write code,
predominantly, in a bottom-up style, it seems most natural to explore mmap via
synthesis. Also, the way these blog posts are written is an example of
synthesis (translation: I’m winging them).

The inner-most functions in the stack trace are, in chronological order:

  • as_rangelock
  • avl_first
  • as_segcompar
  • avl_walk
  • valid_va_range_aligned
  • tsc_gethrtime
  • map_szcvec
  • valid_usr_range
  • kmem_cache_alloc
  • mutex_init
  • avl_insert
  • rctl_incr_swap
  • hat_map
  • crhold
  • lgrp_mem_policy_default
  • avl_numnodes
  • cv_signal

It seems that all functions that are prefixed with as_ are functions related
to address space management. The block comments in as.h are good introductory
material to the structures used by these functions (and should be read prior to
continuing). Starting with as_rangelock. It’s implementation is in
vm_as.c:

/*
 * Serialize all searches for holes in an address space to
 * prevent two or more threads from allocating the same virtual
 * address range.  The address space must not be "read/write"
 * locked by the caller since we may block.
 */
void
as_rangelock(struct as *as)
{
    mutex_enter(&as->a_contents);
    while (AS_ISCLAIMGAP(as))
        cv_wait(&as->a_cv, &as->a_contents);
    AS_SETCLAIMGAP(as);
    mutex_exit(&as->a_contents);
}

The first thing I noticed was that mutex_enter and mutex_exit, didn’t
appear in the dtrace output. I suspect this is due to function inlining.
There’s a quick way to check, using mdb, the modular debugger.

Run this:

pfexec mdb -k

A command prompt should pop up.

This disassembles the function.

> as_rangelock::dis
as_rangelock:                   pushq  %rbp
as_rangelock+1:                 movq   %rsp,%rbp
as_rangelock+4:                 subq   $0x8,%rsp
as_rangelock+8:                 movq   %rdi,-0x8(%rbp)
as_rangelock+0xc:               pushq  %r12
as_rangelock+0xe:               pushq  %r13
as_rangelock+0x10:              subq   $0x8,%rsp
as_rangelock+0x14:              movq   %rdi,%r13
as_rangelock+0x17:              call   -0x2e46b4        <mutex_enter>
as_rangelock+0x1c:              movzbl 0x8(%r13),%eax
as_rangelock+0x21:              testl  $0x40,%eax
as_rangelock+0x26:              je     +0x1b    <as_rangelock+0x43>
as_rangelock+0x28:              leaq   0xa(%r13),%r12
as_rangelock+0x2c:              movq   %r12,%rdi
as_rangelock+0x2f:              movq   %r13,%rsi
as_rangelock+0x32:              call   -0x175667        <cv_wait>
as_rangelock+0x37:              movzbl 0x8(%r13),%eax
as_rangelock+0x3c:              testl  $0x40,%eax
as_rangelock+0x41:              jne    -0x17    <as_rangelock+0x2c>
as_rangelock+0x43:              orl    $0x40,%eax
as_rangelock+0x46:              movb   %al,0x8(%r13)
as_rangelock+0x4a:              movq   %r13,%rdi
as_rangelock+0x4d:              call   -0x2e45ca        <mutex_exit>
as_rangelock+0x52:              addq   $0x8,%rsp
as_rangelock+0x56:              popq   %r13
as_rangelock+0x58:              popq   %r12
as_rangelock+0x5a:              leave  
as_rangelock+0x5b:              ret    
> ::quit

So, suprisingly, the calls and symbols are there. It’s just that DTrace doesn’t
trace them. It turns out, that while DTrace does match mutex_enter and
mutex_exit as valid fbt probes on Illumos 147, it doesn’t trace them
because the lock primitives aren’t instrumented.

Though, it would appear, that the DTrace creators didn’t intend for the probe
to be matched, as indicated in chapter 3, page 64 of the dtrace book.

Also, for those who don’t typically read x86 assembly, reading the above output
would be made easier with an introductory article to x86 assembly and a listing of all x86 instructions.

So mutex_enter claims a mutex, and mutex_exit releases it. Claiming a mutex
creates a mutual exclusion over some part of memory, so that
only one thread can write to that part of memory, while other threads wait
(“block” is a common synonym for wait).

When we do the mutex_enter we are locking down some of the fields of the
address space structure. Then as_rangelock proceeds make a claim over the
address space, if it isn’t already claimed. If it is already claimed,
as_rangelock waits (via cv_wait) for the claim to be withdrawn, and
then claims it, but releases the original mutex. As can be seen in the DTrace
output, we didn’t have to wait.

As is indicated in as.h the address-space structures and meta-data are stored
in an avl tree, and this is visible in the dtrace output.
Some as_ functions make calls to avl_ functions. What’s even more
interesting is that avl_find makes a call to as_segcompar; avl_find
probably takes a callback that it uses for comparison purposes. We’ll find out
later.

The only other as_ function that’s an innermost function is as_segcompar
(according to DTrace).

/*
 * compar segments (or just an address) by segment address range
 */
static int
as_segcompar(const void *x, const void *y)
{
    struct seg *a = (struct seg *)x;
    struct seg *b = (struct seg *)y;

    if (a->s_base < b->s_base)
        return (-1);
    if (a->s_base >= b->s_base + b->s_size)
        return (1);
    return (0);
}

And this turns out to be true. Though I’m not sure why ‘compare’ is misspelled.
I suspect that this is historical, and that the developers of the original unix
were motivited by similar conditions that motivated the unfortunate misspelling
of the creat system call. I’m sure they had really good reasons to do this —
but typeing and reading creat still irks the crap out of me.

So it’s just a simple comparison function.

That’s all for the inner-most as_ functions. Onto avl_ functions.
avl_first is located in usr/src/common/avl.c. Before continuing, give
avl.h a glance.

/*
 * Return the lowest valued node in a tree or NULL.
 * (leftmost child from root of tree)
 */
void *
avl_first(avl_tree_t *tree)
{
    avl_node_t *node;
    avl_node_t *prev = NULL;
    size_t off = tree->avl_offset;

    for (node = tree->avl_root;
        node != NULL;
        node = node->avl_child[0])
        prev = node;

    if (prev != NULL)
        return (AVL_NODE2DATA(prev, off));
    return (NULL);
}

Once one understands the structures and macros in avl.h, the above should be
self explanatory.

Next up, avl_walk.

/*
 * Walk from one node to the previous valued node (ie. an infix walk
 * towards the left). At any given node we do one of 2 things:
 *
 * - If there is a left child, go to it, then to it's rightmost
 * descendant.
 *
 * - otherwise we return thru parent nodes until we've come from a
 * right child.
 *
 * Return Value:
 * NULL - if at the end of the nodes
 * otherwise next node
 */
void *
avl_walk(avl_tree_t *tree, void *oldnode, int left)
{
    size_t off = tree->avl_offset;
    avl_node_t *node = AVL_DATA2NODE(oldnode, off);
    int right = 1 - left;
    int was_child;


    /*
     * nowhere to walk to if tree is empty
     */
    if (node == NULL)
        return (NULL);

    /*
     * Visit the previous valued node. There are two possibilities:
     *
     * If this node has a left child, go down one left, then all
     * the way right.
     */
    if (node->avl_child[left] != NULL) {
        for (node = node->avl_child[left];
            node->avl_child[right] != NULL;
            node = node->avl_child[right])
            ;
    /*
     * Otherwise, return thru left children as far as we can.
     */
    } else {
        for (;;) {
            was_child = AVL_XCHILD(node);
            node = AVL_XPARENT(node);
            if (node == NULL)
                return (NULL);
            if (was_child == right)
                break;
        }
    }

    return (AVL_NODE2DATA(node, off));
}

So this is slightly more involved, though it should be intuitive to those
familiar with AVL Trees. Basically avl_walk attempts to find the biggest
value that is less than the current node we’re in (i.e. void *oldnode).

Continueing with avl_insert. Quickly glossing over the code reveals that
avl_insert isn’t an inner-most function as it might call avl_rotation.

Either way, avl_rotation rotates and rebalances the tree, using new_balance
to calculate how to rotate it. avl_insert merely adds data to the tree as a
new leaf, and then call avl_rotation if a rebalancing is required.

avl_numnodes merely returns the number of nodes in the tree. Accordingly,
it’s a very short function.

/*
 * Return the number of nodes in an AVL tree.
 */
ulong_t
avl_numnodes(avl_tree_t *tree)
{
    ASSERT(tree);
    return (tree->avl_numnodes);
}

So that covers the innermost avl_ and as_ functions. Let’s move up one
layer of functions and have a look at avl_find.

/*
 * Search for the node which contains "value".  The algorithm is a
 * simple binary tree search.
 *
 * return value:
 *      NULL: the value is not in the AVL tree
 *              *where (if not NULL)  is set to indicate the insertion
 *              point
 *      "void *"  of the found tree node
 */
void *
avl_find(avl_tree_t *tree, const void *value, avl_index_t *where)
{
    avl_node_t *node;
    avl_node_t *prev = NULL;
    int child = 0;
    int diff;
    size_t off = tree->avl_offset;

    for (node = tree->avl_root; node != NULL;
        node = node->avl_child[child]) {

        prev = node;

        diff = tree->avl_compar(value, AVL_NODE2DATA(node, off));
        ASSERT(-1 <= diff && diff <= 1);
        if (diff == 0) {
#ifdef DEBUG
            if (where != NULL)
                *where = 0;
#endif
            return (AVL_NODE2DATA(node, off));
        }
        child = avl_balance2child[1 + diff];

    }

    if (where != NULL)
        *where = AVL_MKINDEX(prev, child);

    return (NULL);
}

As can be seen inside the for-loop, we do have a user-defined comparison
function, stored in the tree structure. In the case of the as_ functions,
it’s as_segcompar. Also, this misspelling of ‘compare’ seems to be
pathologically effecting the Illumos code base. Sad day.

And on and on it goes. DTrace a syscall, find the bottom-level functions,
understand them, move up one level, synthesize, repeat.

That’s my modus operandi whenever I want to understand a system call, some part
of the kernel, or even a user land application. It’s not enough to ask what a
function does. I care about how various components interact with each other.

Most people seem content starting from main and working their way down to the
depths of the kernel/app. That’s only because main provides a convenient
starting point in the absence of DTrace (which apparently works wonderfully as
a code-mapper). DTrace gives the developer the ability to see the interplay
between functions, between modules, and between processes.

Sythesizing how a program works is more efficient (IMHO), because one has less
mental state to keep track of and does less mental context switching between
functions. Try it some time.

I’m sure most people think DTrace is exclusively a performance analysis tool,
which is what it was meant to be used as, but I hope I’ve demonstrated that it
is also extremely well suited for edificatory purposes.

Now, go forth and DTrace.

Advertisements

4 thoughts on “Adventures of a DTrace Addict: Part 0

  1. Just fixed some inaccuracies regarding the disassembly of as_rangelock via mdb. Apparently DTrace won’t trace mutex_enter inside as_rangelock for some reason (and it’s not function inlining). Also, added some links and fixed spelling errors.

  2. Thanks to richlowe, on #illumos, I now have a technical explanation as to why mutex_enter is left untraced.

  3. Howdy! This article couldn’t be written much better! Looking at this post reminds me of my previous roommate!
    He always kept preaching about this. I most certainly will
    forward this post to him. Fairly certain he’s
    going to have a very good read. Thanks for sharing!

  4. I’m glad you liked the article! It sounds like your roommate and I would get along well!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s