<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Nick Zivkovic&#039;s Blog</title>
	<atom:link href="http://nickziv.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://nickziv.wordpress.com</link>
	<description>The journeys of a shift-key challenged unix desperado</description>
	<lastBuildDate>Tue, 31 Jan 2012 19:56:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='nickziv.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/8902ebb49cddb2b49d0ae95be7dac1bc?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Nick Zivkovic&#039;s Blog</title>
		<link>http://nickziv.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://nickziv.wordpress.com/osd.xml" title="Nick Zivkovic&#039;s Blog" />
	<atom:link rel='hub' href='http://nickziv.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Lex Provider Update</title>
		<link>http://nickziv.wordpress.com/2011/12/28/lex-provider-update/</link>
		<comments>http://nickziv.wordpress.com/2011/12/28/lex-provider-update/#comments</comments>
		<pubDate>Wed, 28 Dec 2011 18:56:53 +0000</pubDate>
		<dc:creator>nick zivkovic</dc:creator>
				<category><![CDATA[DTrace]]></category>
		<category><![CDATA[Illumos]]></category>
		<category><![CDATA[dtrace]]></category>
		<category><![CDATA[illumos]]></category>
		<category><![CDATA[lex source]]></category>
		<category><![CDATA[lexical analyzer]]></category>
		<category><![CDATA[regular expression]]></category>
		<category><![CDATA[token id]]></category>

		<guid isPermaLink="false">http://nickziv.wordpress.com/?p=209</guid>
		<description><![CDATA[A while ago, I wrote about the DTrace provider I made for lex, the lexical analyzer that ships with Illumos. I added one new probe to it, that solves a simple but particularly vexing issue: tracing and aggregating on the regular expression used to match a token instead of the actual token or the returned [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago, I wrote about the DTrace provider I made for <code>lex</code>, the lexical<br />
analyzer that ships with Illumos.</p>
<p>I added one new probe to it, that solves a simple but particularly vexing<br />
issue: tracing and aggregating on the regular expression used to match a token<br />
instead of the actual token or the returned token-id. This new probe is called<br />
<code>match_regex</code>.</p>
<p>The returned token-id (value returned by yylex()) would only be adequate in<br />
identifying the most commonly matched regular expressions, if <code>yylex()</code><br />
necessarily returned on <em>every single match</em>. It doesn&#8217;t.</p>
<p>The two closest ways to achieve what the new probe achieves are (1)<br />
recording the final state that occurred before a match, and then digging<br />
through the generated <code>lex.yy.c</code> file to see which regular expression that state<br />
corresponds to, and (2) using the <code>action</code> probe to see where in the switch<br />
statement we are jumping to. Method (2) no longer works, as the <code>action</code> probe has<br />
been removed (it&#8217;s redundant with <code>match_regex</code>, and less helpful).</p>
<p>The first method is convoluted and inelegant. The second method isn&#8217;t so bad,<br />
and is actually remarkably similar to <code>match_regex</code>. With <code>action</code>, one could go<br />
to the specific case within the switch statement that gets executed, and find<br />
that <code>lex</code> has helpfully placed a <code># line</code> directive, with a line number that<br />
matches the line of the regular expression in the <code>.lex</code> source file. Using<br />
this information, one could find which regular expressions are being matched,<br />
without depending on the return value of <code>yylex()</code>.</p>
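<p>To see why that line number can be trusted, here is a minimal C sketch of what a <code>#line</code> directive does. The function name and the <code>calc.lex</code> file name are made up for illustration; this is not code taken from a real <code>lex.yy.c</code>.</p>
<pre><code>/*
 * Hypothetical stand-in for one case of the generated switch statement.
 * After the directive, the compiler numbers the next source line as 6
 * and attributes it to "calc.lex".
 */
int
reported_line(void)
{
#line 6 "calc.lex"
    return (__LINE__);
}
</code></pre>
<p>Calling <code>reported_line()</code> yields 6 no matter where the function sits in the generated file, which is how a line in <code>lex.yy.c</code> can point back at a rule in the <code>.lex</code> source.</p>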
<p>The actual <code>match_regex</code> probe allows one the luxury of not having to juggle<br />
two files (the <code>.lex</code> source file and the generated <code>lex.yy.c</code> file), and to<br />
instead focus on just one (the <code>.lex</code> source file) by giving the line number of<br />
the defined regular expression as its first argument. And, should one have<br />
access to the generated <code>lex.yy.c</code> file, but <em>not</em> the <code>.lex</code> file, one can use<br />
the second argument to get the line number of the probe itself (it&#8217;s located<br />
within the switch statement). </p>
<p>Now a quick demo.</p>
<p>Here is a simple lexer (<code>.lex</code>) for a calculator:</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

%%

[0-9]+          { return (0); }
"-"[0-9]+       { return (1); }
"+"             { return (2); }
"-"             { return (3); }
"*"             { return (4); }
"/"             { return (5); }

%%

int
main(int ac, char **av)
{

    ++av; --ac;
    if (ac &gt; 0) {
        yyin = fopen(av[0], "r");
    } else {
        yyin = stdin;
    }

    while (!feof(yyin)) {
        yylex();
    }

}
</code></pre>
<p>Here is an input file for the lexer to process:</p>
<pre><code>2 + 2
-2 + 2
3 * 90
60 / 5
4321 - 1234
+ 2 2 3 8 9
- 9 8 1
</code></pre>
<p>A simple <code>dtrace</code> invocation that runs the lexer, and indicates which regular<br />
expressions were matched the most:</p>
<pre><code>pfexec dtrace -c './calc_lex calc_in' -n 'lex$target:::match_regex {@[arg0] = count();}'

 7                1
10                1
11                1
 9                2
 8                3
 6               17
</code></pre>
<p>A slightly different <code>dtrace</code> invocation that is the same as the previous one,<br />
but indicates the part of the switch statement we jump to when we match a<br />
regex:</p>
<pre><code>pfexec dtrace -c './calc_lex calc_in' -n 'lex$target:::match_regex {@[arg1] = count();}'

10                1
13                1
14                1
12                2
11                3
99               17
</code></pre>
<p>Admittedly, this is less useful than the previous invocation, but it&#8217;s better<br />
than nothing, and I&#8217;ve seen repositories where people mindlessly include just<br />
the generated file, but leave the <code>.lex</code> file out.</p>
<p>Not being able to quickly find the most commonly matched regexes was a major<br />
deficiency of the initial iteration of this provider. Only two lines of code<br />
were added (since the last iteration) to the lex source code (which is now on<br />
GitHub) to make this happen. (See the diffs for sub1.c for details.)</p>
<p>I&#8217;ve only used <code>lex</code> a dozen times or so, thus far, but the new provider has<br />
dramatically sped up the compile-test-debug cycles, by allowing one to ask very<br />
specific questions and use DTrace constructs like aggregations. Ultimately the<br />
number of times one has to recompile is reduced, and the volume of debug output<br />
one has to burrow through is microscopic compared to the dump of debug printfs<br />
that <code>lex</code> emits when <code>LEXDEBUG</code> is defined. Also, one can use the lex<br />
provider in conjunction with other providers offered by Illumos and<br />
DTrace-aware processes.</p>
<p>The modified code is located in my own <a href="https://github.org/nickziv/illumos-joyent">branch</a> of illumos-joyent, on<br />
Github.</p>
<p>Happy lexing.</p>
]]></content:encoded>
			<wfw:commentRss>http://nickziv.wordpress.com/2011/12/28/lex-provider-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ae77a9245fb476328a04af02a009f2a8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">nickzivkovic</media:title>
		</media:content>
	</item>
		<item>
		<title>Random DTrace Tip: You Can&#8217;t Trace sbrk Because It&#8217;s Not A Syscall</title>
		<link>http://nickziv.wordpress.com/2011/11/26/random-dtrace-tip-you-cant-trace-sbrk-because-its-not-a-syscall/</link>
		<comments>http://nickziv.wordpress.com/2011/11/26/random-dtrace-tip-you-cant-trace-sbrk-because-its-not-a-syscall/#comments</comments>
		<pubDate>Sat, 26 Nov 2011 00:22:43 +0000</pubDate>
		<dc:creator>nick zivkovic</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://nickziv.wordpress.com/?p=201</guid>
		<description><![CDATA[The DTrace syscall provider is one of the most useful (and most used) providers. Typically, people use the syscall provider to log and aggregate any subset (or the entire set) of system calls made by an application. For instance, dtrace -n 'syscall::brk:entry {@[arg0] = count();}' will trace all the brk system calls made, and count [...]]]></description>
			<content:encoded><![CDATA[<p>The DTrace <code>syscall</code> provider is one of the most useful (and most used)<br />
providers. Typically, people use the <code>syscall</code> provider to log and aggregate<br />
any subset (or the entire set) of system calls made by an application.</p>
<p>For instance,</p>
<pre><code>dtrace -n 'syscall::brk:entry {@[arg0] = count();}'
</code></pre>
<p>will trace all the <code>brk</code> system calls made, and count the number of times that<br />
each distinct argument value was passed to it.</p>
<p>However,</p>
<pre><code>dtrace -n 'syscall::sbrk:entry {@[arg0] = count();}'
</code></pre>
<p>will <em>not</em> work.</p>
<p>This is because, while <code>brk()</code> and <code>sbrk()</code> are valid Unix interfaces, used to<br />
modify the size of the calling process&#8217;s data segment, they aren&#8217;t themselves system<br />
calls. They are functions in the standard C library that comes with Illumos (or<br />
some other DTrace-enhanced system), that wrap around a system call.</p>
<p>In our case, Illumos only supports a system call that is identified as <code>brk</code>.</p>
<pre><code>] dtrace -l -n 'syscall:::entry' | grep brk
72078   syscall                     brk entry
</code></pre>
<p>As it turns out, both the <code>sbrk()</code> function and <code>brk()</code> function are<br />
implemented in terms of the <code>brk</code> system call.</p>
<pre><code>] dtrace -n 'syscall::brk:entry {@[ustack()] = count();}'
...SNIP...
libc.so.1`_brk_unlocked+0xa
libc.so.1`sbrk+0x2e
libc.so.1`_morecore+0x119
libc.so.1`_malloc_unlocked+0x189
libc.so.1`malloc+0x2e
libc.so.1`_findbuf+0x84
libc.so.1`_ndoprnt+0x91
libc.so.1`vfprintf+0x9f
dtrace`oprintf+0xa2
dtrace`main+0x1607
dtrace`0x4034bc
  1
...SNIP...
</code></pre>
<p>If you want to get the argument that&#8217;s passed to <code>sbrk()</code> you&#8217;ll have to use<br />
the <code>pid</code> provider, like so:</p>
<pre><code>] dtrace -p $PID -n 'pid$target::sbrk:entry {@[arg0] = count();}'
</code></pre>
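<p>If you don&#8217;t have a process handy that calls <code>sbrk()</code>, a small C function like the following gives the <code>pid</code> provider something to trace. It is only a sketch: <code>grow_heap</code> is a made-up name, and the <code>_DEFAULT_SOURCE</code> guard is there only because glibc can hide the <code>sbrk()</code> prototype in strict compiler modes.</p>
<pre><code>#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE    /* glibc only: expose the sbrk() prototype */
#endif
#include &lt;stdint.h&gt;
#include &lt;unistd.h&gt;

/*
 * Move the program break by incr bytes through the libc wrapper and
 * return how far it actually moved, or -1 if sbrk() failed.
 */
intptr_t
grow_heap(intptr_t incr)
{
    char *before = sbrk(incr);    /* returns the previous break */
    char *after;

    if (before == (char *)-1)
        return (-1);
    after = sbrk(0);    /* an increment of 0 just reports the break */
    return ((intptr_t)(after - before));
}
</code></pre>
<p>In a process that calls <code>grow_heap(4096)</code>, the <code>pid</code> one-liner above should aggregate on 4096 and 0, the two increments passed to <code>sbrk()</code>.</p>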
]]></content:encoded>
			<wfw:commentRss>http://nickziv.wordpress.com/2011/11/26/random-dtrace-tip-you-cant-trace-sbrk-because-its-not-a-syscall/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ae77a9245fb476328a04af02a009f2a8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">nickzivkovic</media:title>
		</media:content>
	</item>
		<item>
		<title>Introducing plan: The Time Management System for Illumos Propeller Heads</title>
		<link>http://nickziv.wordpress.com/2011/10/10/introducing-plan-the-time-management-system-for-illumos-propeller-heads/</link>
		<comments>http://nickziv.wordpress.com/2011/10/10/introducing-plan-the-time-management-system-for-illumos-propeller-heads/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 05:49:43 +0000</pubDate>
		<dc:creator>nick zivkovic</dc:creator>
				<category><![CDATA[Illumos]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cli]]></category>
		<category><![CDATA[command line]]></category>
		<category><![CDATA[illumos]]></category>
		<category><![CDATA[planning]]></category>
		<category><![CDATA[plans]]></category>
		<category><![CDATA[time management]]></category>
		<category><![CDATA[time management system]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://nickziv.wordpress.com/?p=169</guid>
		<description><![CDATA[Using the idea of resource allocation, I've created <code>plan</code>, a nifty command line application that automatically schedules activities based on how long they last, and when they should start. A truly automated digital replacement for the much-hated daily planners. Other digital planners just digitize the tedium of laying out your day. Replace your daily planners! Unix style.
]]></description>
			<content:encoded><![CDATA[<p><em>This entry is still a work in progress, will change often.</em></p>
<p><strong>I have always found that planning is useless, but plans are indispensable.</strong><br />
-Dwight D. Eisenhower</p>
<p><em>Using the idea of resource allocation, I&#8217;ve created <code>plan</code>, a nifty command<br />
line application that automatically schedules activities based on how long they<br />
last, and when they should start. A truly automated digital replacement for the<br />
much-hated daily planners. Other digital planners just digitize the tedium of<br />
laying out your day. Replace your daily planners! Unix style.</em></p>
<p>Recently I came to the morbid realization that I had to manage my time, if I<br />
wanted to get anything (besides programming) done. Upon realizing this most<br />
unfortunate truth, I cried a little, found some graph paper, and shaded out the<br />
grid, so that my weekly schedule would be apparent upon glancing at it. Sadly,<br />
there was significant resistance from my stubborn circadian rhythm, which<br />
wouldn&#8217;t allow me to wake up quite as early as I would have liked.</p>
<p>Instead of re-shading another piece of graph paper, I gave in and tried using<br />
Google Calendar. It&#8217;s a very cute web application. Where &#8220;cute&#8221; is a euphemism<br />
for &#8220;my grandma might like it&#8221;. And yet, despite its simplistic<br />
point-click-drag paradigm, it was far too manual for my taste. But it got the<br />
job done, and I was <em>actually managing my time</em>. I was a living oxymoron.</p>
<p>Unfortunately, I have problems. More specifically, I have problems with the<br />
implication of having a web-based time management system. The implication is<br />
that time management is useful if you have an internet connection, and<br />
unnecessary if you don&#8217;t. No connection, no Google Calendar, no time<br />
management. And my connection isn&#8217;t exactly persistent because I also have<br />
problems with my university&#8217;s IT department.</p>
<p>I could have found a local TMS. But that presents another problem, endemic<br />
to all of today&#8217;s TMSs: repetition. Modern TMSs present the user with a<br />
calendar that spans infinitely (practically speaking) into the future, and<br />
allow the user to mark it up with a point-and-click interface. The first<br />
problem is the fact that you have to use the mouse. Compared to typing on the<br />
keyboard, the mouse is dog slow, and its use should be reserved for very<br />
specific activities like photo editing. The second problem is that these TMSs<br />
don&#8217;t differentiate between <em>activities</em> that have a starting time and a<br />
duration, and <em>todos</em> that merely have a due time. The last problem is that<br />
these TMSs fail to take into account that most of our activities follow a<br />
repetitive weekly routine, but diverge on certain dates. For example, in<br />
college, my class schedule is weekly, but my Calculus class is canceled on<br />
11/09/19 (I wish).</p>
<p>Also, my problem with graphical applications (and thus web applications) is<br />
that they are not integrated with Unix. Graphical applications can&#8217;t be piped<br />
into <code>grep</code> or <code>awk</code>. Sure they can be scripted (look at Firefox), but these<br />
scripts likely aren&#8217;t reusable with other applications (or even with other<br />
scripts in the same application). Scriptable graphical applications create<br />
walled gardens, hinder code reuse, and generally <em>waste my time</em> by forcing me<br />
to be redundant and manual.  And besides, I don&#8217;t know of any local graphical<br />
TMS that is scriptable.</p>
<p>The Unix command line is perfection. And I won&#8217;t settle for anything less.</p>
<p>In the name of truly efficient time management, I&#8217;ve created a command line<br />
application, named <code>plan</code>. First, I&#8217;ll demonstrate what it&#8217;s capable of, then<br />
I&#8217;ll go into its implementation.</p>
<h2>Introduction and Demonstration</h2>
<p>It&#8217;s a single command that does the following:<br />
 * allocates activities on a per-day and per-date basis<br />
 * establishes todo&#8217;s on a per-day and per-date basis</p>
<p>When an activity is allocated on, say, Sunday, it is understood that it repeats<br />
every Sunday. However when that activity is allocated on (11-09-18), it is<br />
understood that it only ever occurs on (11-09-18). The same applies for todos.</p>
<p><strong>Creating and Destroying Activities and Todos</strong></p>
<p>We can create activities and todos either on a given weekday, or on a<br />
particular date.</p>
<p>Here we create an activity on Monday, called <code>class_calc</code>.</p>
<pre><code>#plan create Mon/class_calc
</code></pre>
<p>As you can see, day names can be abbreviated. &#8216;Monday&#8217;, &#8216;Mon&#8217;, &#8216;monday&#8217;, and<br />
&#8216;mon&#8217;, all refer to the same week day.</p>
<p>We&#8217;ll also create a todo called hw_calc, which also occurs on every Monday.</p>
<pre><code>#plan create Mon/@hw_calc
</code></pre>
<p>Todos are always prefixed with an at sign (@).</p>
<p>Also, we can create a general todo, which is a todo that has no specific day,<br />
date, or time.</p>
<pre><code>#plan create @go_skydiving
</code></pre>
<p>The general gist of general todos is &#8220;I have to do this anytime between now,<br />
and the day I die&#8221;.</p>
<p>For the sake of populating more than just one weekday, we&#8217;ll also create an<br />
activity on Tuesday called <code>class_db</code> (my databases class):</p>
<pre><code>#plan create Tues/class_db
</code></pre>
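<p>The exact abbreviation rule isn&#8217;t spelled out here, so the following C sketch is only a guess at one that accepts all of the spellings used so far. The function name <code>parse_weekday</code> and the three-letter minimum are assumptions, not a description of what <code>plan</code> actually does.</p>
<pre><code>#include &lt;string.h&gt;
#include &lt;strings.h&gt;    /* strncasecmp() */

/*
 * Map a day name to 0 (Sunday) .. 6 (Saturday), or -1 if unrecognized.
 * Assumed rule: at least three letters, case-insensitive, and a prefix
 * of the full English name.
 */
int
parse_weekday(const char *s)
{
    static const char *names[] = {
        "sunday", "monday", "tuesday", "wednesday",
        "thursday", "friday", "saturday"
    };
    size_t len = strlen(s);

    if (len &lt; 3)
        return (-1);
    for (int i = 0; i &lt; 7; i++) {
        if (len &lt;= strlen(names[i]) &amp;&amp;
            strncasecmp(s, names[i], len) == 0)
            return (i);
    }
    return (-1);
}
</code></pre>
<p>Under this rule &#8216;Monday&#8217;, &#8216;Mon&#8217;, &#8216;monday&#8217;, and &#8216;mon&#8217; all map to 1, and &#8216;Tues&#8217; maps to 2.</p>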
<p>Creating a random activity on a specific date:</p>
<pre><code>#plan create 11-10-09/random_activity
</code></pre>
<p>We can also specify the previous command as:</p>
<pre><code>#plan create 11-Oct-09/random_activity
</code></pre>
<p>or:</p>
<pre><code>#plan create 2011-Oct-09/random_activity
</code></pre>
<p>and so on.</p>
<p>To destroy an activity or todo, we replace <code>create</code> with <code>destroy</code> in the above<br />
commands.</p>
<p><strong>Listing Activities and Todos</strong></p>
<p>First, let&#8217;s list all of the activities that occur on Monday.</p>
<pre><code>#plan list -a Monday
(0/1440)
NAME                    DYN    TIME     DUR
class_calc             true     N/A  00h00m
</code></pre>
<p>The <code>-a</code> specifies that we want to list activities.</p>
<p>So, we have one activity, class_calc.</p>
<p>The first line of output indicates the ratio of allocated minutes to available<br />
minutes on this particular day. The second column, <code>DYN</code> may be enigmatic at<br />
first. It&#8217;s a boolean that indicates whether or not an activity is dynamic. A<br />
dynamic activity has a duration, but no explicit starting time. Dynamic<br />
activities have their starting time assigned by <code>plan</code>.</p>
<p>The third column, indicates that the starting time is not yet assigned. The<br />
fourth one indicates that the duration is zero. When an activity has no<br />
duration, it is effectively unallocated.</p>
<p>Now, let&#8217;s list all of the activities that occur over a generic week.</p>
<pre><code>#plan list -a week
Monday
(0/1440)
NAME                    DYN    TIME     DUR
class_calc             true     N/A  00h00m

Tuesday
(0/1440)
NAME                    DYN    TIME     DUR
class_db               true     N/A  00h00m
</code></pre>
<p>So that&#8217;s pretty neat, we can see our weekly schedule.</p>
<p>But it gets better. We can also display the weekly schedule for the <em>current</em><br />
week, which replaces any week days that diverge from the weekly routine as a<br />
result of creating an activity on a date that falls within this week. As you<br />
can see, this week diverges from our generic weekly routine (namely, Sunday has<br />
a random activity).</p>
<pre><code>#plan list -a this_week
Sunday (2011-10-09)
(0/1440)
NAME                    DYN    TIME     DUR
random_activity        true     N/A  00h00m

Monday (2011-10-10)
(0/1440)
NAME                    DYN    TIME     DUR
class_calc             true     N/A  00h00m

Tuesday (2011-10-11)
(0/1440)
NAME                    DYN    TIME     DUR
class_db               true     N/A  00h00m
</code></pre>
<p>Sometimes, we may want to display the schedule (or todos) for next week, in<br />
which case we may do the following. Here, next week is the same as any generic week.</p>
<pre><code>#plan list -a next_week
Monday (2011-10-17)
(0/1440)
NAME                    DYN    TIME     DUR
class_calc             true     N/A  00h00m

Tuesday (2011-10-18)
(0/1440)
NAME                    DYN    TIME     DUR
class_db               true     N/A  00h00m
</code></pre>
<p>We can view todos in much the same fashion.</p>
<pre><code>#plan list -t week
NAME                   TIME
hw_calc               00:00
#plan list -t this_week
NAME                   TIME
hw_calc               00:00
#plan list -t next_week
NAME                   TIME
hw_calc               00:00
</code></pre>
<p>Since we have not created any todos on a particular date, as opposed to a<br />
week day, the three above commands share the same output.</p>
<p><strong>Durations, Starting Times, and Awake Times</strong></p>
<p>Let&#8217;s attempt to put a starting time on <code>class_calc</code>.</p>
<pre><code>#plan set time=08:35 mon/class_calc
Monday: Can't set time on activity class_calc. Activity doesn't have duration.
</code></pre>
<p>What you&#8217;re seeing now is an error. <code>plan</code> won&#8217;t set a starting time on an<br />
activity if it has a duration of 0 minutes. <code>plan</code> essentially allocates time<br />
to activities and resolves conflicts between activities. We just asked it to<br />
allocate <em>nothing</em>, which is bogus (think <code>umem_alloc(0, flag)</code>, which always<br />
returns NULL).</p>
<p>We&#8217;ll have to set the duration first.</p>
<pre><code>#plan set duration=01h35m mon/class_calc
#plan list -a mon
(95/1440)
NAME                    DYN    TIME     DUR
class_calc             true   00:00  01h35m
</code></pre>
<p>So class_calc now has a duration and starts at 0:00 (12:00 am), because that&#8217;s<br />
the earliest time at which it can possibly start.</p>
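<p>The <code>(95/1440)</code> line is just the duration converted to minutes over the minutes in a day. A minimal C sketch of that conversion follows; <code>duration_minutes</code> is a hypothetical helper written for this post, not a function lifted from <code>plan</code>.</p>
<pre><code>#include &lt;stdio.h&gt;

/*
 * Convert a duration such as "01h35m" to minutes.
 * Returns -1 if the string doesn't match the assumed HHhMMm layout.
 */
int
duration_minutes(const char *s)
{
    int h, m;
    int n = 0;    /* characters consumed, set by %n on a full match */

    if (sscanf(s, "%dh%dm%n", &amp;h, &amp;m, &amp;n) != 2 || n == 0 || s[n] != '\0')
        return (-1);
    if (h &lt; 0 || m &lt; 0 || m &gt; 59)
        return (-1);
    return (h * 60 + m);
}
</code></pre>
<p>Here <code>duration_minutes("01h35m")</code> gives 95, and <code>duration_minutes("24h00m")</code> gives the 1440 minutes of a full day.</p>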
<p>However, we can change that, by setting the <code>awake</code> property for Monday.</p>
<pre><code>#plan set awake=07:00,16h00m mon
#plan list -a mon
(95/960)
NAME                    DYN    TIME     DUR
class_calc             true   07:00  01h35m
</code></pre>
<p>Setting a wake-up time changes when the day starts. If the day doesn&#8217;t start<br />
before 07:00, it makes no sense to have the class_calc activity start at 00:00 (as<br />
we&#8217;re asleep). So <code>plan</code> automatically adjusted it to start at the earliest<br />
possible time.</p>
<p>Now let&#8217;s try to set the starting time again.</p>
<pre><code>#plan set time=08:35 mon/class_calc
#plan list -a mon
(95/960)
NAME                    DYN    TIME     DUR
class_calc            false   08:35  01h35m
</code></pre>
<p>Yay, it worked. And now, the class_calc activity is no longer dynamic.  Which<br />
means that it can&#8217;t be automatically moved to a different block of time if<br />
there is a conflict.</p>
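<p>To make the dynamic versus fixed distinction concrete, here is a rough first-fit sketch in C. It is not <code>plan</code>&#8217;s actual allocator; <code>block_t</code> and <code>earliest_fit</code> are names invented for this example, and it assumes the fixed activities are sorted by starting time and don&#8217;t overlap.</p>
<pre><code>/* A block of time, in minutes since midnight. */
typedef struct {
    int start;
    int dur;
} block_t;

/*
 * Return the earliest start time for a dynamic activity of dur minutes
 * inside the awake window [day_start, day_start + day_len), skipping
 * over the n fixed blocks (assumed sorted by start and non-overlapping).
 * Returns -1 when no gap is large enough.
 */
int
earliest_fit(int day_start, int day_len, const block_t *fixed, int n, int dur)
{
    int day_end = day_start + day_len;
    int t = day_start;

    for (int i = 0; i &lt; n; i++) {
        if (t + dur &lt;= fixed[i].start)
            break;    /* the gap before this fixed block is big enough */
        if (fixed[i].start + fixed[i].dur &gt; t)
            t = fixed[i].start + fixed[i].dur;    /* jump past it */
    }
    return ((t + dur &lt;= day_end) ? t : -1);
}
</code></pre>
<p>With no fixed blocks and a day that starts at 07:00 (minute 420), a 95-minute activity lands at 420, which matches the 07:00 that <code>plan</code> picked above. With <code>class_calc</code> fixed at 08:35, a two-hour dynamic activity would be pushed to 10:10 (minute 610).</p>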
<p>Let&#8217;s test this. What happens if we try to set the awake property to one hour<br />
after my Calculus 2 class starts?</p>
<pre><code>#plan set awake=09:35,16h00m mon
Monday: Activity class_calc can't fit in the alotted time
#plan list -a mon
(95/960)
NAME                    DYN    TIME     DUR
class_calc            false   08:35  01h35m
</code></pre>
<p>As you can see, we can&#8217;t set the awake time on Mondays to 9:35, because <code>class_calc</code><br />
starts an hour before that! And because <code>class_calc</code> is not dynamic, we can&#8217;t<br />
move it.  As the list command indicates, we didn&#8217;t change anything. The<br />
<code>16h00m</code> indicates that this day lasts for 16 hours. There are limitations:<br />
waking up at 13:00 would prevent you from having a day that lasts for more<br />
than 11 hours.  The reason this limitation exists is that even though I could<br />
implement a day that starts at 13:00 and lasts 16 hours, I wouldn&#8217;t be able to<br />
wrap an activity that lasts, say, 3 hours and starts at 23:00 around the end<br />
of the day, due to an implementation detail. And besides, that would encourage<br />
the wrong kind of behaviour!</p>
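<p>Both constraints boil down to simple arithmetic on minutes. The C sketch below shows one way to express them; the function names are hypothetical and the checks are my reading of the rules described above, not code from <code>plan</code>.</p>
<pre><code>#define DAY_MINUTES 1440

/* 1 if a day that starts at wake and lasts len minutes ends by midnight. */
int
awake_window_ok(int wake, int len)
{
    return (wake &gt;= 0 &amp;&amp; len &gt; 0 &amp;&amp; wake + len &lt;= DAY_MINUTES);
}

/* 1 if a fixed activity lies entirely inside the awake window. */
int
fixed_fits(int wake, int len, int start, int dur)
{
    return (start &gt;= wake &amp;&amp; start + dur &lt;= wake + len);
}
</code></pre>
<p>For instance, <code>awake_window_ok(13 * 60, 11 * 60)</code> holds while a 16-hour day starting at 13:00 does not, and <code>fixed_fits(575, 960, 515, 95)</code> fails because an 08:35 class starts before a 09:35 wake-up.</p>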
<p>Sometimes allocating time to an activity will fail because there isn&#8217;t any free<br />
block of time that is big enough for the activity. If this activity is, say,<br />
01h30m (and it doesn&#8217;t need to be contiguous), we can always split it into two or<br />
three chunks like so:</p>
<pre><code>#plan set duration=00h45m*2 [[activity-name]]
</code></pre>
<p>or</p>
<pre><code>#plan set duration=00h30m*3 [[activity-name]]
</code></pre>
<p>Note that a chunked activity can&#8217;t have an explicit starting time.</p>
<p><strong>Renaming Activities and Todos</strong></p>
<p>So, <code>class_calc</code> represents my Calculus 2 class. But I think <code>class_calc2</code><br />
would be a more apt name for it. I can rename it using the <code>rename</code> subcommand.</p>
<pre><code>#plan rename mon/class_calc mon/class_calc2
#plan list -a mon
(95/960)
NAME                    DYN    TIME     DUR
class_calc2           false   08:35  01h35m
</code></pre>
<p><strong>My Weekly Schedule</strong></p>
<p>Here&#8217;s the weekly schedule that I follow. It was entirely generated by <code>plan</code>.<br />
I must say, that I&#8217;m far more productive now than I&#8217;ve ever been.</p>
<pre><code>Sunday
(990/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
study_calc             true   06:30  02h00m
study_algs             true   08:30  03h00m
brunch                false   11:45  00h30m
study_db               true   12:15  03h00m
study_calc             true   15:15  01h00m
dinner                false   16:30  00h30m
prj_zaps               true   17:00  02h00m
study_csoc             true   19:00  01h00m
study_phys             true   20:00  03h00m

Monday
(890/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
prj_zaps               true   06:30  01h00m
breakfast             false   07:30  00h30m
class_calc2           false   08:35  01h35m
prj_zaps               true   10:10  01h00m
class_algs            false   11:25  01h15m
lunch                 false   12:40  00h30m
study_phys             true   13:10  01h30m
study_db               true   14:40  01h30m
dinner                false   16:30  00h30m
study_algs             true   17:00  01h30m
study_calc             true   18:30  01h30m
prj_zaps               true   20:00  02h00m

Tuesday
(850/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
prj_zaps               true   06:30  01h00m
breakfast             false   07:30  00h30m
study_phys             true   08:00  01h30m
class_db              false   10:00  01h15m
lunch                 false   11:20  00h30m
study_algs             true   11:50  01h30m
class_phys            false   13:50  01h15m
prj_zaps               true   15:05  01h00m
dinner                false   16:30  00h30m
prj_zaps               true   17:00  00h30m
study_calc             true   17:30  00h45m
lab_phys              false   18:25  02h40m
study_calc             true   21:05  00h45m

Wednesday
(865/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
study_phys             true   06:30  00h45m
breakfast             false   07:30  00h30m
study_db               true   08:00  00h30m
class_calc            false   08:35  01h15m
study_algs             true   09:50  01h30m
class_algs            false   11:25  01h15m
lunch                 false   12:40  00h30m
study_db               true   13:10  00h30m
lab_calc              false   13:50  01h15m
study_phys             true   15:05  00h45m
study_db               true   15:50  00h30m
dinner                false   16:30  00h30m
study_calc             true   17:00  00h45m
class_csoc            false   18:25  02h40m
study_calc             true   21:05  00h45m

Thursday
(930/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
prj_yacc               true   06:30  00h30m
breakfast             false   07:00  00h30m
prj_zaps               true   07:30  02h00m
prj_yacc               true   09:30  00h30m
class_db              false   10:00  01h15m
lunch                 false   11:20  00h30m
study_db               true   11:50  01h30m
prj_yacc               true   13:20  00h30m
class_phys            false   13:50  01h15m
prj_zaps               true   15:05  01h00m
dinner                false   16:30  00h30m
study_phys             true   17:00  01h30m
study_algs             true   18:30  01h30m
study_calc             true   20:00  01h30m
prj_yacc               true   21:30  00h30m

Friday
(1020/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
prj_yacc               true   06:30  01h00m
breakfast             false   07:30  00h30m
prj_yacc               true   08:00  00h30m
class_calc            false   08:35  01h15m
study_calc             true   09:50  01h30m
class_algs            false   11:25  01h15m
lunch                 false   12:40  00h30m
study_algs             true   13:10  01h30m
prj_yacc               true   14:40  01h00m
prj_zaps               true   15:40  00h30m
dinner                false   16:30  00h30m
study_calc2            true   17:00  01h30m
study_phys             true   18:30  01h30m
study_db               true   20:00  01h30m
prj_zaps               true   21:30  02h00m

Saturday
(930/1080)
NAME                    DYN    TIME     DUR
shower                false   06:00  00h30m
study_calc             true   06:30  02h00m
study_algs             true   08:30  03h00m
brunch                false   11:45  00h30m
study_phys             true   12:15  03h00m
study_calc             true   15:15  01h00m
dinner                false   16:30  00h30m
study_csoc             true   17:00  01h30m
study_db               true   18:30  03h00m
prj_zaps               true   21:30  00h30m
</code></pre>
<p><strong>Eating Children</strong></p>
<p><code>plan</code> is still very much being field-tested by me. I&#8217;m making the code<br />
available, but chances are, it <em>will</em> eat your children. I&#8217;d appreciate it if<br />
bugs were reported to the github <a href="https://github.com/nickziv/plan/issues">issue tracker</a>.</p>
<h2>Implementation Overview</h2>
<p>And now onto the technical part: the implementation. You may have taken notice<br />
that I&#8217;ve referred to <code>plan</code> as a <em>time allocator</em>. This is because time is a<br />
resource that is allocated in a way that is very similar to the way we allocate<br />
memory (a la interfaces such as <code>malloc</code>). We allocate a contiguous block of<br />
time just as we would allocate a contiguous block of memory (we use a best-fit<br />
algorithm to find a block of time that is free, and allocate it). Similarly,<br />
just as the memory allocation process is susceptible to fragmentation, so is<br />
the time allocation process.</p>
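<p>To make the analogy concrete, here is a minimal sketch of best-fit allocation over the free gaps in a day. It is not <code>plan</code>&#8217;s actual code (which goes through <code>vmem</code>); the <code>gap_t</code> type, the <code>alloc_time</code> function, and the minutes-since-midnight unit are all assumptions made up for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* A free gap in the day, in minutes since midnight: [start, start + len). */
typedef struct gap {
    int start;
    int len;
} gap_t;

/*
 * Best fit: choose the smallest gap that still holds `dur` minutes, carve
 * the allocation off the front of it, and return the start minute.
 * Returns -1 when no gap is large enough.
 */
int
alloc_time(gap_t *gaps, size_t ngaps, int dur)
{
    size_t best = ngaps;    /* sentinel: nothing found yet */
    size_t i;
    int start;

    assert(gaps != NULL && dur > 0);
    for (i = 0; i < ngaps; i++) {
        if (gaps[i].len < dur)
            continue;       /* too small to hold the request */
        if (best == ngaps || gaps[i].len < gaps[best].len)
            best = i;       /* tighter fit than anything seen so far */
    }
    if (best == ngaps)
        return (-1);

    start = gaps[best].start;
    gaps[best].start += dur;
    gaps[best].len -= dur;
    return (start);
}
```

<p>With free gaps of 60, 75, and 200 minutes, a 70-minute request lands in the 75-minute gap and strands a 5-minute sliver that is too short for almost anything. That sliver is fragmentation, exactly as it would be in a memory allocator.</p>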
<p>Most operating systems textbooks demonstrate the principles of memory<br />
allocation, but they do not indicate that memory allocation is a subset of<br />
resource allocation. If I showed you how a resource allocator worked, you&#8217;d say<br />
that it looks very much like a simplistic memory allocator. You&#8217;d be right. Any<br />
resource that can be allocated as a block of integers (like process IDs,<br />
device IDs, virtual addresses) could be allocated with a resource allocator.<br />
Today you could implement a <code>malloc</code> interface using a couple dozen invocations<br />
of a resource allocator&#8217;s functions and some system calls.</p>
<p>This only goes to show that modern memory allocators are far, far more complex<br />
than the toy allocators found in systems programming texts. Modern allocators<br />
use very advanced concepts like slabs, magazines, and cache-coloring to<br />
maximize performance and minimize fragmentation. In other words, memory<br />
allocation is a special case of resource allocation.</p>
<p>Instead of writing a new resource allocator, I&#8217;ve decided to make use of<br />
<code>vmem</code>, the existing resource allocator that serves as the back-end of <code>libumem</code>.<br />
I could&#8217;ve used any resource allocator (like the ubiquitous <code>rmalloc</code>),<br />
but seeing as how I use <code>libumem</code> for memory allocation anyway, <code>vmem</code> seemed<br />
like the obvious choice.</p>
<p>So as you can probably tell, <code>plan</code> won&#8217;t be able to run on any systems that<br />
don&#8217;t support <code>libumem</code>. Fortunately, someone went through the trouble of<br />
porting <code>libumem</code> over to a bunch of platforms.</p>
<p>The other important aspect I have to mention is data storage. Obviously, we<br />
have to save the schedules we make. Most applications would use an on-disk<br />
format, such as a plain-text file, or an embedded database, like SQLite.</p>
<p>Since I use ZFS on all of my machines, a database is automatically a<br />
non-starter, because as they are modified, the data get fragmented. This is not<br />
ZFS&#8217;s fault. The databases were designed with 1980s-era filesystems (which<br />
were not COW or resilient) and data (which was rather small, relative to today)<br />
in mind.</p>
<p>Also, databases love to checksum stuff, which is unnecessary as ZFS already<br />
checksums all of the data. There is also a bunch of other overhead and<br />
redundancy associated with databases, which makes them the wrong choice for<br />
this particular problem.</p>
<p>The other option (plain text file) has to be parsed (using <code>lex</code> and <code>yacc</code> if<br />
the format is complex enough).</p>
<p>I love parsing, but I love efficiency even more. Efficiency in both the<br />
clock-cycle sense, and the programmer-productivity sense. In this case, I<br />
decided to settle on a middle ground between text-file and database. In case I<br />
didn&#8217;t make it obvious enough, I&#8217;m a die-hard unix developer. Unix has this<br />
great thing called a hierarchical file system. It&#8217;s great for representing any<br />
kind of hierarchical data. Unix also snagged the nifty concept of extended file<br />
attributes from BeOS. Manipulating directories and extended attributes is very<br />
easy, using system calls. By storing certain properties such as time and<br />
duration in distinct extended attributes, I have a random-access interface to<br />
these values (which are stored in their original binary representation). I<br />
won&#8217;t make any claims to the superior time-efficiency of this solution compared<br />
to the other two, but it sure is more convenient, as I didn&#8217;t have to roll a<br />
custom parser (and serializer) or write a single line of SQL code. Also, since<br />
the data is structured at the filesystem level I didn&#8217;t have to use any tree<br />
structures to hold the data efficiently in memory. The most significant<br />
structures I use are an array, strings, an enum, and structures that hold<br />
information about activities and todos.</p>
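<p>As a rough sketch of what that random-access interface can look like: the helpers below store and fetch one fixed-size binary property per attribute. The names <code>prop_write</code> and <code>prop_read</code>, the <code>uint32_t</code> value type, and the attribute names are hypothetical and not taken from <code>plan</code>. <code>O_XATTR</code> is the Illumos flag for <code>openat()</code>; on systems without it, this sketch quietly degrades to plain files inside a directory.</p>

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * ASSUMPTION: O_XATTR makes openat() resolve `name` in the extended
 * attribute namespace of the file behind `fd` (Illumos). Where the flag is
 * missing, `fd` must be a directory and `name` becomes a plain file in it.
 */
#ifndef O_XATTR
#define O_XATTR 0
#endif

/* Store one property (say "time" or "dur") as raw binary. 0 on success. */
int
prop_write(int fd, const char *name, uint32_t val)
{
    int afd = openat(fd, name, O_XATTR | O_CREAT | O_WRONLY | O_TRUNC, 0600);
    ssize_t n;

    if (afd < 0)
        return (-1);
    n = write(afd, &val, sizeof (val));
    (void) close(afd);
    return (n == (ssize_t)sizeof (val) ? 0 : -1);
}

/* Read the property back: no parsing, just sizeof (uint32_t) bytes. */
int
prop_read(int fd, const char *name, uint32_t *val)
{
    int afd = openat(fd, name, O_XATTR | O_RDONLY);
    ssize_t n;

    assert(val != NULL);
    if (afd < 0)
        return (-1);
    n = read(afd, val, sizeof (*val));
    (void) close(afd);
    return (n == (ssize_t)sizeof (*val) ? 0 : -1);
}
```

<p>Each property costs one <code>openat()</code> and one <code>read()</code> or <code>write()</code>. A start time of 08:35 could be stored as the integer 515 (minutes past midnight, which is an assumed unit) without a parser or serializer in sight.</p>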
<p>So, to use <code>plan</code> you need <em>at least</em> <code>libumem</code> and Illumos extended attributes<br />
(which are accessed via my new favourite system call: <code>openat()</code>). And it&#8217;s<br />
preferable to have ZFS.</p>
<p>If you think <code>plan</code> would be useful on platforms that don&#8217;t meet the above<br />
prerequisites, grab the code and port it.</p>
<p>And that&#8217;s the implementation overview. If you want more concrete details have<br />
a look at the <a href="https://github.com/nickziv/plan">code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://nickziv.wordpress.com/2011/10/10/introducing-plan-the-time-management-system-for-illumos-propeller-heads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ae77a9245fb476328a04af02a009f2a8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">nickzivkovic</media:title>
		</media:content>
	</item>
		<item>
		<title>Adventures of a DTrace Addict: Part 1.0</title>
		<link>http://nickziv.wordpress.com/2011/07/17/adventures-of-a-dtrace-addict-part-1-0/</link>
		<comments>http://nickziv.wordpress.com/2011/07/17/adventures-of-a-dtrace-addict-part-1-0/#comments</comments>
		<pubDate>Sun, 17 Jul 2011 00:40:59 +0000</pubDate>
		<dc:creator>nick zivkovic</dc:creator>
				<category><![CDATA[DTrace]]></category>
		<category><![CDATA[dtrace]]></category>
		<category><![CDATA[illumos]]></category>
		<category><![CDATA[lex]]></category>
		<category><![CDATA[lexer]]></category>
		<category><![CDATA[lexical analysis]]></category>
		<category><![CDATA[usdt]]></category>

		<guid isPermaLink="false">http://nickziv.wordpress.com/?p=103</guid>
		<description><![CDATA[<p>Lex and Yacc are two tools that have aided Unix developers in writing
compilers for decades. Yet, these tools are lacking in basic debugging
facilities. This is my attempt to add some DTrace magic to Lex and Yacc.</p>
]]></description>
			<content:encoded><![CDATA[<h2>Lex, Yacc, and Embedding DTrace Probes </h2>
<p><strong>The purpose of life is to do whatever turns you on.</strong> -me</p>
<p><em>Lex and Yacc are two tools that have aided Unix developers in writing<br />
compilers for decades. Yet, these tools are lacking in basic debugging<br />
facilities. This is my attempt to add some DTrace magic to Lex and Yacc.</em></p>
<p>Two unix tools that have always fascinated me are <a href="http://en.wikipedia.org/wiki/Lex_(software)">lex</a> and <a href="http://en.wikipedia.org/wiki/Yacc">yacc</a>. Admittedly, it doesn&#8217;t take much to fascinate me; in fact, I often<br />
get distracted by shiny stuff.</p>
<p>I&#8217;ve always wanted to understand how <code>lex</code> and <code>yacc</code> work, and I&#8217;ve wanted to<br />
be able to observe what they&#8217;re doing in real time. Not for any particular<br />
reason, aside from the fact that observability turns me on (I&#8217;m like that).</p>
<p>Come to think of it, <code>lex</code> and <code>yacc</code>, despite being staples of parsing on unix<br />
systems, don&#8217;t have any kind of significant instrumentation at all. For<br />
decades, these tools have been nothing more than generators of magical black<br />
boxes. These black boxes are used all around unix &#8212; even DTrace uses lex+yacc<br />
powered parsing for its D language &#8212; but no one can <em>actually</em> see what the<br />
generated code is doing (short of tracing the program instruction for<br />
instruction, which is, well, complex).</p>
<p>Apparently they work, and the users and creators of these tools did just fine<br />
without these newfangled kernel-based instrumentation frameworks going around.<br />
In fact, it&#8217;s questionable whether or not instrumentation would even be<br />
helpful. Let&#8217;s face it: these tools were designed by people smarter than blokes<br />
like us. So maybe this is a matter that should be left alone&#8230;</p>
<p>But, there&#8217;s still that&#8230; itch. That notion that keeps nagging at you while<br />
you&#8217;re awake. Sometimes even when you&#8217;re asleep (ever dream about DTracing some<br />
cool app? I do, all the time). &#8220;My. I wonder if baking in some DTrace probes in<br />
<code>$APP_FOO</code> would make it better&#8230; nah.&#8221; It&#8217;s like a mosquito buzzing about<br />
your ear in the middle of the night, waking you up periodically.  Maybe you<br />
turn, you try to ignore it, but at some point your baggy, bloodshot eyes shoot<br />
open, and BAM! One blow, swift and impulsive. The nuisance is gone, the itch<br />
has been scratched, now you can sleep well at night. Temporarily at least, you<br />
will dream like normal people do. And then it starts all over again&#8230;</p>
<p>At any rate, I spent a few sleepless summer nights, fueled by a chaotic<br />
enthusiasm, tracing and instrumenting <code>lex</code> (which, by the way, matches a bunch<br />
of input tokens, using regular expressions, and either returns token values or<br />
does some other specified action). Do a</p>
<pre><code>man lex
</code></pre>
<p>for the full details.</p>
<p>Don&#8217;t get me wrong. I&#8217;d love to hold your hand and walk you through this but<br />
who has the time? Not me, certainly. Callous, perhaps, but there you go.</p>
<p>I used the first example in the man page as a test bed:</p>
<pre><code>%{
/* need this for the calls to atoi() and atof() below */
#include &lt;stdlib.h&gt;
/* need this for printf(), fopen() and stdin below */
#include &lt;stdio.h&gt;
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%

{DIGIT}+                          {
    printf("An integer: %s (%d)\n", yytext,
    atoi(yytext));
    }

{DIGIT}+"."{DIGIT}*    {
    printf("A float: %s (%g)\n", yytext,
    atof(yytext));
    }

if|then|begin|end|procedure|function        {
    printf("A keyword: %s\n", yytext);
    }

{ID}                   printf("An identifier: %s\n", yytext);

"+"|"-"|"*"|"/"        printf("An operator: %s\n", yytext);

"{"[^}\n]*"}"         /* eat up one-line comments */

[ \t\n]+               /* eat up white space */

.                      printf("Unrecognized character: %s\n", yytext);

%%

int main(int argc, char *argv[])
{
    ++argv, --argc;  /* skip over program name */
    if (argc &gt; 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;

    yylex();
}
</code></pre>
<p>I began by tracing just the function calls made in the binary itself and<br />
avoided any other module (like libc).</p>
<pre><code>pfexec dtrace -F -c './lexy' -n \
    'pid$target:lexy::entry, pid$target:lexy::return {}'
</code></pre>
<p>As can be seen below, everything is wrapped in a big function called <code>yylex</code>,<br />
where &#8220;everything&#8221; refers to the large number of calls to <code>yylook</code>.<br />
Unfortunately, <code>yylook</code> takes no arguments and returns no values (I looked at<br />
the source code). We&#8217;ll have to drill a little deeper to see what functions are<br />
getting called in <code>yylook</code>. Either way, simply using the pid provider is<br />
insufficient to see the inner workings of <code>lex</code>.</p>
<pre><code>A float: 9.9 (9.9)
An integer: 10 (10)
An integer: 5 (5)
A float: 3.333 (3.333)
CPU FUNCTION
  0  -&gt; _start
  0    -&gt; __fsr
  0    &lt;- __fsr
  0    -&gt; main
  0      -&gt; yylex
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0        -&gt; yylook
  0        &lt;- yylook
  0      &lt;- yylex
  0    &lt;- main
</code></pre>
<p>In order to get meaningful data about the lexing process, I would have to embed<br />
semantically meaningful probes into the generated <code>lex</code> program. The most<br />
straight-forward way of doing this is to add the probes into the actions<br />
specified for each match. (<strong>UPDATE:</strong> See <a href="http://blogs.oracle.com/barts/entry/putting_user_defined_dtrace_probe">this post</a>, <a href="http://dtrace.org/blogs/ahl/2006/05/08/user-land-tracing-gets-better-and-better/">this<br />
post</a>, and <a href="http://dtrace.org/blogs/dap/2009/05/04/anatomy-of-a-dtrace-usdt-provider/">this post</a> about usdt probes.)</p>
<p>That would be kind of cool, but kind of trivial at the same time. The next<br />
logical step would be to actually look at <code>yylook</code>, and probe the snot out of<br />
it. This would be much cooler considering that <code>lex</code> is really just a state<br />
machine, and I would be able to see the states <code>lex</code> goes through as it lexes the<br />
input. </p>
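<p>For a feel of what &#8220;seeing the states&#8221; means, here is a toy hand-written state machine for the two numeric rules of the man page example (<code>{DIGIT}+</code> and <code>{DIGIT}+"."{DIGIT}*</code>). It is only a sketch: the real <code>yylook</code> is table-driven and looks nothing like this, and every name below is invented. The <code>hook</code> callback stands in for a probe that fires on each state transition.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { ST_START, ST_INT, ST_FLOAT, ST_DEAD };
enum { ACT_NONE, ACT_INT, ACT_FLOAT };

/* Stand-in for a probe: called with every state the scanner enters. */
typedef void (*state_hook_t)(int state, void *arg);

/* Transition function of the toy machine. */
static int
next_state(int st, char c)
{
    int digit = isdigit((unsigned char)c);

    switch (st) {
    case ST_START:
        return (digit ? ST_INT : ST_DEAD);
    case ST_INT:
        return (digit ? ST_INT : (c == '.' ? ST_FLOAT : ST_DEAD));
    case ST_FLOAT:
        return (digit ? ST_FLOAT : ST_DEAD);
    default:
        return (ST_DEAD);
    }
}

/*
 * Longest-match scan of `in`. Stores the winning action in *act and
 * returns the number of characters matched (0 if nothing matched).
 */
size_t
scan(const char *in, int *act, state_hook_t hook, void *arg)
{
    int st = ST_START;
    size_t i, matched = 0;

    assert(in != NULL && act != NULL);
    *act = ACT_NONE;
    for (i = 0; in[i] != '\0'; i++) {
        st = next_state(st, in[i]);
        if (hook != NULL)
            hook(st, arg);
        if (st == ST_DEAD)
            break;
        /* ST_INT and ST_FLOAT are both accepting states. */
        matched = i + 1;
        *act = (st == ST_INT) ? ACT_INT : ACT_FLOAT;
    }
    return (matched);
}

/* Example hook: counts how many state transitions were observed. */
void
count_states(int state, void *arg)
{
    (void) state;
    (*(int *)arg)++;
}
```

<p>Scanning <code>"9.9 10"</code> walks through the integer state, then the float state twice, then dies on the space, so the hook fires four times for a three-character match. In this toy every live state is accepting, so it never has to back up; a real lexer does, which is why a fallback event is worth probing too.</p>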
<p>Of course, manually modifying the lex-generated program is, well, manual. So I<br />
did what any lazy, self-respecting computer scientist would do: I modified the<br />
<code>lex</code> program to generate a DTrace source file and an instrumented lexer. All<br />
that needs to be done is the linking at build time. I added a &#8216;-D&#8217; option to<br />
<code>lex</code> which toggles the generation of a probed lexer plus the D source file.<br />
It&#8217;s in my git repo available <a href="https://github.org/nickziv/illumos-joyent">here</a>. Keep in mind that <code>lex</code><br />
is part of Illumos, and you&#8217;d have to do a bit of a dance involving some scripts<br />
in order to get the new and improved lex bits. See the install notes at the end of<br />
this post. Yeah, it&#8217;s a pain. But it&#8217;s well worth it.</p>
<p>Unless.</p>
<p>Unless, of course, you weren&#8217;t one of us. Unless, you were but a mere muggle<br />
watching, awe-struck and slack-jawed as a wizard puts on a show, demonstrating<br />
the wickedness of his magic; making his caprice take its course.</p>
<p>To the wizardly, the inconvenience is worth it. Because magic is an end in and<br />
of itself. Muggles, however, wish to leverage magic to their own low-brow,<br />
pointy-haired purposes.</p>
<p>Enough about muggles. We&#8217;re wizards. And all wizards have a craving for one<br />
thing: data.</p>
<p>Data, data, data. Embedded DTrace probes now allowed for far more data to be<br />
collected during the lexing process. I&#8217;ll quickly demonstrate what one can do<br />
with the new and improved <code>lex</code>, using a quick-and-dirty lexer for the C<br />
programming language.</p>
<p>Here&#8217;s the code for the C lexer:</p>
<pre><code>/* C lexer */
%a 1000
%p 2000
%{
#include "y.tab.h"
/*
 * Used for debugging purposes...
 */
int
indicator()
{
    return (0);
}

/*
 * We define a set of labeled states for use in the C lexer, and then
 * some regexes for neatness.
 *
 * The states are:
 *
 * S0 - C code that is not a comment, a string, or a character.
 * S1 - C comment blocks
 * S2 - C line comment
 */
%}

%s S0 S1 S2

RGX_WS          [\f\r\t\v ]*
RGX_WS_DT       [\f\n\r\t\v ]
RGX_NL          \n+
/* C Pre Proc */
RGX_INCL        "#"{RGX_WS}"include"
RGX_INCLLIB     "&lt;"{RGX_PATH}"&gt;"
RGX_INCLUSR     \"{RGX_PATH}\"
RGX_DEF         "#"{RGX_WS}"define"
RGX_UNDEF       "#"{RGX_WS}"undef"
RGX_MACIF       "#"{RGX_WS}"if"
RGX_MACELSE     "#"{RGX_WS}"else"
RGX_MACELIF     "#"{RGX_WS}"elif"
RGX_MACENDIF    "#"{RGX_WS}"endif"
RGX_IFDEF       "#"{RGX_WS}"ifdef"
RGX_IFNDEF      "#"{RGX_WS}"ifndef"
RGX_PRAGMA      "#"{RGX_WS}"pragma"
RGX_WARNING     "#"{RGX_WS}"warning"
RGX_ERROR       "#"{RGX_WS}"error"
RGX_LINE        "#"{RGX_WS}"line"
RGX_LINCONT     \\
/* C Num Constants */
RGX_DIGIT       [0-9]
RGX_INT         {RGX_DIGIT}+
RGX_FLOAT       {RGX_DIGIT}+"."{RGX_DIGIT}+
RGX_HEX         0x[0-9a-fA-F]+
/* C Types */
RGX_TCHAR       char
RGX_TINT        int
RGX_TFLOAT      float
RGX_TDBL        double
RGX_TLONG       long
RGX_TSHORT      short
RGX_SIGNED      signed
RGX_UNSIGNED    unsigned
RGX_EXT         extern
RGX_STAT        static
RGX_TYPEDEF     typedef
RGX_STRUCT      struct
RGX_ENUM        enum
RGX_UNI         union
RGX_SO          sizeof
/* C Control Flow*/
RGX_IF          "if"
RGX_THEN        "then"
RGX_ELSE        "else"
RGX_GOTO        "goto"
RGX_RET         "return"
RGX_SWITCH      "switch"
RGX_CASE        "case"
RGX_DEFAULT     "default"
RGX_BREAK       "break"
RGX_CONTIN      "continue"
RGX_LABLE       [a-zA-Z0-9]+:
RGX_DELIM       ";"
RGX_BEGIN       "{"
RGX_END         "}"
RGX_PARENO      "("
RGX_PARENC      ")"
RGX_COMMA       ","
/* C Operators */
RGX_MBR         "."
RGX_PMBR        "-&gt;"
RGX_INC         "++"
RGX_DEC         "--"
RGX_MULEQ       "*="
RGX_MULDEREF    "*"
RGX_DIVEQ       "/="
RGX_DIV         "/"
RGX_ADDEQ       "+="
RGX_ADD         "+"
RGX_MINEQ       "-="
RGX_MIN         "-"
RGX_MODEQ       "%="
RGX_MOD         "%"
RGX_TRTHEN      "?"
RGX_TRELSE      ":"
RGX_LAND        "&amp;&amp;"
RGX_LOR         "||"
RGX_LEQ         "=="
RGX_NEQ         "!="
RGX_GTE         "&gt;="
RGX_GT          "&gt;"
RGX_LTE         "&lt;="
RGX_LT          "&lt;"
RGX_BAND        "&amp;"
RGX_BOR         "|"
RGX_XOR         "^"
RGX_NOT         "!"
RGX_RSH         "&gt;&gt;"
RGX_LSH         "&lt;&lt;"
RGX_EQ          "="
RGX_ONECMP      "~"
/* C String Constants */
RGX_WORD        [a-zA-Z0-9]+
RGX_NAME        [a-zA-Z0-9\_]+
RGX_PATH        [a-zA-Z0-9/\-\_\.]+
/* RGX_ASCII    [`-=~!@#$%^&amp;*()_+[]\;',./{}|:"&lt;&gt;?a-zA-Z0-9]+ */
RGX_STR         ([^"\\\n]|\\[^"\n]|\\\")*
RGX_CHR         ([^'\\\n]|\\[^'\n]|\\')*
RGX_CMT         .|\n
%{
%}

%%

&lt;S0&gt;{RGX_NL}            { ; }
&lt;S2&gt;{RGX_NL}            { BEGIN(S0); }
&lt;S0&gt;"/*"                { BEGIN(S1); }
&lt;S1&gt;"*/"                { BEGIN(S0); }
&lt;S1&gt;{RGX_CMT}           { ; }
&lt;S2&gt;{RGX_CMT}           { ; }
&lt;S0&gt;"//"                { BEGIN(S2); }
&lt;S0&gt;{RGX_INCL}          { return (TOK_INCL); }
&lt;S0&gt;{RGX_DEF}           { return (TOK_DEF); }
&lt;S0&gt;{RGX_UNDEF}         { return (TOK_UNDEF); }
&lt;S0&gt;{RGX_MACIF}         { return (TOK_MACIF); }
&lt;S0&gt;{RGX_MACELSE}       { return (TOK_MACELSE); }
&lt;S0&gt;{RGX_MACELIF}       { return (TOK_MACELIF); }
&lt;S0&gt;{RGX_MACENDIF}      { return (TOK_MACENDIF); }
&lt;S0&gt;{RGX_IFDEF}         { return (TOK_IFDEF); }
&lt;S0&gt;{RGX_IFNDEF}        { return (TOK_IFNDEF); }
&lt;S0&gt;{RGX_PRAGMA}        { return (TOK_PRAGMA); }
&lt;S0&gt;{RGX_WARNING}       { return (TOK_WARNING); }
&lt;S0&gt;{RGX_ERROR}         { return (TOK_ERROR); }
&lt;S0&gt;{RGX_LINE}          { return (TOK_LINE); }
&lt;S0&gt;{RGX_LINCONT}       { return (TOK_LINCONT); }
&lt;S0&gt;\&lt;{RGX_PATH}\&gt;      { return (TOK_INCLIB); }
&lt;S0&gt;\"{RGX_PATH}\"      { return (TOK_INCUSR); }
&lt;S0&gt;{RGX_DIGIT}         { return (TOK_DIGIT); }
&lt;S0&gt;{RGX_INT}           { return (TOK_INT); }
&lt;S0&gt;{RGX_FLOAT}         { return (TOK_FLOAT); }
&lt;S0&gt;{RGX_HEX}           { return (TOK_HEX); }
&lt;S0&gt;{RGX_TCHAR}         { return (TOK_TCHAR); }
&lt;S0&gt;{RGX_TINT}          { return (TOK_TINT); }
&lt;S0&gt;{RGX_TFLOAT}        { return (TOK_TFLOAT); }
&lt;S0&gt;{RGX_TDBL}          { return (TOK_TDBL); }
&lt;S0&gt;{RGX_TLONG}         { return (TOK_TLONG); }
&lt;S0&gt;{RGX_TSHORT}        { return (TOK_TSHORT); }
&lt;S0&gt;{RGX_SIGNED}        { return (TOK_SIGNED); }
&lt;S0&gt;{RGX_UNSIGNED}      { return (TOK_UNSIGNED); }
&lt;S0&gt;{RGX_EXT}           { return (TOK_EXT); }
&lt;S0&gt;{RGX_STAT}          { return (TOK_STAT); }
&lt;S0&gt;{RGX_TYPEDEF}       { return (TOK_TYPEDEF); }
&lt;S0&gt;{RGX_STRUCT}        { return (TOK_STRUCT); }
&lt;S0&gt;{RGX_ENUM}          { return (TOK_ENUM); }
&lt;S0&gt;{RGX_UNI}           { return (TOK_UNI); }
&lt;S0&gt;{RGX_SO}            { return (TOK_SO); }
&lt;S0&gt;"["                 { return (TOK_ARRO); }
&lt;S0&gt;"]"                 { return (TOK_ARRC); }
&lt;S0&gt;{RGX_IF}            { return (TOK_IF); }
&lt;S0&gt;{RGX_THEN}          { return (TOK_THEN); }
&lt;S0&gt;{RGX_ELSE}          { return (TOK_ELSE); }
&lt;S0&gt;{RGX_GOTO}          { return (TOK_GOTO); }
&lt;S0&gt;{RGX_RET}           { return (TOK_RET); }
&lt;S0&gt;{RGX_LABLE}         { return (TOK_LABLE); }
&lt;S0&gt;{RGX_DELIM}         { return (TOK_DELIM); }
&lt;S0&gt;{RGX_BEGIN}         { return (TOK_BEGIN); }
&lt;S0&gt;{RGX_END}           { return (TOK_END); }
&lt;S0&gt;{RGX_PARENO}        { return (TOK_PARENO); }
&lt;S0&gt;{RGX_PARENC}        { return (TOK_PARENC); }
&lt;S0&gt;{RGX_COMMA}         { return (TOK_COMMA); }
&lt;S0&gt;{RGX_MBR}           { return (TOK_MBR); }
&lt;S0&gt;{RGX_PMBR}          { return (TOK_PMBR); }
&lt;S0&gt;{RGX_INC}           { return (TOK_INC); }
&lt;S0&gt;{RGX_DEC}           { return (TOK_DEC); }
&lt;S0&gt;{RGX_MULEQ}         { return (TOK_MULEQ); }
&lt;S0&gt;{RGX_MULDEREF}      { return (TOK_MULDEREF); }
&lt;S0&gt;{RGX_DIVEQ}         { return (TOK_DIVEQ); }
&lt;S0&gt;{RGX_DIV}           { return (TOK_DIV); }
&lt;S0&gt;{RGX_ADDEQ}         { return (TOK_ADDEQ); }
&lt;S0&gt;{RGX_ADD}           { return (TOK_ADD); }
&lt;S0&gt;{RGX_MINEQ}         { return (TOK_MINEQ); }
&lt;S0&gt;{RGX_MIN}           { return (TOK_MIN); }
&lt;S0&gt;{RGX_MODEQ}         { return (TOK_MODEQ); }
&lt;S0&gt;{RGX_MOD}           { return (TOK_MOD); }
&lt;S0&gt;{RGX_TRTHEN}        { return (TOK_TRTHEN); }
&lt;S0&gt;{RGX_TRELSE}        { return (TOK_TRELSE); }
&lt;S0&gt;{RGX_LAND}          { return (TOK_LAND); }
&lt;S0&gt;{RGX_LOR}           { return (TOK_LOR); }
&lt;S0&gt;{RGX_LEQ}           { return (TOK_LEQ); }
&lt;S0&gt;{RGX_NEQ}           { return (TOK_NEQ); }
&lt;S0&gt;{RGX_GTE}           { return (TOK_GTE); }
&lt;S0&gt;{RGX_GT}            { return (TOK_GT); }
&lt;S0&gt;{RGX_LTE}           { return (TOK_LTE); }
&lt;S0&gt;{RGX_LT}            { return (TOK_LT); }
&lt;S0&gt;{RGX_BAND}          { return (TOK_BAND); }
&lt;S0&gt;{RGX_BOR}           { return (TOK_BOR); }
&lt;S0&gt;{RGX_XOR}           { return (TOK_XOR); }
&lt;S0&gt;{RGX_NOT}           { return (TOK_NOT); }
&lt;S0&gt;{RGX_RSH}           { return (TOK_RSH); }
&lt;S0&gt;{RGX_LSH}           { return (TOK_LSH); }
&lt;S0&gt;{RGX_EQ}            { return (TOK_EQ); }
&lt;S0&gt;{RGX_ONECMP}        { return (TOK_ONECMP); }
&lt;S0&gt;{RGX_NAME}          { return (TOK_NAME); }
&lt;S0&gt;{RGX_WORD}          { return (TOK_WORD); }
&lt;S0&gt;\"{RGX_STR}\"       |
&lt;S2&gt;\"{RGX_STR}\"       {
                ;
            }
&lt;S0&gt;\'{RGX_CHR}\'       { ; }
&lt;S0&gt;{RGX_WS}    { ; }
%%
int
main(int ac, char *av[])
{
    ++av; --ac;
    if (ac &gt; 0) {
        yyin = fopen(av[0], "r");
    } else {
        yyin = stdin;
    }
    BEGIN(S0);
    while (!feof(yyin)) {
        yylex();
    }
}
</code></pre>
<p>We&#8217;ll be using this lexer to lex the C source file that&#8217;s been generated from<br />
lex code that specifies how to lex the C language.</p>
<p>First the available probes:</p>
<pre><code>provider lex {
    /* Tracks the lexer's state transitions */
    probe state(int);
    /* Indicates which action was taken */
    probe action(int);
    /* Fires whenever a match occurs; arg0 is matched str */
    probe match(char*);
    /* The current character */
    probe ch(char);
    /* Fires whenever the lexer backtracks */
    probe fallback();
    /* Fires whenever we enter a compressed state */
    probe compr_st();
    /* Fires whenever the lexer stops to complete a match */
    probe stop();
    /* Fires whenever the user uses the BEGIN macro */
    probe begin(int, void*);
    /* Fires whenever the user uses the REJECT macro */
    probe reject();
};
</code></pre>
<p>As you can see, we probe events such as state transitions, matches,<br />
action-execution, backtracking, and the current character.</p>
<p>So the first thing we will do is a high-resolution state-by-state,<br />
character-by-character, match-by-match trace of the C lexer lexing itself.</p>
<p>The following script, <code>trace.d</code>, is used:</p>
<pre><code>lex$target:::match
{
}

lex$target:::reject
{
}

lex$target:::fallback_st
{
    trace(arg0);
}

lex$target:::state
{
    trace(arg0);
}

lex$target:::begin
{
    trace(arg0);
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 != '\t' &amp;&amp; arg0 != '\n' &amp;&amp; arg0 != '\0'/
{
    printf(": '%c'\n", (char) arg0);
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\t'/
{
    printf(": '\\t'\n");
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\n'/
{
    printf(": '\\n'\n");
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\0'/
{
    printf(": '\\0'\n");
}

pid$target::yylex:entry
{

}

pid$target::yylex:return
{
    trace(arg1);
}
</code></pre>
<p>However, keep in mind that if you&#8217;re not using a BEGIN or REJECT macro in any<br />
of your lex code, you&#8217;ll have to comment out the parts of the code that enable<br />
those probes.</p>
<p>Running it on the lexer (the <code>clexer</code> binary):</p>
<pre><code>pfexec dtrace -c './clexer lex.yy.c' -s trace.d
</code></pre>
<p>Here are the first few lines of output:</p>
<pre><code>CPU     ID                    FUNCTION:NAME
  3  72077                       main:begin                 2
  3  72136                      yylex:entry
  3  72084                     yylook:state                 3
  3  72079                        yylook:ch : '#'

  3  72082               yylook:fallback_st                 2
  3  72084                     yylook:state                12
  3  72079                        yylook:ch : 'i'

  3  72084                     yylook:state                65
  3  72079                        yylook:ch : 'n'

  3  72084                     yylook:state               122
  3  72079                        yylook:ch : 'c'

  3  72084                     yylook:state               156
  3  72079                        yylook:ch : 'l'

  3  72084                     yylook:state               186
  3  72079                        yylook:ch : 'u'

  3  72084                     yylook:state               208
  3  72079                        yylook:ch : 'd'

  3  72084                     yylook:state               223
  3  72079                        yylook:ch : 'e'

  3  72084                     yylook:state               228
  3  72083                     yylook:match
  3  72138                     yylex:return               260
  3  72136                      yylex:entry
  3  72084                     yylook:state                 2
  3  72079                        yylook:ch : ' '

  3  72084                     yylook:state                 8
  3  72079                        yylook:ch : '&lt;'

  3  72083                     yylook:match
  3  72084                     yylook:state                 2
  3  72079                        yylook:ch : '&lt;'

  3  72084                     yylook:state                28
  3  72079                        yylook:ch : 's'
...
</code></pre>
<p>As you can see, we also trace the return value of yylex, which is the integer<br />
that represents a particular token. The definitions can be found in <code>y.tab.h</code><br />
which was generated from an incomplete yacc grammar.</p>
<p>Full <code>y.tab.h</code>:</p>
<pre><code># define TOK_SPACES 257
# define TOK_NL 258
# define TOK_TB 259
# define TOK_INCL 260
# define TOK_DEF 261
# define TOK_UNDEF 262
# define TOK_MACIF 263
# define TOK_MACELSE 264
# define TOK_MACELIF 265
# define TOK_MACENDIF 266
# define TOK_IFDEF 267
# define TOK_IFNDEF 268
# define TOK_PRAGMA 269
# define TOK_WARNING 270
# define TOK_ERROR 271
# define TOK_LINE 272
# define TOK_LINCONT 273
# define TOK_INCLIB 274
# define TOK_INCUSR 275
# define TOK_DIGIT 276
# define TOK_INT 277
# define TOK_FLOAT 278
# define TOK_HEX 279
# define TOK_TCHAR 280
# define TOK_TINT 281
# define TOK_TFLOAT 282
# define TOK_TDBL 283
# define TOK_TLONG 284
# define TOK_TSHORT 285
# define TOK_SIGNED 286
# define TOK_UNSIGNED 287
# define TOK_EXT 288
# define TOK_STAT 289
# define TOK_TYPEDEF 290
# define TOK_STRUCT 291
# define TOK_ENUM 292
# define TOK_UNI 293
# define TOK_SO 294
# define TOK_ARRO 295
# define TOK_ARRC 296
# define TOK_IF 297
# define TOK_THEN 298
# define TOK_ELSE 299
# define TOK_GOTO 300
# define TOK_RET 301
# define TOK_LABLE 302
# define TOK_DELIM 303
# define TOK_BEGIN 304
# define TOK_END 305
# define TOK_PARENO 306
# define TOK_PARENC 307
# define TOK_COMMA 308
# define TOK_MBR 309
# define TOK_PMBR 310
# define TOK_INC 311
# define TOK_DEC 312
# define TOK_MULEQ 313
# define TOK_MULDEREF 314
# define TOK_DIVEQ 315
# define TOK_DIV 316
# define TOK_ADDEQ 317
# define TOK_ADD 318
# define TOK_MINEQ 319
# define TOK_MIN 320
# define TOK_MODEQ 321
# define TOK_MOD 322
# define TOK_TRTHEN 323
# define TOK_TRELSE 324
# define TOK_LAND 325
# define TOK_LOR 326
# define TOK_LEQ 327
# define TOK_NEQ 328
# define TOK_GTE 329
# define TOK_GT 330
# define TOK_LTE 331
# define TOK_LT 332
# define TOK_BAND 333
# define TOK_BOR 334
# define TOK_XOR 335
# define TOK_NOT 336
# define TOK_RSH 337
# define TOK_LSH 338
# define TOK_EQ 339
# define TOK_ONECMP 340
# define TOK_NAME 341
# define TOK_WORD 342
# define TOK_CHR 343
# define TOK_STR 344
</code></pre>
<p>Now we will do a very similar trace, except we will also print timestamps in<br />
the right-most column for particular events like matches, fallbacks, etc,<br />
using the following script, <code>trace_stamp.d</code>:</p>
<pre><code>lex$target:::match
{
    trace(vtimestamp);
}

lex$target:::reject
{
    trace(vtimestamp);
}

lex$target:::fallback_st
{
    trace(arg0);
    trace(vtimestamp);
}

lex$target:::state
{
    trace(arg0);
    trace(vtimestamp);
}

lex$target:::begin
{
    trace(arg0);
    trace(vtimestamp);
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 != '\t' &amp;&amp; arg0 != '\n' &amp;&amp; arg0 != '\0'/
{
    printf(": '%c'\n", (char) arg0);
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\t'/
{
    printf(": '\\t'\n");
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\n'/
{
    printf(": '\\n'\n");
}

lex$target:::fallback_ch,
lex$target:::ch
/arg0 == '\0'/
{
    printf(": '\\0'\n");
}

pid$target::yylex:entry
{
    trace(vtimestamp);
}

pid$target::yylex:return
{
    trace(arg1);
    trace(vtimestamp);
}
</code></pre>
<p>The commenting out of BEGIN or REJECT probes also applies to this script.</p>
<p>And now, the trace:</p>
<pre><code>CPU     ID                    FUNCTION:NAME
  2  72077                       main:begin                 2      833713
  2  72136                      yylex:entry            835272
  2  72084                     yylook:state                 3      849428
  2  72079                        yylook:ch : '#'

  2  72082               yylook:fallback_st                 2      983644
  2  72084                     yylook:state                12      984585
  2  72079                        yylook:ch : 'i'

  2  72084                     yylook:state                65      985769
  2  72079                        yylook:ch : 'n'

  2  72084                     yylook:state               122      986795
  2  72079                        yylook:ch : 'c'

  2  72084                     yylook:state               156      987847
  2  72079                        yylook:ch : 'l'

  2  72084                     yylook:state               186      988972
  2  72079                        yylook:ch : 'u'

  2  72084                     yylook:state               208      990097
  2  72079                        yylook:ch : 'd'

  2  72084                     yylook:state               223      991058
  2  72079                        yylook:ch : 'e'

  2  72084                     yylook:state               228      992023
  2  72083                     yylook:match            993829
  2  72138                     yylex:return               260      995786
  2  72136                      yylex:entry            996406
  2  72084                     yylook:state                 2      996916
  2  72079                        yylook:ch : ' '

  2  72084                     yylook:state                 8      997841
  2  72079                        yylook:ch : '&lt;'

  2  72083                     yylook:match            999122
  2  72084                     yylook:state                 2      999588
  2  72079                        yylook:ch : '&lt;'

  2  72084                     yylook:state                28     1000575
  2  72079                        yylook:ch : 's'
...
</code></pre>
<p>So the last two outputs are very detailed, and can be very useful in debugging<br />
the lexer. However, examining a complete trace from top to bottom is a very<br />
tedious, time-consuming task (the full output for each of the above traces is<br />
195,352 lines long).</p>
<p>As an alternative, one can use DTrace to ask a very specific question about the<br />
lexing process as a whole, without having to manually sift through reams of<br />
output.</p>
<p>For instance, we may wish to see how many times a particular token is matched,<br />
using <code>tok_count.d</code>:</p>
<pre><code>pid$target::yylex:return
{
    @[arg1] = count();
}
</code></pre>
<p>The output:</p>
<pre><code>0                  1
269                1
287                1
289                1
294                1
317                1
332                1
329                2
274                3
328                3
331                3
324                4
336                4
260                5
299                5
333                5
300                6
323                6
326                6
288                7
263                8
268                9
325               10
264               12
311               12
312               12
280               13
330               13
291               15
310               16
267               18
327               19
261               24
295               25
296               25
320               32
266               35
297               41
281               44
314               60
339               71
301               93
302              102
272              104
275              109
304              151
305              151
306              307
307              307
303              371
318              531
276             1201
341             1433
277             2186
308             3272
</code></pre>
<p>We may also wish to see what the average lex time per token type is, using<br />
<code>tok_time_avg.d</code>:</p>
<pre><code>pid$target::yylex:entry
{
    self-&gt;ts = vtimestamp;
}

pid$target::yylex:return
{
    @[arg1] = avg(vtimestamp - self-&gt;ts);
}
</code></pre>
<p>Output of <code>pfexec dtrace -c './clexer lex.yy.c' -s tok_time_avg.d</code>:</p>
<pre><code>308              509
323              512
295              518
296              524
324              534
303              552
320              554
318              559
310              566
312              569
306              570
336              595
311              606
333              631
330              635
327              637
339              638
314              639
276              644
332              660
325              665
328              673
317              678
329              688
277              694
331              699
326              700
305              718
304              767
280              802
302              831
264              836
268              859
266              861
341              879
294              884
287              894
288              906
281              915
297              941
291              946
299              961
300              966
301              973
272             1050
267             1124
263             1257
275             1433
274             1550
307             1641
261             1879
289             5318
0              18105
260            26437
269            66109
</code></pre>
<p>So, on average, token 269 takes the longest to lex. Looking at <code>y.tab.h</code>, 269 is<br />
<code>TOK_PRAGMA</code>. 260, the runner-up, is <code>TOK_INCL</code>, and so forth.</p>
<p>To see what is so slow about parsing a pragma, we&#8217;ll use <code>trace_pragma.d</code>, a<br />
modified version of <code>trace_stamp.d</code>. <code>trace_pragma.d</code> uses DTrace&#8217;s speculative<br />
tracing facility to display the trace data for pragma matching and nothing else.</p>
<p>The output of that trace is 6312 lines long. And it only encompasses <em>one</em><br />
trace.</p>
<p>Here&#8217;s the initial screen of text:</p>
<pre><code>CPU     ID                    FUNCTION:NAME
  2  72076                      yylex:entry          61977649
  2  72086                     yylook:state                 2     61978154
  2  72081                        yylook:ch : '\n'

  2  72086                     yylook:state                 9     61979102
  2  72081                        yylook:ch : '/'

  2  72085                     yylook:match          61980055
  2  72086                     yylook:state                 3     61980550
  2  72081                        yylook:ch : '/'

  2  72084               yylook:fallback_st                 2     61981495
  2  72086                     yylook:state                23     61981973
  2  72081                        yylook:ch : '*'

  2  72086                     yylook:state                80     61982925
  2  72085                     yylook:match          61983512
  2  72080                      yylex:begin                 4     61985065
  2  72086                     yylook:state                 4     61985521
  2  72081                        yylook:ch : '\n'

  2  72086                     yylook:state                53     61986638
  2  72085                     yylook:match          61987130
  2  72086                     yylook:state                 5     61987618
  2  72081                        yylook:ch : ' '

  2  72083               yylook:fallback_ch : '\t'

  2  72084               yylook:fallback_st                 4     61990368
  2  72083               yylook:fallback_ch : '\t'

  2  72086                     yylook:state                53     61991362
  2  72085                     yylook:match          61991838
  2  72086                     yylook:state                 4     61992306
  2  72081                        yylook:ch : '*'

  2  72086                     yylook:state                54     61993324
  2  72081                        yylook:ch : ' '

  2  72085                     yylook:match          61994266
  2  72086                     yylook:state                 4     61994728
  2  72081                        yylook:ch : ' '
...
</code></pre>
<p>As you can see, we started off by parsing a block comment. We don&#8217;t finish<br />
parsing the block comment until line 5879 of the output.</p>
<p>The reason <code>yylex</code> takes so long to return on a pragma is that it never<br />
returns when matching a comment; it just keeps lexing and matching until it<br />
matches something that it has to return, in this case a pragma token.</p>
<p>If that comment weren&#8217;t there, right above the pragma, <code>yylex</code> might have returned<br />
much faster.</p>
<p>Looking at the bottom of the output, we see that matching a pragma token takes<br />
more than 8 microseconds (the two <code>match</code> timestamps below differ by 8,371<br />
nanoseconds). However, because of the multitude of probe enablings<br />
in this script, we have to take overhead into account.</p>
<p>Bottom of output:</p>
<pre><code>...
  2  72085                     yylook:match          64343356
  2  72086                     yylook:state                 3     64343869
  2  72081                        yylook:ch : '#'

  2  72084               yylook:fallback_st                 2     64344798
  2  72086                     yylook:state                12     64345273
  2  72081                        yylook:ch : 'p'

  2  72086                     yylook:state                67     64346255
  2  72081                        yylook:ch : 'r'

  2  72086                     yylook:state               124     64347347
  2  72081                        yylook:ch : 'a'

  2  72086                     yylook:state               158     64348283
  2  72081                        yylook:ch : 'g'

  2  72086                     yylook:state               188     64349246
  2  72081                        yylook:ch : 'm'

  2  72086                     yylook:state               209     64350288
  2  72081                        yylook:ch : 'a'

  2  72086                     yylook:state               224     64351204
  2  72085                     yylook:match          64351727
  2  72138                     yylex:return               269     64352499
</code></pre>
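<p>For reference, here is a minimal sketch of the speculative-tracing pattern such a script could use. It is not the actual <code>trace_pragma.d</code>: only a few of the probes are shown, characters are printed without the tab and newline escaping used earlier, and the <code>nspec</code> and <code>specsize</code> values are guesses. The one hard-coded number, 269, is <code>TOK_PRAGMA</code> from <code>y.tab.h</code>.</p>
<pre><code>#pragma D option nspec=4
#pragma D option specsize=4m

pid$target::yylex:entry
{
    /* Open a fresh speculation for every call to yylex. */
    self-&gt;spec = speculation();
    speculate(self-&gt;spec);
    trace(vtimestamp);
}

lex$target:::state
/self-&gt;spec/
{
    speculate(self-&gt;spec);
    trace(arg0);
    trace(vtimestamp);
}

lex$target:::fallback_ch,
lex$target:::ch
/self-&gt;spec/
{
    speculate(self-&gt;spec);
    printf(": '%c'\n", (char) arg0);
}

lex$target:::match
/self-&gt;spec/
{
    speculate(self-&gt;spec);
    trace(vtimestamp);
}

pid$target::yylex:return
/self-&gt;spec/
{
    speculate(self-&gt;spec);
    trace(arg1);
    trace(vtimestamp);
}

/* Keep the buffered trace only if yylex returned TOK_PRAGMA (269). */
pid$target::yylex:return
/self-&gt;spec &amp;&amp; arg1 == 269/
{
    commit(self-&gt;spec);
    self-&gt;spec = 0;
}

/* Anything else is thrown away. */
pid$target::yylex:return
/self-&gt;spec/
{
    discard(self-&gt;spec);
    self-&gt;spec = 0;
}
</code></pre>
<p>Since <code>yylex</code> is called thousands of times in quick succession, <code>speculation()</code> can fail for lack of a free buffer (it then returns 0 and the speculative output is silently dropped), so <code>nspec</code> and <code>cleanrate</code> may need raising before the pragma call is reliably captured.</p>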
<p>A better way to see how long lex takes to match a pragma is to trace the time<br />
between matches. We will use the final state that occurs before a pragma is<br />
matched (224) to determine that this is the match that we care about.</p>
<p>The script used, <code>match_fst_time_avg.d</code>:</p>
<pre><code>dtrace:::BEGIN
{
    self-&gt;prevts = vtimestamp;
    self-&gt;curts = 0;
}

lex$target:::state
{
    self-&gt;st = arg0;
}

lex$target:::match
{
    self-&gt;curts = vtimestamp;
    @[self-&gt;st] = avg(self-&gt;curts - self-&gt;prevts);
    self-&gt;prevts = self-&gt;curts;
}
</code></pre>
<p>The output:</p>
<pre><code> 14             1397
 53             1401
 28             1413
 51             1433
 54             1434
 49             1464
 25             1486
 29             1509
 30             1558
 18             1596
 24             1606
 16             1660
 38             1797
 10             1819
 17             1828
 31             1838
 35             1838
 27             1842
 71             1842
 58             1845
 20             1848
 26             1852
  9             1854
110             1854
 33             1856
 80             1862
 89             1871
 21             1881
 19             1886
 91             1887
 76             1895
111             1897
100             1939
  8             2015
 90             2026
 77             2264
 75             2286
 79             2303
 84             2309
121             2382
 85             2395
137             2505
163             2840
167             2873
161             3090
 72             3318
181             3319
187             3780
 32             3857
213             3860
216             3864
217             3868
218             3883
214             3895
204             3931
206             4094
222             4258
221             4717
224             4898
230             4973
 37             5461
129             6119
116             7140
 60            11786
228           176927
</code></pre>
<p>As you can see, matching pragmas is actually fast (about 5 microseconds as opposed to<br />
66).</p>
<p>I rather prefer this script to <code>tok_time_avg.d</code> because this script&#8217;s<br />
measurements reflect the average time between matches, regardless of whether or<br />
not a particular match causes <code>yylex</code> to return.</p>
<p>And the following script associates final states with token types, discarding<br />
the states that have no token type.</p>
<p><code>tok_fstate.d</code>:</p>
<pre><code>lex$target:::state
{
    self-&gt;st = arg0;
}

pid$target::yylex:return
{
    @[arg1, self-&gt;st] = count();
}
</code></pre>
<p>So, here is the table of tokens and their associated final state:</p>
<pre><code>  0                3                1
269              224                1
287              230                1
289              217                1
294              216                1
317               76                1
332               28                1
329               91                2
274              129                3
328               58                3
331               89                3
324               26                4
336               10                4
260              228                5
299              163                5
333               14                5
300              167                6
323               31                6
326              110                6
288              213                7
263              121                8
268              222                9
325               71               10
264              181               12
311               75               12
312               77               12
280              161               13
330               30               13
291              218               15
341               38               15
310               79               16
267              206               18
327               90               19
261              221               24
295               33               25
296               35               25
320               21               32
266              204               35
297              100               41
281              137               44
314               18               60
339               29               71
301              214               93
</code></pre>
<p>So, as you can see, the most expensive match is the token that has final<br />
state 228, which is <code>TOK_INCL</code> (a &#8220;#include&#8221;).</p>
<p>The cheapest is state 14, <code>TOK_BAND</code>, a bitwise &#8216;and&#8217; (&#8220;&amp;&#8221;).</p>
<p>What surprises the hell out of me is that (state 38) <code>TOK_NAME</code> (variable names,<br />
function names, anything that&#8217;s not a keyword or operator) gets matched much<br />
faster, on average, than <code>TOK_INCL</code> does. <code>TOK_NAME</code> gets matched in almost 2<br />
microseconds; <code>TOK_INCL</code> gets matched in 177 microseconds.</p>
<p>Perhaps this is because <code>TOK_INCL</code> tokens get matched at the beginning of the file,<br />
and at that point our <code>clexer</code> binary is just starting execution, and has yet<br />
to have some of its instructions and memory cached. But that&#8217;s just a<br />
hypothesis.</p>
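<p>A second possibility is worth ruling out before blaming the cache. Thread-local variables (<code>self-&gt;</code>) belong to the thread that fired the probe, and <code>dtrace:::BEGIN</code> fires in the context of <code>dtrace</code> itself rather than in the traced process, so <code>self-&gt;prevts</code> is plausibly still zero when the lexer&#8217;s first match fires. If so, the first interval would be the thread&#8217;s whole <code>vtimestamp</code> up to that point, and it would be charged to state 228, since the first token in the file is an <code>#include</code>. The numbers are at least consistent with that: five matches averaging 176,927 nanoseconds sum to roughly 885 microseconds, the same order as the <code>vtimestamp</code> of the first match in the earlier trace (993829). A sketch of a variant that skips the first interval, to test this:</p>
<pre><code>lex$target:::state
{
    self-&gt;st = arg0;
}

/* Only aggregate once this thread has seen a previous match. */
lex$target:::match
/self-&gt;prevts != 0/
{
    @[self-&gt;st] = avg(vtimestamp - self-&gt;prevts);
}

/* Clauses run in program order, so this records the new baseline last. */
lex$target:::match
{
    self-&gt;prevts = vtimestamp;
}
</code></pre>
<p>If state 228 drops back in line with the other preprocessor tokens under this variant, the 177 microseconds were mostly startup time rather than matching time. If it stays high, the caching hypothesis is still on the table.</p>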
<p>Of course, averages get boring after a while, at which point you&#8217;d probably<br />
like to move on to using scripts that employ DTrace&#8217;s <code>quantize</code> function, like<br />
<code>tok_fst_time_qtz.d</code>:</p>
<pre><code>dtrace:::BEGIN
{
    self-&gt;prevts = vtimestamp;
    self-&gt;curts = 0;
}

lex$target:::state
{
    self-&gt;st = arg0;
}

lex$target:::match
{
    self-&gt;curts = vtimestamp;
    @[self-&gt;st] = quantize(self-&gt;curts - self-&gt;prevts);
    self-&gt;prevts = self-&gt;curts;
}
</code></pre>
<p>Tail of output:</p>
<pre><code>...
           84
       value  ------------- Distribution ------------- count
         512 |                                         0
        1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              1461
        2048 |@@@@@@@@@@@@@                            723
        4096 |                                         0
        8192 |                                         1
       16384 |                                         0
       32768 |                                         0
       65536 |                                         0
      131072 |                                         1
      262144 |                                         0        

           20
       value  ------------- Distribution ------------- count
         512 |                                         0
        1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3270
        2048 |                                         1
        4096 |                                         1
        8192 |                                         0        

            8
       value  ------------- Distribution ------------- count
         512 |                                         0
        1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     2888
        2048 |@@@@                                     299
        4096 |                                         11
        8192 |                                         2
       16384 |                                         0
       32768 |                                         0
       65536 |                                         0
      131072 |                                         1
      262144 |                                         0        

           32
       value  ------------- Distribution ------------- count
         512 |                                         0
        1024 |@                                        35
        2048 |@@@@@@@@@@@@@@@@@@@                      624
        4096 |@@@@@@@@@@@@@@@@@@@                      619
        8192 |                                         2
       16384 |                                         0
       32768 |                                         0
       65536 |                                         0
      131072 |                                         1
      262144 |                                         0
</code></pre>
<p>As far as I know, that&#8217;s a bit of dynamic introspection that the lexing process<br />
never had! Cool. Now compiler writers on Illumos can shine a much brighter<br />
flashlight at their lexers, illuminating some of the depths of this process.<br />
And this is just the beginning; I hope to add more instrumentation in the<br />
future. For example, instead of having to use final states and returned token<br />
IDs, I&#8217;d like to be able to aggregate on the regular expressions defined in<br />
the lex source file.</p>
<p>Though, it just occurred to me that I could use DTrace&#8217;s built-in hash table<br />
construct to associate final states with token-types, but oh well. I&#8217;ll leave<br />
that as an exercise to the reader.</p>
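<p>One possible take on that exercise, as a sketch only: rather than a separate associative array, it keeps the last inter-match interval in a thread-local variable and lets the aggregation key <code>(token, final state)</code> do the associating when <code>yylex</code> returns. It assumes, as <code>tok_fstate.d</code> does, that the last state seen before the return is the final state of the returned token. The name <code>self-&gt;delta</code> is made up for this example.</p>
<pre><code>lex$target:::state
{
    self-&gt;st = arg0;
}

/* Time since the previous match in this thread, if there was one. */
lex$target:::match
/self-&gt;prevts != 0/
{
    self-&gt;delta = vtimestamp - self-&gt;prevts;
}

lex$target:::match
{
    self-&gt;prevts = vtimestamp;
}

/* Key on both the token id and the final state that produced it. */
pid$target::yylex:return
/self-&gt;delta != 0/
{
    @[arg1, self-&gt;st] = avg(self-&gt;delta);
    self-&gt;delta = 0;
}
</code></pre>
<p>Matches that don&#8217;t cause <code>yylex</code> to return (comments, for example) are simply overwritten by the next match, so only token-producing matches end up in the table.</p>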
<p>As for how <code>lex</code> works, it&#8217;s quite simple, but I&#8217;m not into hand-holding, as<br />
mentioned earlier, so if lexing turns you on, check out the code. It&#8217;s<br />
relatively straightforward, and could even use a few enhancements, such as<br />
generating code that is prettier to read, removing some of the deficiencies in<br />
the pattern matcher (there are always deficiencies <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ), and so forth.</p>
<p>Also, there&#8217;s a repository with the DTrace scripts I used, available <a href="https://bitbucket.org/nickziv/lex-dscripts">here</a>.</p>
<p>So probing <code>lex</code> was a nice little warm-up. An appetizer. Foreplay, even. Now<br />
for the main course: <code>yacc</code>.</p>
<p>Stay tuned for part 1.5.</p>
<p>INSTALL NOTES:<br />
* use bldenv to set up some environment variables<br />
* run make<br />
* copy the generated <code>lex</code> binary into wherever you keep your custom bins<br />
* copy <code>lex_probes.d</code> and <code>dtrace_ncform</code> from <code>common</code> to <code>/usr/share/lib/ccs/</code></p>
]]></content:encoded>
			<wfw:commentRss>http://nickziv.wordpress.com/2011/07/17/adventures-of-a-dtrace-addict-part-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ae77a9245fb476328a04af02a009f2a8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">nickzivkovic</media:title>
		</media:content>
	</item>
		<item>
		<title>Adventures of a DTrace Addict: Part 0</title>
		<link>http://nickziv.wordpress.com/2011/04/08/adventures-of-a-dtrace-addict-part-0/</link>
		<comments>http://nickziv.wordpress.com/2011/04/08/adventures-of-a-dtrace-addict-part-0/#comments</comments>
		<pubDate>Fri, 08 Apr 2011 04:02:32 +0000</pubDate>
		<dc:creator>nick zivkovic</dc:creator>
				<category><![CDATA[DTrace]]></category>
		<category><![CDATA[Illumos]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[dtrace]]></category>
		<category><![CDATA[illumos]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[ls]]></category>
		<category><![CDATA[mmap]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[syscall]]></category>

		<guid isPermaLink="false">http://nickziv.wordpress.com/?p=22</guid>
		<description><![CDATA[From ls to mmap Back in my day, I would probe by hand. Now you can get software that does the job for you. -Kevin Mitnick My name is Nick and I am a DTrace addict. Hi Nick I&#8217;m not sure when it started (it&#8217;s hard to remember when you&#8217;re having fun), but it was [...]]]></description>
			<content:encoded><![CDATA[<h2>From <code>ls</code> to <code>mmap</code></h2>
<p><strong>Back in my day, I would probe by hand. Now you can get software<br />
that does the job for you.</strong> -Kevin Mitnick</p>
<p>My name is Nick and I am a DTrace addict.</p>
<p><em>Hi Nick</em></p>
<p>I&#8217;m not sure when it started (it&#8217;s hard to remember when you&#8217;re having fun),<br />
but it was either near the end of high school or the beginning of college, when<br />
I transitioned from Haskell to C, having realized that I had something of a<br />
fetish for systems programming. I had OpenSolaris installed and was aching to<br />
rip into the intricate, gritty, and perversely beautiful mechanisms that make it<br />
sane to use a computer for pretty much everything important.</p>
<p>Initially, I set out to read the source code for various utilities that I use,<br />
ranging from the simple (ls, cp, mv, etc) to the complex (X11, firefox, jvm,<br />
etc). Then I planned on transitioning to infrastructure (libc, libumem, the<br />
kernel, etc).</p>
<p>Needless to say, I figured that just reading the source wasn&#8217;t enough to gather<br />
a true understanding of the system in a timely manner. I was also afraid that I<br />
might misunderstand something, and have a completely false intuition of how the<br />
system works, which I find to be a far worse affliction than not knowing at<br />
all.</p>
<p>So, having already chanced upon Bryan Cantrill&#8217;s <a href="http://www.youtube.com/watch?v=6chLw2aodYQ">DTrace Review</a><br />
(Google Tech Talk), I did some extensive research, and decided that DTrace was<br />
going to be an indispensable companion, offering guidance and illuminating the<br />
darkest of tunnels and infrastructural perplexities.</p>
<p>After reading the manual, following along with examples on the blogs of the<br />
<a href="http://www.dtrace.org/">creators of DTrace and other DTrace addicts</a>, and field testing the<br />
software on some class projects, I took time away to understand the operation<br />
of the utilities that I use, keeping the source code nearby.</p>
<p>And so, curious reader, the journey began, as I descended into the dark depths<br />
of a gaping abyss, not unlike a potholing version of Indiana Jones. Stopping,<br />
frequently, to turn over rocks and poke anything that looks strange or alive.</p>
<p>So, I started out light, deciding to check out the <code>ls</code> utility. How<br />
complicated could it be?</p>
<p>I had multiple vectors from which I could have started jabbing <code>ls</code>.</p>
<p>I chose to use the syscall provider to get a high-level view of what ls was<br />
asking the system to do.</p>
<p>Here is the one-liner I used on a directory full of source files (in<br />
particular, the root directory of the source for the <code>ls</code> command in <a href="http://www.illumos.org/">Illumos</a> rev 147):</p>
<pre><code>pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {@[probefunc] = count();}'
</code></pre>
<p>Here are the results:</p>
<pre><code>  fcntl                                                             1
  getpid                                                            1
  getrlimit                                                         1
  openat                                                            1
  rexit                                                             1
  sysi86                                                            1
  write                                                             1
  getdents64                                                        2
  memcntl                                                           2
  mmap                                                              2
  mmapobj                                                           2
  open                                                              2
  resolvepath                                                       2
  setcontext                                                        2
  ioctl                                                             3
  stat64                                                            3
  fstat64                                                           4
  close                                                             5
  brk                                                               6
</code></pre>
<p>So, one can see that <code>brk</code> was called most frequently (6 times), followed by <code>close</code> (5 times), and so on.</p>
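<p>To put those counts in proportion, here is a minimal sketch that totals a saved copy of the aggregation and prints each call&#8217;s share of the total. The helper name <code>syscall_shares</code> is made up for illustration, and the only tools assumed are <code>awk</code> and <code>sort</code>; the rows in the here-document are copied from the output above.</p>

```shell
# syscall_shares: read "name count" rows (DTrace aggregation output) on stdin
# and print "name count percent" for each call, busiest first.
# The function name is hypothetical; it is not part of DTrace.
syscall_shares() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ { n[$1] = $2; total += $2 }
         END { for (s in n) printf "%s %d %.1f%%\n", s, n[s], 100 * n[s] / total }' |
    sort -k2,2nr -k1,1
}

# The counts from the run above, pasted into a here-document.
syscall_shares <<'EOF'
  fcntl                                                             1
  getpid                                                            1
  getrlimit                                                         1
  openat                                                            1
  rexit                                                             1
  sysi86                                                            1
  write                                                             1
  getdents64                                                        2
  memcntl                                                           2
  mmap                                                              2
  mmapobj                                                           2
  open                                                              2
  resolvepath                                                       2
  setcontext                                                        2
  ioctl                                                             3
  stat64                                                            3
  fstat64                                                           4
  close                                                             5
  brk                                                               6
EOF
```

<p>For this run the counts add up to 42 system calls, so the six <code>brk</code> calls are roughly 14% of everything <code>ls</code> asked the kernel to do.</p>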
<p>But what parts of the code were calling on the system to do its bidding? The<br />
following one-liner gives the stack-trace of all the userland functions that<br />
were invoking the <a href="http://en.wikipedia.org/wiki/System_call">system calls</a>:</p>
<pre><code>pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {@[probefunc,ustack()] = count();}'
</code></pre>
<p>The output of which looks like this:</p>
<pre><code>  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`main+0x35e
          ls`_start+0x7d
        1
  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          libc.so.1`strdup+0x26
          libc.so.1`expand_locale_name+0x469
          libc.so.1`setlocale+0x8ab
          ls`main+0x28
          ls`_start+0x7d
        1
  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0xfc
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`xmemdup+0x16
          ls`xstrdup+0x21
          ls`gobble_file+0xa3c
          ls`print_dir+0x2d0
          ls`main+0x526
          ls`_start+0x7d
        1
  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`main+0x35e
          ls`_start+0x7d
        1
  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          libc.so.1`strdup+0x26
          libc.so.1`expand_locale_name+0x469
          libc.so.1`setlocale+0x8ab
          ls`main+0x28
          ls`_start+0x7d
        1
  brk
          libc.so.1`_brk_unlocked+0x15
          libc.so.1`sbrk+0x35
          libc.so.1`_morecore+0x2e
          libc.so.1`_malloc_unlocked+0x17f
          libc.so.1`_smalloc+0x50
          libc.so.1`_malloc_unlocked+0x209
          libc.so.1`malloc+0x35
          ls`xmalloc+0x14
          ls`xmemdup+0x16
          ls`xstrdup+0x21
          ls`gobble_file+0xa3c
          ls`print_dir+0x2d0
          ls`main+0x526
          ls`_start+0x7d
        1
  close
          ld.so.1`__close+0x7
          ld.so.1`file_open+0x3f9
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  close
          ld.so.1`__close+0x7
          ld.so.1`file_open+0x3f9
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  close
          libc.so.1`__close+0x15
          libc.so.1`fclose+0xaa
          ls`close_stream+0x2d
          ls`close_stdout+0xb0
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  close
          libc.so.1`__close+0x15
          libc.so.1`fclose+0xaa
          ls`close_stream+0x2d
          ls`close_stdout+0x15
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  close
          libc.so.1`__close+0x15
          libc.so.1`closedir+0x4d
          ls`print_dir+0x327
          ls`main+0x526
          ls`_start+0x7d
        1
  fcntl
          libc.so.1`syscall+0x13
          libc.so.1`fcntl+0x104
          libc.so.1`fdopendir+0x5c
          libc.so.1`opendir+0x3f
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64
          libc.so.1`syscall+0x13
          libc.so.1`_findbuf+0x9c
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64
          libc.so.1`syscall+0x13
          libc.so.1`fdopendir+0x78
          libc.so.1`opendir+0x3f
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64
          libc.so.1`syscall+0x13
          libc.so.1`isseekable+0x4b
          libc.so.1`_setbufend+0x36
          libc.so.1`_findbuf+0x148
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  fstat64
          libc.so.1`syscall+0x13
          libc.so.1`isptsfd+0x40
          libc.so.1`xpg4_fixup+0x2e
          libc.so.1`__openat+0x32
          libc.so.1`openat+0x83
          libc.so.1`opendir+0x29
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  getpid
          libc.so.1`getpid+0x15
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  getrlimit
          libc.so.1`getrlimit+0x15
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  ioctl
          libc.so.1`ioctl+0x15
          libc.so.1`_findbuf+0x61
          libc.so.1`_wrtchk+0x5d
          libc.so.1`_fwrite_unlocked+0x4a
          libc.so.1`fwrite+0x53
          ls`quote_name+0x107
          ls`print_name_with_quoting+0x151
          ls`print_file_name_and_frills+0x10a
          ls`print_current_files+0xb3
          ls`print_dir+0x47e
          ls`main+0x526
          ls`_start+0x7d
        1
  ioctl
          libc.so.1`ioctl+0x15
          ls`main+0x97
          ls`_start+0x7d
        1
  ioctl
          libc.so.1`ioctl+0x15
          ls`decode_switches+0x29
          ls`main+0x97
          ls`_start+0x7d
        1
  memcntl
          ld.so.1`memcntl+0x7
          ld.so.1`elf_new_lmp+0x10bb
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  memcntl
          ld.so.1`memcntl+0x7
          ld.so.1`elf_new_lmp+0x10bb
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmap
          libc.so.1`__systemcall+0x6
          libc.so.1`lmalloc+0xd8
          libc.so.1`__tls_static_mods+0xb1
          ld.so.1`tls_statmod+0x1ca
          ld.so.1`setup+0x15a0
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  mmap
          ld.so.1`mmap+0x7
          ld.so.1`malloc+0x7e
          ld.so.1`calloc+0x25
          ld.so.1`elf_new_lmp+0x284
          ld.so.1`load_file+0x170
          ld.so.1`load_so+0x491
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmapobj
          ld.so.1`mmapobj+0x7
          ld.so.1`file_open+0x3e1
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  mmapobj
          ld.so.1`mmapobj+0x7
          ld.so.1`file_open+0x3e1
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  open
          ld.so.1`syscall+0x5
          ld.so.1`open+0x25
          ld.so.1`file_open+0x3be
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  open
          ld.so.1`syscall+0x5
          ld.so.1`open+0x25
          ld.so.1`file_open+0x3be
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  openat
          libc.so.1`syscall+0x13
          libc.so.1`openat+0x83
          libc.so.1`opendir+0x29
          ls`print_dir+0x23
          ls`main+0x526
          ls`_start+0x7d
        1
  resolvepath
          ld.so.1`resolvepath+0x7
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  resolvepath
          ld.so.1`resolvepath+0x7
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  rexit
          libc.so.1`0xfee334e8
          ls`_start+0x7d
        1
  setcontext
          libc.so.1`syscall+0x13
          libc.so.1`libc_init+0x3c8
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  setcontext
          libc.so.1`__getcontext+0x19
          ld.so.1`rt_thr_init+0x40
          ld.so.1`setup+0x15c2
          ld.so.1`_setup+0x310
          ld.so.1`_rt_boot+0x56
          0x8047b48
        1
  stat64
          ld.so.1`syscall+0x5
          ld.so.1`rtld_stat+0x29
          ld.so.1`file_open+0x100
          ld.so.1`find_path+0x155
          ld.so.1`load_so+0xdd
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`dlmopen_core+0x134
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        1
  sysi86
          libc.so.1`sysi86+0x15
          ls`_start+0x6e
        1
  write
          libc.so.1`__write+0x15
          libc.so.1`_xflsbuf+0xb4
          libc.so.1`_fflush_u+0x37
          libc.so.1`fclose+0x80
          ls`close_stream+0x2d
          ls`close_stdout+0x15
          libc.so.1`_exithandle+0x63
          libc.so.1`exit+0x12
          ls`_start+0x7d
        1
  getdents64
          libc.so.1`getdents64+0x15
          ls`print_dir+0x2a2
          ls`main+0x526
          ls`_start+0x7d
        2
  stat64
          ld.so.1`syscall+0x5
          ld.so.1`rtld_stat+0x29
          ld.so.1`file_open+0x100
          ld.so.1`_find_file+0x110
          ld.so.1`find_file+0x1f7
          ld.so.1`load_so+0x340
          ld.so.1`load_path+0x52
          ld.so.1`load_one+0x1e7
          ld.so.1`elf_needed+0x1b1
          ld.so.1`analyze_lmc+0xd3
          ld.so.1`dlmopen_core+0x1d0
          ld.so.1`dlmopen_intn+0xe0
          ld.so.1`dlmopen_check+0xf1
          ld.so.1`dlopen+0x4a
          libc.so.1`load_locale+0x195
          libc.so.1`setlocale+0x93f
          ls`main+0x28
          ls`_start+0x7d
        2
</code></pre>
<p>This output is pretty cool. Actually I take it back. This is downright awesome!<br />
Within the time frame of a few minutes I was able to see the code-paths taken<br />
to execute all the system calls needed to list a directory. This would have<br />
taken weeks to fully understand, assuming I didn&#8217;t give up before then. It is<br />
this level of introspection that makes DTrace a programmer&#8217;s wet dream.</p>
<p>Of course it would help to know in what order these system calls are happening.<br />
And it is only a single line of code away:</p>
<pre><code>pfexec dtrace -F -c 'ls' -n 'syscall::: /pid == $target/ {}'
</code></pre>
<p>The &#8220;=&gt;&#8221; indicates that the system entered the system call and &#8220;&lt;=&#8221; indicates<br />
that the system exited it.</p>
<pre><code>CPU FUNCTION
  0  =&gt; mmap
  0  &lt;= mmap
  0  =&gt; setcontext
  0  &lt;= setcontext
  0  =&gt; getrlimit
  0  &lt;= getrlimit
  0  =&gt; getpid
  0  &lt;= getpid
  0  =&gt; setcontext
  0  &lt;= setcontext
  0  =&gt; sysi86
  0  &lt;= sysi86
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; stat64
  0  &lt;= stat64
  0  =&gt; resolvepath
  0  &lt;= resolvepath
  0  =&gt; open
  0  &lt;= open
  0  =&gt; mmapobj
  0  &lt;= mmapobj
  0  =&gt; close
  0  &lt;= close
  0  =&gt; mmap
  0  &lt;= mmap
  0  =&gt; memcntl
  0  &lt;= memcntl
  0  =&gt; stat64
  0  &lt;= stat64
  0  =&gt; stat64
  0  &lt;= stat64
  0  =&gt; resolvepath
  0  &lt;= resolvepath
  0  =&gt; open
  0  &lt;= open
  0  =&gt; mmapobj
  0  &lt;= mmapobj
  0  =&gt; close
  0  &lt;= close
  0  =&gt; memcntl
  0  &lt;= memcntl
  0  =&gt; ioctl
  0  &lt;= ioctl
  0  =&gt; ioctl
  0  &lt;= ioctl
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; openat
  0  &lt;= openat
  0  =&gt; fstat64
  0  &lt;= fstat64
  0  =&gt; fcntl
  0  &lt;= fcntl
  0  =&gt; fstat64
  0  &lt;= fstat64
  0  =&gt; getdents64
  0  &lt;= getdents64
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; brk
  0  &lt;= brk
  0  =&gt; getdents64
  0  &lt;= getdents64
  0  =&gt; close
  0  &lt;= close
  0  =&gt; ioctl
  0  &lt;= ioctl
  0  =&gt; fstat64
  0  &lt;= fstat64
  0  =&gt; fstat64
  0  &lt;= fstat64
  0  =&gt; write
  0  &lt;= write
  0  =&gt; close
  0  &lt;= close
  0  =&gt; close
  0  &lt;= close
  0  =&gt; rexit
</code></pre>
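<p>When a call shows up more than once, it helps to know its position in the sequence. The following is only a sketch: it numbers the entry lines of a saved copy of the flow output, and the helper name <code>number_entries</code> is my own invention. The here-document holds the first 14 calls from the listing above.</p>

```shell
# number_entries: read dtrace -F syscall output on stdin and print
# "position name" for each entry line (the ones marked "=>").
# The function name is made up for this sketch.
number_entries() {
    awk '$2 == "=>" { print ++n, $3 }'
}

# First 14 calls from the listing above; grep -w keeps mmap but not mmapobj.
number_entries <<'EOF' | grep -w mmap
CPU FUNCTION
  0  => mmap
  0  <= mmap
  0  => setcontext
  0  <= setcontext
  0  => getrlimit
  0  <= getrlimit
  0  => getpid
  0  <= getpid
  0  => setcontext
  0  <= setcontext
  0  => sysi86
  0  <= sysi86
  0  => brk
  0  <= brk
  0  => brk
  0  <= brk
  0  => stat64
  0  <= stat64
  0  => resolvepath
  0  <= resolvepath
  0  => open
  0  <= open
  0  => mmapobj
  0  <= mmapobj
  0  => close
  0  <= close
  0  => mmap
  0  <= mmap
EOF
```

<p>On this data the two <code>mmap</code> calls land at positions 1 and 14, which makes it easier to match each one to its stack trace.</p>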
<p>So, there are multiple occurrences of some system calls (<code>mmap</code>, <code>brk</code>, etc.), and<br />
it would be nice to see the stack traces in chronological order.</p>
<p>Another one-liner to the rescue:</p>
<pre><code>pfexec dtrace -c 'ls' -n 'syscall:::entry /pid == $target/
    {trace(probefunc); ustack();}'
</code></pre>
<p>So the first thing the <code>ls</code> process does is call <code>mmap</code>. That first call comes from<br />
<code>ld.so.1</code> setting things up before <code>main</code> even runs (the first <code>mmap</code> stack trace in the<br />
aggregated output above); the second <code>mmap</code> stack trace is a result of calling <code>setlocale</code>. Which raises the<br />
question, what the heck is <code>setlocale</code>? Luckily it has its own manpage. As much<br />
as I&#8217;d hate to admit it, if I hadn&#8217;t done this trace, it would have likely been<br />
a long, long time before I even heard of <code>setlocale</code>, let alone used it.<br />
Needless to say I studied the manpage to see if <code>setlocale</code> was of any interest<br />
to me.</p>
<p>So my shiny new toy now makes certain system facilities more discoverable than<br />
they were previously: merely a few command lines away, as opposed to involved<br />
studying of the source, and having to create the stack traces mentally (which<br />
isn&#8217;t difficult, if one has a brain, but is time consuming). Not that having<br />
the source isn&#8217;t just as cool&#8230;</p>
<p>Looking at that stack trace also reveals something else of interest: <code>ld.so.1</code>,<br />
which is used to load an ELF file, has its own implementation of <code>malloc</code> and<br />
<code>calloc</code>, based on <code>mmap</code>. It doesn&#8217;t use the libc version.</p>
<p>So, I asked myself, now what? Well, it may be a good idea to keep going system<br />
call by system call, and take note of other implementation details of <code>ls</code>. Or<br />
I can go deeper, rappel into the rabbit hole, and see what&#8217;s up with <code>libc.so</code><br />
and <code>ld.so</code>.</p>
<p>After all, <code>ls</code> isn&#8217;t that interesting: it merely lists directory contents.<br />
The ld shared object and the standard C library are where I saw the first<br />
glimmer of magic. I should dig into <code>mmap</code> and see the kernel magic in all of its<br />
coruscating brilliance.</p>
<p>So, <code>ls</code> patches into the kernel via <code>mmap</code>, after it calls <code>setlocale</code>. It would<br />
be cool to see what functions the kernel executes to carry out an <code>mmap</code>.  The<br />
command to carry this out is slightly more involved, and is thus rolled up into<br />
a script, <code>mmap_fbt.d</code>. We trace all the kernel functions triggered by<br />
<code>setlocale</code>&#8217;s call to <code>mmap</code>.</p>
<p>file mmap_fbt.d:</p>
<pre><code>pid$target::setlocale:entry
{
    self-&gt;follow = 1;
}

syscall::mmap:entry
/self-&gt;follow == 1/
{
    self-&gt;follow = 2;
}

syscall::mmap:return
/self-&gt;follow == 2/
{
    self-&gt;follow = 0;
    exit(0);
}

fbt:::entry,
fbt:::return
/self-&gt;follow == 2/
{
    trace(probemod);
}
</code></pre>
<p>executing said file:</p>
<pre><code>pfexec dtrace -F -c 'ls' -s mmap_fbt.d
</code></pre>
<p>The output of the above command is a rather detailed map of the kernel code<br />
path taken to execute <code>mmap</code> in this particular instance. At this point it may be<br />
useful to pull up a copy of the Illumos source code. I did this on Illumos 147.</p>
<pre><code>CPU FUNCTION
  2  -&gt; smmap32                                 genunix
  2    -&gt; smmap_common                          genunix
  2      -&gt; as_rangelock                        genunix
  2      &lt;- as_rangelock                        genunix
  2      -&gt; zmap                                genunix
  2        -&gt; choose_addr                       genunix
  2          -&gt; map_addr                        unix
  2            -&gt; map_addr_proc                 unix
  2              -&gt; as_gap_aligned              genunix
  2                -&gt; avl_first                 genunix
  2                &lt;- avl_first                 genunix
  2                -&gt; as_findseg                genunix
  2                  -&gt; avl_find                genunix
  2                    -&gt; as_segcompar          genunix
  2                    &lt;- as_segcompar          genunix
  2                  &lt;- avl_find                genunix
  2                &lt;- as_findseg                genunix
  2                -&gt; avl_walk                  genunix
  2                &lt;- avl_walk                  genunix
  2                -&gt; avl_walk                  genunix
  2                &lt;- avl_walk                  genunix
  2                -&gt; valid_va_range_aligned    unix
  2                &lt;- valid_va_range_aligned    unix
  2              &lt;- as_gap_aligned              genunix
  2            &lt;- map_addr_proc                 unix
  2          &lt;- map_addr                        unix
  2        &lt;- choose_addr                       genunix
  2        -&gt; as_map                            genunix
  2          -&gt; as_map_locked                   genunix
  2            -&gt; gethrestime                   genunix
  2              -&gt; pc_gethrestime              unix
  2                -&gt; gethrtime                 genunix
  2                  -&gt; tsc_gethrtime           unix
  2                  &lt;- tsc_gethrtime           unix
  2                &lt;- gethrtime                 genunix
  2              &lt;- pc_gethrestime              unix
  2            &lt;- gethrestime                   genunix
  2            -&gt; as_map_ansegs                 genunix
  2              -&gt; map_pgszcvec                unix
  2                -&gt; map_szcvec                unix
  2                &lt;- map_szcvec                unix
  2              &lt;- map_pgszcvec                unix
  2              -&gt; as_map_segvn_segs           genunix
  2                -&gt; seg_alloc                 genunix
  2                  -&gt; valid_va_range          unix
  2                    -&gt; valid_va_range_aligned   unix
  2                    &lt;- valid_va_range_aligned   unix
  2                  &lt;- valid_va_range          unix
  2                  -&gt; valid_usr_range         unix
  2                  &lt;- valid_usr_range         unix
  2                  -&gt; kmem_cache_alloc        genunix
  2                  &lt;- kmem_cache_alloc        genunix
  2                  -&gt; mutex_init              unix
  2                  &lt;- mutex_init              unix
  2                  -&gt; seg_attach              genunix
  2                    -&gt; as_addseg             genunix
  2                      -&gt; gethrestime         genunix
  2                        -&gt; pc_gethrestime    unix
  2                          -&gt; gethrtime       genunix
  2                            -&gt; tsc_gethrtime   unix
  2                            &lt;- tsc_gethrtime   unix
  2                          &lt;- gethrtime       genunix
  2                        &lt;- pc_gethrestime    unix
  2                      &lt;- gethrestime         genunix
  2                      -&gt; avl_walk            genunix
  2                      &lt;- avl_walk            genunix
  2                      -&gt; avl_insert_here     genunix
  2                        -&gt; avl_insert        genunix
  2                        &lt;- avl_insert        genunix
  2                      &lt;- avl_insert_here     genunix
  2                    &lt;- as_addseg             genunix
  2                  &lt;- seg_attach              genunix
  2                &lt;- seg_alloc                 genunix
  2                -&gt; segvn_create              genunix
  2                  -&gt; anon_resvmem            genunix
  2                    -&gt; rctl_incr_swap        genunix
  2                    &lt;- rctl_incr_swap        genunix
  2                  &lt;- anon_resvmem            genunix
  2                  -&gt; hat_map                 unix
  2                  &lt;- hat_map                 unix
  2                  -&gt; crhold                  genunix
  2                  &lt;- crhold                  genunix
  2                  -&gt; avl_walk                genunix
  2                  &lt;- avl_walk                genunix
  2                  -&gt; avl_walk                genunix
  2                  &lt;- avl_walk                genunix
  2                  -&gt; kmem_cache_alloc        genunix
  2                  &lt;- kmem_cache_alloc        genunix
  2                  -&gt; lgrp_privm_policy_set   unix
  2                    -&gt; lgrp_mem_policy_default   unix
  2                    &lt;- lgrp_mem_policy_default   unix
  2                  &lt;- lgrp_privm_policy_set   unix
  2                &lt;- segvn_create              genunix
  2              &lt;- as_map_segvn_segs           genunix
  2            &lt;- as_map_ansegs                 genunix
  2            -&gt; as_setwatch                   genunix
  2              -&gt; avl_numnodes                genunix
  2              &lt;- avl_numnodes                genunix
  2            &lt;- as_setwatch                   genunix
  2          &lt;- as_map_locked                   genunix
  2        &lt;- as_map                            genunix
  2      &lt;- zmap                                genunix
  2      -&gt; as_rangeunlock                      genunix
  2        -&gt; cv_signal                         genunix
  2        &lt;- cv_signal                         genunix
  2      &lt;- as_rangeunlock                      genunix
  2    &lt;- smmap_common                          genunix
  2  &lt;- smmap32                                 genunix
</code></pre>
<p>It&#8217;s also a very good idea to generate an index of all the source files like so:</p>
<pre><code>ls -R illumos_gate &gt; ig-ix
</code></pre>
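<p>One caveat with <code>ls -R</code> is that it prints bare file names under directory headings, so a <code>grep</code> hit does not say where the file lives. A variation that might be handier, offered purely as a sketch, is to index full paths with <code>find</code>. The helper names <code>build_index</code> and <code>where_is</code> are hypothetical, and the demo runs on a throwaway tree with placeholder file names rather than the real source.</p>

```shell
# build_index: write one full path per line for every regular file under a tree.
# Unlike ls -R, each line carries its directory, so one lookup locates a file.
build_index() {
    find "$1" -type f | sort > "$2"
}

# where_is: print index entries whose last path component equals the given name.
where_is() {
    awk -F/ -v name="$2" '$NF == name' "$1"
}

# Demo on a temporary tree; directory and file names are placeholders.
demo=$(mktemp -d)
mkdir -p "$demo/src/vm" "$demo/src/os"
: > "$demo/src/vm/example_as.c"
: > "$demo/src/os/example_mmap.c"
build_index "$demo/src" "$demo/ig-ix"
where_is "$demo/ig-ix" example_as.c
rm -rf "$demo"
```

<p>Against a real checkout the same two steps would be <code>build_index illumos_gate ig-ix</code> followed by <code>where_is ig-ix</code> and a file name.</p>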
<p>In the above output, the kernel is alternating between two modules: unix and<br />
genunix. Genunix contains all the functions that are platform independent,<br />
while unix contains functions that are platform specific. For example,<br />
<code>pc_gethrestime</code> is probably unique to the x86 version of the kernel. As can be<br />
seen, generic code calls non-generic code, and vice versa, all the time. I<br />
think this is an important point. Most undergrad curricula present the design of<br />
software as the layering of one body of code on top of another, as opposed to a<br />
web of functions or modules calling each other at various points of execution.</p>
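<p>To check the &#8220;and vice versa&#8221; part against the trace rather than by eye, here is a rough sketch that tallies which module calls which. It assumes flow-indented output whose last column is the module name, as in the listing above; the helper name <code>module_edges</code> is made up, and the here-document is only an excerpt (the <code>choose_addr</code> subtree) of that listing.</p>

```shell
# module_edges: read flow-indented fbt output on stdin and count caller/callee
# module pairs. Each "->" pushes the callee's module on a stack and each "<-"
# pops it, so the module one level up is the caller's.
# The function name is hypothetical.
module_edges() {
    awk '$2 == "->" { depth++; mod[depth] = $NF
                      if (depth > 1) pairs[mod[depth - 1] " calls " mod[depth]]++ }
         $2 == "<-" { if (depth > 0) depth-- }
         END { for (p in pairs) print pairs[p], p }' | sort -k2
}

# Excerpt of the kernel trace above: the choose_addr subtree.
module_edges <<'EOF'
  2        -> choose_addr                       genunix
  2          -> map_addr                        unix
  2            -> map_addr_proc                 unix
  2              -> as_gap_aligned              genunix
  2                -> avl_first                 genunix
  2                <- avl_first                 genunix
  2                -> as_findseg                genunix
  2                  -> avl_find                genunix
  2                    -> as_segcompar          genunix
  2                    <- as_segcompar          genunix
  2                  <- avl_find                genunix
  2                <- as_findseg                genunix
  2                -> avl_walk                  genunix
  2                <- avl_walk                  genunix
  2                -> avl_walk                  genunix
  2                <- avl_walk                  genunix
  2                -> valid_va_range_aligned    unix
  2                <- valid_va_range_aligned    unix
  2              <- as_gap_aligned              genunix
  2            <- map_addr_proc                 unix
  2          <- map_addr                        unix
  2        <- choose_addr                       genunix
EOF
```

<p>Even in this small excerpt both cross-module directions show up: <code>genunix</code> calls into <code>unix</code>, and <code>unix</code> calls back into <code>genunix</code>.</p>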
<p>Tangent 1: Dogmatically layering abstractions on top of each other, because<br />
it&#8217;s The Right Way(tm), is a very bad habit to get into, and is an example of<br />
defensive programming. Layering often results in needlessly deep call-stacks,<br />
and code that is not very cache friendly. The kernel needs to be portable, but<br />
it also needs to be fast. The kernel engineers accomplished this by using<br />
conditional compilation, instead of aggressive layering. In fact, in the last<br />
decade or so, the kernel team has been collapsing layers to improve both<br />
performance and reliability; witness <a href="http://en.wikipedia.org/wiki/ZFS">ZFS</a>. Also, <a href="http://en.wikipedia.org/wiki/Solaris_Zones">Zones</a> are a method of improving the performance of virtualization by<br />
removing unnecessary layers (like the CPU emulator).</p>
<p>Tangent 2: To go on a tangent to this tangent, I did some stack traces on<br />
<a href="http://mercurial.selenic.com/">Mercurial</a>, a python application. In this particular instance, I ran<br />
<code>hg log</code> and saved the stack traces to a file. You&#8217;ll notice that the default<br />
stack depth of dtrace&#8217;s <code>ustack</code> action is too small, and I had to expand it to<br />
150.  These stacks are very, very deep. We&#8217;re monitoring an application that is<br />
layered on top of the Python VM. Which goes to show that layering can lead to<br />
the problems mentioned above. Not that the Mercurial code is bad; in fact, it<br />
is very high quality and one of my favourite revision control systems. It already<br />
performs quite well. I&#8217;m just demonstrating that layering results in deep call<br />
stacks which are potentially problematic for some programs.</p>
<p>Here&#8217;s the one-liner used on <code>hg log</code>:</p>
<pre><code>pfexec dtrace -c 'hg log' -n 'syscall:::entry /pid == $target/
    {@[ustack(150)] = count();}'
</code></pre>
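<p>To put a number on &#8220;very, very deep&#8221;, one could count the frames in each stack of the saved output. This is a sketch under a few assumptions: the output was saved to a file or pasted as below, a frame is any line containing a backquote or a bare hex address, and a line holding only an integer is the aggregation count. The helper name <code>stack_depths</code> is hypothetical, and the sample stacks are two short ones from the <code>ls</code> run earlier in this post.</p>

```shell
# stack_depths: read aggregated ustack() output on stdin and print the number
# of frames in each stack, one number per line.
# Frame lines look like module`function+offset or a bare 0x address; the
# integer-only line after a stack is its count and ends that stack.
stack_depths() {
    awk '/`/ || $1 ~ /^0x[0-9a-f]+$/ { frames++; next }
         NF == 1 && $1 ~ /^[0-9]+$/ && frames > 0 { print frames; frames = 0 }'
}

# Two stacks from the ls trace; print the depth of the deepest one.
stack_depths <<'EOF' | sort -n | tail -n 1
  sysi86
          libc.so.1`sysi86+0x15
          ls`_start+0x6e
        1
  getdents64
          libc.so.1`getdents64+0x15
          ls`print_dir+0x2a2
          ls`main+0x526
          ls`_start+0x7d
        2
EOF
```

<p>With <code>ustack(150)</code> a stack can never report more than 150 frames, so any stack that comes out at exactly 150 was probably truncated and is deeper still.</p>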
<p>Here&#8217;s one of the smaller stack-traces from <code>hg log</code>:</p>
<pre><code>          libc.so.1`__read+0x15
          libc.so.1`_filbuf+0xd3
          libc.so.1`fread+0x118
          libpython2.6.so.1.0`Py_UniversalNewlineFread+0x160
          libpython2.6.so.1.0`file_read+0xeb
          libpython2.6.so.1.0`PyCFunction_Call+0x19d
          libpython2.6.so.1.0`call_function+0x3e6
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x190
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`PyObject_CallFunctionObjArgs+0x3d
          libpython2.6.so.1.0`slot_tp_descr_get+0x93
          libpython2.6.so.1.0`PyObject_GenericGetAttr+0x1e7
          libpython2.6.so.1.0`PyObject_GetAttr+0x96
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x27a4
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`gen_send_ex+0xbf
          libpython2.6.so.1.0`gen_iternext+0x18
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x1fec
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x13d
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`ext_do_call+0x156
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2d88
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`function_call+0x13d
          libpython2.6.so.1.0`PyObject_Call+0x67
          libpython2.6.so.1.0`ext_do_call+0x156
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2d88
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`fast_function+0x174
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`fast_function+0x10b
          libpython2.6.so.1.0`call_function+0xe4
          libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb
          libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb
          libpython2.6.so.1.0`PyEval_EvalCode+0x35
          libpython2.6.so.1.0`run_mod+0x3d
          libpython2.6.so.1.0`PyRun_FileExFlags+0x6e
          libpython2.6.so.1.0`PyRun_SimpleFileExFlags+0x18a
          libpython2.6.so.1.0`PyRun_AnyFileExFlags+0x71
          libpython2.6.so.1.0`Py_Main+0xa9a
          isapython2.6`main+0x66
          isapython2.6`_start+0x7d
           77
</code></pre>
<p>Anyway, apologies for the tangents, but it&#8217;s a topic I feel strongly about.<br />
Back to the main plot.</p>
<p>Where does one go from here? Study the AVL tree implementation of the kernel?<br />
Read the assembly responsible for memory mapping? Dive into <code>gethrestime</code>,<br />
which would probably lead us into the cyclics subsystem? Perhaps one should<br />
dig into the kernel memory allocator, entered through <code>kmem_cache_alloc</code>? There<br />
are many starting points for the budding Illumos kernel engineers out there.</p>
<p>The main contention was whether or not I ought to study the mechanics of <code>mmap</code><br />
via synthesis or decomposition. Seeing as how I tend to write code,<br />
predominantly, in a bottom-up style, it seems most natural to explore <code>mmap</code> via<br />
synthesis. Also, the way these blog posts are written is an example of<br />
synthesis (translation: I&#8217;m winging them).</p>
<p>The inner-most functions in the stack trace are, in chronological order:</p>
<ul>
<li><code>as_rangelock</code></li>
<li><code>avl_first</code></li>
<li><code>as_segcompar</code></li>
<li><code>avl_walk</code></li>
<li><code>valid_va_range_aligned</code></li>
<li><code>tsc_gethrtime</code></li>
<li><code>map_szcvec</code></li>
<li><code>valid_usr_range</code></li>
<li><code>kmem_cache_alloc</code></li>
<li><code>mutex_init</code></li>
<li><code>avl_insert</code></li>
<li><code>rctl_incr_swap</code></li>
<li><code>hat_map</code></li>
<li><code>crhold</code></li>
<li><code>lgrp_mem_policy_default</code></li>
<li><code>avl_numnodes</code></li>
<li><code>cv_signal</code></li>
</ul>
<p>It seems that all functions prefixed with <code>as_</code> are related<br />
to address space management. The block comments in <code>as.h</code> are good introductory<br />
material on the structures used by these functions (and should be read before<br />
continuing). Let&#8217;s start with <code>as_rangelock</code>. Its implementation is in<br />
<code>vm_as.c</code>:</p>
<pre><code>/*
 * Serialize all searches for holes in an address space to
 * prevent two or more threads from allocating the same virtual
 * address range.  The address space must not be "read/write"
 * locked by the caller since we may block.
 */
void
as_rangelock(struct as *as)
{
    mutex_enter(&amp;as-&gt;a_contents);
    while (AS_ISCLAIMGAP(as))
        cv_wait(&amp;as-&gt;a_cv, &amp;as-&gt;a_contents);
    AS_SETCLAIMGAP(as);
    mutex_exit(&amp;as-&gt;a_contents);
}
</code></pre>
<p>The first thing I noticed was that <code>mutex_enter</code> and <code>mutex_exit</code> didn&#8217;t<br />
appear in the DTrace output. I suspected this was due to function inlining.<br />
There&#8217;s a quick way to check, using mdb, the modular debugger.</p>
<p>Run this:</p>
<pre><code>pfexec mdb -k
</code></pre>
<p>A command prompt should pop up.</p>
<p>This disassembles the function.</p>
<pre><code>&gt; as_rangelock::dis
as_rangelock:                   pushq  %rbp
as_rangelock+1:                 movq   %rsp,%rbp
as_rangelock+4:                 subq   $0x8,%rsp
as_rangelock+8:                 movq   %rdi,-0x8(%rbp)
as_rangelock+0xc:               pushq  %r12
as_rangelock+0xe:               pushq  %r13
as_rangelock+0x10:              subq   $0x8,%rsp
as_rangelock+0x14:              movq   %rdi,%r13
as_rangelock+0x17:              call   -0x2e46b4        &lt;mutex_enter&gt;
as_rangelock+0x1c:              movzbl 0x8(%r13),%eax
as_rangelock+0x21:              testl  $0x40,%eax
as_rangelock+0x26:              je     +0x1b    &lt;as_rangelock+0x43&gt;
as_rangelock+0x28:              leaq   0xa(%r13),%r12
as_rangelock+0x2c:              movq   %r12,%rdi
as_rangelock+0x2f:              movq   %r13,%rsi
as_rangelock+0x32:              call   -0x175667        &lt;cv_wait&gt;
as_rangelock+0x37:              movzbl 0x8(%r13),%eax
as_rangelock+0x3c:              testl  $0x40,%eax
as_rangelock+0x41:              jne    -0x17    &lt;as_rangelock+0x2c&gt;
as_rangelock+0x43:              orl    $0x40,%eax
as_rangelock+0x46:              movb   %al,0x8(%r13)
as_rangelock+0x4a:              movq   %r13,%rdi
as_rangelock+0x4d:              call   -0x2e45ca        &lt;mutex_exit&gt;
as_rangelock+0x52:              addq   $0x8,%rsp
as_rangelock+0x56:              popq   %r13
as_rangelock+0x58:              popq   %r12
as_rangelock+0x5a:              leave
as_rangelock+0x5b:              ret
&gt; ::quit
</code></pre>
<p>So, surprisingly, the calls and symbols are there. It&#8217;s just that DTrace doesn&#8217;t<br />
trace them. It turns out that while DTrace <em>does</em> match <code>mutex_enter</code> and<br />
<code>mutex_exit</code> as valid <code>fbt</code> probes on Illumos 147, it doesn&#8217;t trace them<br />
because the lock primitives aren&#8217;t instrumented.</p>
<p>It would appear, though, that the DTrace creators didn&#8217;t intend for these probes<br />
to be matched, as indicated in chapter 3, page 64 of <a href="http://amzn.to/faaZ4E">the dtrace book</a>.</p>
<p>Also, for those who don&#8217;t typically read x86 assembly, reading the above output<br />
would be made easier with an <a href="http://en.wikipedia.org/wiki/Intel_assembly">introductory article to x86 assembly</a> and a <a href="http://en.wikipedia.org/wiki/X86_instruction_listings">listing of all x86 instructions</a>. </p>
<p>So <code>mutex_enter</code> claims a mutex, and <code>mutex_exit</code> releases it. Claiming a mutex<br />
creates a <a href="http://en.wikipedia.org/wiki/Mutex">mutual exclusion</a> over some part of memory, so that<br />
only one thread can write to that part of memory, while other threads wait<br />
(&#8220;block&#8221; is a common synonym for wait).</p>
<p>When we do the <code>mutex_enter</code> we are locking down some of the fields of the<br />
address space structure. Then <code>as_rangelock</code> proceeds to make a claim over the<br />
address space, if it isn&#8217;t already claimed. If it is already claimed,<br />
<code>as_rangelock</code> waits (via <code>cv_wait</code>, which drops the mutex while it sleeps and<br />
reacquires it before returning) for the claim to be withdrawn, then makes its own<br />
claim and releases the mutex. As can be seen in the DTrace<br />
output, we didn&#8217;t have to wait.</p>
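<p>For readers more at home in userland, the same wait-then-claim pattern can be sketched with POSIX threads. This is only an analogy, not the kernel code: <code>struct fake_as</code>, <code>fake_as_init</code>, <code>fake_rangelock</code> and <code>fake_rangeunlock</code> are made-up names, a plain <code>bool</code> stands in for the claim flag, and the unlock half is my assumption about what a counterpart would look like.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for the parts of struct as used here. */
struct fake_as {
    pthread_mutex_t a_contents;  /* protects a_claimgap */
    pthread_cond_t  a_cv;        /* signalled when the claim is dropped */
    bool            a_claimgap;  /* stands in for the claim flag */
};

static void
fake_as_init(struct fake_as *as)
{
    pthread_mutex_init(&as->a_contents, NULL);
    pthread_cond_init(&as->a_cv, NULL);
    as->a_claimgap = false;
}

/* Same shape as as_rangelock: wait until unclaimed, then claim. */
static void
fake_rangelock(struct fake_as *as)
{
    pthread_mutex_lock(&as->a_contents);
    while (as->a_claimgap) {
        /* Drops a_contents while asleep, retakes it before returning. */
        pthread_cond_wait(&as->a_cv, &as->a_contents);
    }
    as->a_claimgap = true;
    pthread_mutex_unlock(&as->a_contents);
}

/* Assumed counterpart: drop the claim and wake one waiter. */
static void
fake_rangeunlock(struct fake_as *as)
{
    pthread_mutex_lock(&as->a_contents);
    assert(as->a_claimgap);
    as->a_claimgap = false;
    pthread_cond_signal(&as->a_cv);
    pthread_mutex_unlock(&as->a_contents);
}
```

<p>Note that the mutex is only held for the few instructions that test or flip the flag. The claim itself can be held for much longer without anybody holding <code>a_contents</code> in the meantime.</p>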
<p>As indicated in <code>as.h</code>, the address-space structures and metadata are stored<br />
in an <a href="http://en.wikipedia.org/wiki/AVL_tree">AVL tree</a>, and this is visible in the DTrace output.<br />
Some <code>as_</code> functions make calls to <code>avl_</code> functions. What&#8217;s even more<br />
interesting is that <code>avl_find</code> makes a call to <code>as_segcompar</code>; <code>avl_find</code><br />
probably takes a callback that it uses for comparison purposes. We&#8217;ll find out<br />
later.</p>
<p>The only other <code>as_</code> function that&#8217;s an innermost function is <code>as_segcompar</code><br />
(according to DTrace).</p>
<pre><code>/*
 * compar segments (or just an address) by segment address range
 */
static int
as_segcompar(const void *x, const void *y)
{
    struct seg *a = (struct seg *)x;
    struct seg *b = (struct seg *)y;

    if (a-&gt;s_base &lt; b-&gt;s_base)
        return (-1);
    if (a-&gt;s_base &gt;= b-&gt;s_base + b-&gt;s_size)
        return (1);
    return (0);
}
</code></pre>
<p>And this turns out to be true. Though I&#8217;m not sure why &#8216;compare&#8217; is misspelled.<br />
I suspect that this is historical, and that the developers of the original Unix<br />
were motivated by conditions similar to those behind the unfortunate misspelling<br />
of the <code>creat</code> system call. I&#8217;m sure they had really good reasons to do this &#8211;<br />
but typing and reading <code>creat</code> still irks the crap out of me.</p>
<p>So it&#8217;s just a simple comparison function.</p>
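<p>The &#8220;or just an address&#8221; part of the comment is worth a quick experiment. Below is a minimal sketch, assuming a trimmed-down <code>struct mini_seg</code> that carries only the two fields the comparator reads; <code>mini_segcompar</code> and <code>addr_vs_seg</code> are names I made up for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down segment: only the fields the comparator reads. */
struct mini_seg {
    uintptr_t s_base;  /* first address covered by the segment */
    size_t    s_size;  /* length of the segment in bytes */
};

/* Same logic as as_segcompar, on the trimmed-down struct. */
static int
mini_segcompar(const void *x, const void *y)
{
    const struct mini_seg *a = x;
    const struct mini_seg *b = y;

    if (a->s_base < b->s_base)
        return (-1);
    if (a->s_base >= b->s_base + b->s_size)
        return (1);
    return (0);
}

/* Compare a bare address against a segment by wrapping it in a probe key. */
static int
addr_vs_seg(uintptr_t addr, const struct mini_seg *seg)
{
    struct mini_seg key = { addr, 0 };

    assert(seg != NULL);
    return (mini_segcompar(&key, seg));
}
```

<p>For a segment with base <code>0x1000</code> and size <code>0x2000</code>, every address from <code>0x1000</code> up to <code>0x2fff</code> compares as equal, so a tree lookup keyed on an address lands on the segment that contains it. The comparison is asymmetric: the size of the first argument is never consulted.</p>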
<p>That&#8217;s all for the inner-most <code>as_</code> functions. On to the <code>avl_</code> functions.<br />
<code>avl_first</code> is located in <code>usr/src/common/avl/avl.c</code>. Before continuing, give<br />
<code>avl.h</code> a glance.</p>
<pre><code>/*
 * Return the lowest valued node in a tree or NULL.
 * (leftmost child from root of tree)
 */
void *
avl_first(avl_tree_t *tree)
{
    avl_node_t *node;
    avl_node_t *prev = NULL;
    size_t off = tree-&gt;avl_offset;

    for (node = tree-&gt;avl_root;
        node != NULL;
        node = node-&gt;avl_child[0])
        prev = node;

    if (prev != NULL)
        return (AVL_NODE2DATA(prev, off));
    return (NULL);
}
</code></pre>
<p>Once one understands the structures and macros in <code>avl.h</code>, the above should be<br />
self-explanatory.</p>
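<p>The one non-obvious piece is the node-to-data conversion. Here is a minimal sketch of the idea, assuming the link node is embedded inside the payload and that the conversion is plain pointer arithmetic on a byte offset. The <code>mini_</code> types and <code>MINI_</code> macros are my own simplified stand-ins, not the real definitions.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link node, much simpler than the real avl_node_t. */
struct mini_node {
    struct mini_node *child[2];  /* [0] = left, [1] = right */
};

/* Hypothetical tree header: root plus the offset of the link in the payload. */
struct mini_tree {
    struct mini_node *root;
    size_t            offset;
};

/* Hypothetical payload that embeds its link node. */
struct item {
    int              key;
    struct mini_node link;
};

/* Assumed shape of the conversions: pointer arithmetic on the offset. */
#define MINI_NODE2DATA(n, o)  ((void *)((uintptr_t)(n) - (o)))
#define MINI_DATA2NODE(d, o)  ((struct mini_node *)((uintptr_t)(d) + (o)))

/* Leftmost descent, same loop shape as avl_first. */
static void *
mini_first(const struct mini_tree *tree)
{
    struct mini_node *node;
    struct mini_node *prev = NULL;

    assert(tree != NULL);
    for (node = tree->root; node != NULL; node = node->child[0])
        prev = node;

    if (prev != NULL)
        return (MINI_NODE2DATA(prev, tree->offset));
    return (NULL);
}
```

<p>The tree would be set up with <code>offsetof(struct item, link)</code> as its offset. Because the tree code only ever sees link nodes and a byte offset, it can manage any payload type without knowing its layout.</p>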
<p>Next up, <code>avl_walk</code>.</p>
<pre><code>/*
 * Walk from one node to the previous valued node (ie. an infix walk
 * towards the left). At any given node we do one of 2 things:
 *
 * - If there is a left child, go to it, then to it's rightmost
 * descendant.
 *
 * - otherwise we return thru parent nodes until we've come from a
 * right child.
 *
 * Return Value:
 * NULL - if at the end of the nodes
 * otherwise next node
 */
void *
avl_walk(avl_tree_t *tree, void *oldnode, int left)
{
    size_t off = tree-&gt;avl_offset;
    avl_node_t *node = AVL_DATA2NODE(oldnode, off);
    int right = 1 - left;
    int was_child;

    /*
     * nowhere to walk to if tree is empty
     */
    if (node == NULL)
        return (NULL);

    /*
     * Visit the previous valued node. There are two possibilities:
     *
     * If this node has a left child, go down one left, then all
     * the way right.
     */
    if (node-&gt;avl_child[left] != NULL) {
        for (node = node-&gt;avl_child[left];
            node-&gt;avl_child[right] != NULL;
            node = node-&gt;avl_child[right])
            ;
    /*
     * Otherwise, return thru left children as far as we can.
     */
    } else {
        for (;;) {
            was_child = AVL_XCHILD(node);
            node = AVL_XPARENT(node);
            if (node == NULL)
                return (NULL);
            if (was_child == right)
                break;
        }
    }

    return (AVL_NODE2DATA(node, off));
}
</code></pre>
<p>So this is slightly more involved, though it should be intuitive to those<br />
familiar with AVL trees. With <code>left</code> set to 0, <code>avl_walk</code> finds the biggest<br />
value that is less than the node we&#8217;re currently in (i.e. <code>void *oldnode</code>). With <code>left</code> set to 1,<br />
the two children swap roles and it finds the smallest value that is greater.</p>
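<p>The walk is easier to follow on a toy tree. The sketch below assumes a node with an explicit parent pointer, which is simpler than what the real code does behind <code>AVL_XPARENT</code> and <code>AVL_XCHILD</code>; <code>struct wnode</code>, <code>which_child</code> and <code>walk</code> are hypothetical names.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with an explicit parent pointer. */
struct wnode {
    int           key;
    struct wnode *child[2];  /* [0] = left, [1] = right */
    struct wnode *parent;    /* NULL at the root */
};

/* Which child slot of its parent does n occupy? Requires a parent. */
static int
which_child(const struct wnode *n)
{
    assert(n->parent != NULL);
    return (n->parent->child[1] == n);
}

/* dir 0: in-order predecessor; dir 1: in-order successor. */
static struct wnode *
walk(struct wnode *node, int dir)
{
    int other = 1 - dir;

    if (node == NULL)
        return (NULL);

    if (node->child[dir] != NULL) {
        /* One step in dir, then as far as possible the other way. */
        for (node = node->child[dir]; node->child[other] != NULL;
            node = node->child[other])
            ;
        return (node);
    }

    /* Climb until we arrive at a parent from its 'other' side. */
    while (node->parent != NULL) {
        int was_child = which_child(node);

        node = node->parent;
        if (was_child == other)
            return (node);
    }
    return (NULL);
}
```

<p>Take a tree with 20 at the root, 10 and 30 as its children, and 15 as the right child of 10. Walking from 20 in direction 0 lands on 15, and walking from 15 in direction 1 climbs back up to 20.</p>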
<p>Continuing with <code>avl_insert</code>. Quickly skimming the code reveals that<br />
<code>avl_insert</code> isn&#8217;t an inner-most function as it might call <code>avl_rotation</code>.</p>
<p>Either way, <code>avl_rotation</code> rotates and rebalances the tree, using <code>new_balance</code><br />
to calculate how to rotate it. <code>avl_insert</code> merely adds data to the tree as a<br />
new leaf, and then calls <code>avl_rotation</code> if a rebalancing is required.</p>
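<p>For anyone who hasn&#8217;t seen a tree rotation before, here is the textbook single rotation on a bare binary-tree node. This is not the Illumos <code>avl_rotation</code>, which also maintains parent links and balance factors; <code>struct rnode</code> and <code>rotate</code> are hypothetical names used only to show the pointer shuffle.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bare node: no parent link, no balance factor. */
struct rnode {
    int           key;
    struct rnode *child[2];  /* [0] = left, [1] = right */
};

/*
 * Rotate the subtree rooted at 'top' in direction 'dir'
 * (0 = left rotation, 1 = right rotation) and return the new
 * subtree root. The child on the opposite side becomes the root.
 */
static struct rnode *
rotate(struct rnode *top, int dir)
{
    int other = 1 - dir;
    struct rnode *pivot = top->child[other];

    assert(pivot != NULL);
    top->child[other] = pivot->child[dir];  /* hand over the middle subtree */
    pivot->child[dir] = top;                /* old root drops below the pivot */
    return (pivot);
}
```

<p>A left-leaning chain 30, 20, 10 rotated to the right at 30 comes out with 20 at the top and 10 and 30 as its children. The in-order sequence is unchanged, which is what makes rotations safe for a search tree.</p>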
<p><code>avl_numnodes</code> merely returns the number of nodes in the tree. Accordingly,<br />
it&#8217;s a very short function.</p>
<pre><code>/*
 * Return the number of nodes in an AVL tree.
 */
ulong_t
avl_numnodes(avl_tree_t *tree)
{
    ASSERT(tree);
    return (tree-&gt;avl_numnodes);
}
</code></pre>
<p>So that covers the innermost <code>avl_</code> and <code>as_</code> functions. Let&#8217;s move up one<br />
layer of functions and have a look at <code>avl_find</code>.</p>
<pre><code>/*
 * Search for the node which contains "value".  The algorithm is a
 * simple binary tree search.
 *
 * return value:
 *      NULL: the value is not in the AVL tree
 *              *where (if not NULL)  is set to indicate the insertion
 *              point
 *      "void *"  of the found tree node
 */
void *
avl_find(avl_tree_t *tree, const void *value, avl_index_t *where)
{
    avl_node_t *node;
    avl_node_t *prev = NULL;
    int child = 0;
    int diff;
    size_t off = tree-&gt;avl_offset;

    for (node = tree-&gt;avl_root; node != NULL;
        node = node-&gt;avl_child[child]) {

        prev = node;

        diff = tree-&gt;avl_compar(value, AVL_NODE2DATA(node, off));
        ASSERT(-1 &lt;= diff &amp;&amp; diff &lt;= 1);
        if (diff == 0) {
#ifdef DEBUG
            if (where != NULL)
                *where = 0;
#endif
            return (AVL_NODE2DATA(node, off));
        }
        child = avl_balance2child[1 + diff];

    }

    if (where != NULL)
        *where = AVL_MKINDEX(prev, child);

    return (NULL);
}
</code></pre>
<p>As can be seen inside the for-loop, we do have a user-defined comparison<br />
function, stored in the tree structure. In the case of the <code>as_</code> functions,<br />
it&#8217;s <code>as_segcompar</code>. Also, this misspelling of &#8216;compare&#8217; seems to be<br />
pathologically affecting the Illumos code base. Sad day.</p>
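<p>The callback-plus-insertion-point idea can be sketched on a toy tree of integers. The assumptions here: <code>struct fnode</code>, <code>compar_fn</code>, <code>int_compar</code> and <code>find</code> are hypothetical names, and the separate <code>parent</code> and <code>slot</code> outputs are an unpacked stand-in for the single <code>avl_index_t</code> that <code>avl_find</code> fills in.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bare node keyed on an int. */
struct fnode {
    int           key;
    struct fnode *child[2];  /* [0] = left, [1] = right */
};

/* Comparator contract taken from avl_find's ASSERT: returns -1, 0 or 1. */
typedef int (*compar_fn)(const void *, const void *);

static int
int_compar(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;

    return ((a > b) - (a < b));
}

/*
 * Binary search driven by a callback. On a miss, *parent and *slot
 * describe where the value would be linked in.
 */
static struct fnode *
find(struct fnode *root, const void *value, compar_fn compar,
    struct fnode **parent, int *slot)
{
    struct fnode *node;
    struct fnode *prev = NULL;
    int child = 0;

    for (node = root; node != NULL; node = node->child[child]) {
        int diff = compar(value, &node->key);

        assert(-1 <= diff && diff <= 1);
        if (diff == 0)
            return (node);
        prev = node;
        child = (diff > 0);  /* smaller goes left, bigger goes right */
    }

    if (parent != NULL)
        *parent = prev;
    if (slot != NULL)
        *slot = child;
    return (NULL);
}
```

<p>With 20 at the root and 10 and 30 below it, a search for 25 misses and reports &#8220;left slot of 30&#8221;, so a follow-up insert would not need to search the tree a second time.</p>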
<p>And on and on it goes. DTrace a syscall, find the bottom-level functions,<br />
understand them, move up one level, synthesize, repeat.</p>
<p>That&#8217;s my modus operandi whenever I want to understand a system call, some part<br />
of the kernel, or even a user land application. It&#8217;s not enough to ask what a<br />
function does. I care about how various components interact with each other.</p>
<p>Most people seem content starting from <code>main</code> and working their way down to the<br />
depths of the kernel/app. That&#8217;s only because <code>main</code> provides a convenient<br />
starting point in the absence of DTrace (which apparently works wonderfully as<br />
a code-mapper). DTrace gives the developer the ability to see the interplay<br />
between functions, between modules, and between processes.</p>
<p>Synthesizing how a program works is more efficient (IMHO), because one has less<br />
mental state to keep track of and does less mental context switching between<br />
functions. Try it some time.</p>
<p>I&#8217;m sure most people think DTrace is exclusively a performance analysis tool,<br />
which is what it was meant to be used as, but I hope I&#8217;ve demonstrated that it<br />
is also extremely well suited for edificatory purposes.</p>
<p>Now, go forth and DTrace.</p>
]]></content:encoded>
			<wfw:commentRss>http://nickziv.wordpress.com/2011/04/08/adventures-of-a-dtrace-addict-part-0/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ae77a9245fb476328a04af02a009f2a8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">nickzivkovic</media:title>
		</media:content>
	</item>
	</channel>
</rss>
