<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mighty Heave Software</title>
	<atom:link href="http://www.mightyheave.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.mightyheave.com/blog</link>
	<description>The Quest for Real Productivity</description>
	<lastBuildDate>Sat, 03 Jul 2010 05:33:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>wh 0.5.5</title>
		<link>http://www.mightyheave.com/blog/?p=315</link>
		<comments>http://www.mightyheave.com/blog/?p=315#comments</comments>
		<pubDate>Sat, 03 Jul 2010 05:33:25 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=315</guid>
		<description><![CDATA[Wormhole Parser Generator 0.5.5 is now available under downloads.
Features include:
* Revamped conditional inclusion mechanism (much better now)
* Use the xp/vista scheme, looks better
* Added the ability to Cancel, which was needed badly for long ambiguity  searches.
* Other less noteworthy fixes]]></description>
			<content:encoded><![CDATA[<p><a href="/downloads/whsetup-0.5.5.exe">Wormhole Parser Generator 0.5.5</a> is now available under downloads.<br />
Features include:<br />
* Revamped conditional inclusion mechanism (much better now)<br />
* Use the xp/vista scheme, looks better<br />
* Added the ability to Cancel, which was needed badly for long ambiguity  searches.<br />
* Other less noteworthy fixes</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=315</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Release of the week</title>
		<link>http://www.mightyheave.com/blog/?p=294</link>
		<comments>http://www.mightyheave.com/blog/?p=294#comments</comments>
		<pubDate>Sat, 16 Jan 2010 04:41:27 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[post]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=294</guid>
		<description><![CDATA[Here comes the wormhole parser generator release of the week, 0.5.4.  Mainly a consequence of working with the C++ grammar, I fixed the broken %rank disambiguation, and added %short/%long disambiguation as well.  Finally, I got tired of the long non-interactive stalls, and now ambiguity searches and grammar loading takes place in background threads. [...]]]></description>
			<content:encoded><![CDATA[<p>Here comes the wormhole parser generator release of the week, 0.5.4.  Mainly a consequence of working with the C++ grammar, I fixed the broken %rank disambiguation, and added %short/%long disambiguation as well.  Finally, I got tired of the long non-interactive stalls, and now ambiguity searches and grammar loading takes place in background threads.  0.5.4 also includes a fix to lexercommon.cpp, which was incorrectly using delete on memory it obtained with malloc/realloc.  As usual, the installer is available in the downloads section.</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=294</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monster C++ Ambiguity Search</title>
		<link>http://www.mightyheave.com/blog/?p=288</link>
		<comments>http://www.mightyheave.com/blog/?p=288#comments</comments>
		<pubDate>Sat, 16 Jan 2010 00:19:48 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[post]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=288</guid>
		<description><![CDATA[I&#8217;ve been trying to get my C++ parser in shape, so I&#8217;ll do an ambiguity search, correct the grammar, and then another search.  I just finished a monster C++ ambiguity search.  Depth: 6.  The first couple of these turned out to need syntactic disambiguation. 
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been trying to get my C++ parser in shape, so I&#8217;ll do an ambiguity search, correct the grammar, and then another search.  I just finished a monster C++ ambiguity search.  Depth: 6.  The first couple of these turned out to need syntactic disambiguation. </p>
<p><img src="http://www.mightyheave.com/blog/wp-content/uploads/2010/01/monstersearch.jpg" alt="monstersearch" title="monstersearch" width="567" height="320" class="aligncenter size-full wp-image-287" /></p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=288</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Just uploaded wh parser generator 0.5.3&#8230;.</title>
		<link>http://www.mightyheave.com/blog/?p=284</link>
		<comments>http://www.mightyheave.com/blog/?p=284#comments</comments>
		<pubDate>Sat, 09 Jan 2010 07:29:09 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=284</guid>
		<description><![CDATA[Just uploaded wh parser generator 0.5.3.3, which includes the &#8220;final&#8221; version of the ambiguity checker, a depth first search that is quite stable on enormous search spaces.  Of course, don&#8217;t expect deep searches to finish in a human lifetime.  Available in the downloads. ]]></description>
			<content:encoded><![CDATA[<p>Just uploaded wh parser generator 0.5.3.3, which includes the &#8220;final&#8221; version of the ambiguity checker, a depth first search that is quite stable on enormous search spaces.  Of course, don&#8217;t expect deep searches to finish in a human lifetime.  Available in the downloads. </p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=284</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More fun with Grammar Ambiguities</title>
		<link>http://www.mightyheave.com/blog/?p=278</link>
		<comments>http://www.mightyheave.com/blog/?p=278#comments</comments>
		<pubDate>Sun, 03 Jan 2010 18:32:06 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[post]]></category>
		<category><![CDATA[Parsers]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=278</guid>
		<description><![CDATA[Just cut wormhole 0.5.3.2, which improves the ambiguity checking sufficiently to filter out ambiguities due to &#8220;resolved&#8221; conflicts.
The ambiguity checker can reach a bit further into the C99 search space, going 6 deep before it runs out of memory.  All conflicts in C99 are resolved, but C++ awaits.  It needs more work before [...]]]></description>
			<content:encoded><![CDATA[<p>Just cut wormhole 0.5.3.2, which improves the ambiguity checking sufficiently to filter out ambiguities due to &#8220;resolved&#8221; conflicts.</p>
<p>The ambiguity checker can reach a bit further into the C99 search space, going 6 deep before it runs out of memory.  All conflicts in C99 are resolved, but C++ awaits.  It needs more work before it can truly do a depth first search or at least iteratively deepening.  Once that is taken care of, the only real issue will be the massive search space.</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=278</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Happy New Years!</title>
		<link>http://www.mightyheave.com/blog/?p=277</link>
		<comments>http://www.mightyheave.com/blog/?p=277#comments</comments>
		<pubDate>Fri, 01 Jan 2010 06:15:41 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=277</guid>
		<description><![CDATA[Happy New Years!]]></description>
			<content:encoded><![CDATA[<p>Happy New Years!</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=277</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wormhole parser generator 0.5.3</title>
		<link>http://www.mightyheave.com/blog/?p=270</link>
		<comments>http://www.mightyheave.com/blog/?p=270#comments</comments>
		<pubDate>Wed, 30 Dec 2009 22:57:33 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=270</guid>
		<description><![CDATA[The 0.5.3 installer has been readied.  The latest includes fixes to the ambiguity search, as well as a new symbol string ambiguity checker, and miscellaneous fixes.  Here&#8217;s the symbol string ambiguity checker in action:
]]></description>
			<content:encoded><![CDATA[<p>The 0.5.3 installer has been readied.  The latest includes fixes to the ambiguity search, as well as a new symbol string ambiguity checker, and miscellaneous fixes.  Here&#8217;s the symbol string ambiguity checker in action:</p>
<p><img class="aligncenter size-full wp-image-271" title="ambgteststring" src="http://www.mightyheave.com/blog/wp-content/uploads/2009/12/ambgteststring.jpg" alt="ambgteststring" width="562" height="318" /></p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=270</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Available in the next release of the wor&#8230;</title>
		<link>http://www.mightyheave.com/blog/?p=267</link>
		<comments>http://www.mightyheave.com/blog/?p=267#comments</comments>
		<pubDate>Wed, 30 Dec 2009 18:32:34 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[status]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=267</guid>
		<description><![CDATA[Available in the next release of the wormhole grammar analyzer, the string ambiguity checker.  Pick a symbol off the list, add to the string, repeat.  Click OK and it will check if the string is ambiguous in the currently loaded grammar.]]></description>
			<content:encoded><![CDATA[<p>Available in the next release of the wormhole grammar analyzer, the string ambiguity checker.  Pick a symbol off the list, add to the string, repeat.  Click OK and it will check if the string is ambiguous in the currently loaded grammar.</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=267</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ambiguity Checker, new with 0.5.2</title>
		<link>http://www.mightyheave.com/blog/?p=254</link>
		<comments>http://www.mightyheave.com/blog/?p=254#comments</comments>
		<pubDate>Tue, 29 Dec 2009 07:00:32 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=254</guid>
		<description><![CDATA[It&#8217;s a tad early for mainstream use, but I was so excited to get it working, I couldn&#8217;t wait.  Wormhole parser generator 0.5.2 includes an ambiguity checker.  It works, but the probabilistic functionality does nothing.

This of course, would be much more useful if the GLR generator subsystem of the wormhole parser generator weren&#8217;t currently broken, [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s a tad early for mainstream use, but I was so excited to get it working, I couldn&#8217;t wait.  Wormhole parser generator 0.5.2 includes an ambiguity checker.  It works, but the probabilistic functionality does nothing.</p>

<a href='http://www.mightyheave.com/blog/?attachment_id=255' title='amgdlg'><img width="150" height="81" src="http://www.mightyheave.com/blog/wp-content/uploads/2009/12/amgdlg.jpg" class="attachment-thumbnail" alt="" title="amgdlg" /></a>
<a href='http://www.mightyheave.com/blog/?attachment_id=256' title='amgmenu'><img width="150" height="85" src="http://www.mightyheave.com/blog/wp-content/uploads/2009/12/amgmenu.jpg" class="attachment-thumbnail" alt="" title="amgmenu" /></a>
<a href='http://www.mightyheave.com/blog/?attachment_id=257' title='amgresult'><img width="150" height="130" src="http://www.mightyheave.com/blog/wp-content/uploads/2009/12/amgresult.jpg" class="attachment-thumbnail" alt="" title="amgresult" /></a>

<p><span id="more-254"></span></p>
<p>This of course, would be much more useful if the GLR generator subsystem of the wormhole parser generator weren&#8217;t currently broken, but nobody&#8217;s perfect.</p>
<p>If you didn&#8217;t already know, checking for grammar ambiguities in the general case is a fool&#8217;s errand, because it has been proven undecidable (the algorithm to decide if a grammar is ambiguous will run forever on some grammars, and we can&#8217;t even decide if it will run forever or not).  Despite the exponential growth of our search, this doesn&#8217;t mean we are stuck with nothing.  The exact length of what can be checked depends on the available system resources, and effective branching factor of the language, but by using the GLR algorithm and turning it into a next symbol generator, we are able to check billions of permutations resulting is finding ambiguities up to dozens of symbols long, sufficient for many modern languages.</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=254</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A short explanation of the GLR algorithm</title>
		<link>http://www.mightyheave.com/blog/?p=236</link>
		<comments>http://www.mightyheave.com/blog/?p=236#comments</comments>
		<pubDate>Fri, 18 Dec 2009 06:34:25 +0000</pubDate>
		<dc:creator>Ralph</dc:creator>
				<category><![CDATA[Parsers]]></category>

		<guid isPermaLink="false">http://www.mightyheave.com/blog/?p=236</guid>
		<description><![CDATA[I am in the process of revamping some old code, part of which contains the GLR algorithm.  The GLR algorithm is rather involved, so I figured this was the perfect opportunity to explain it once and for all, if for no other reason than to have a copy of this on hand.
Like many parsers, the [...]]]></description>
			<content:encoded><![CDATA[<p>I am in the process of revamping some old code, part of which contains the GLR algorithm.  The GLR algorithm is rather involved, so I figured this was the perfect opportunity to explain it once and for all, if for no other reason than to have a copy of this on hand.</p>
<p>Like many parsers, the GLR algorithm is an LR based parser will build a parse tree from a series of tokens.  Unlike most though, it is able to handle ambiguous grammars.  Of practical interest, this means that a GLR parser can handle the ambiguous grammar of C++.  It accomplishes this, in concept, keeping a multitude of stacks, each representing one of the many possible parses of the input.  Naturally, the more ambiguous any particular input, the greater the performance hit for using this approach.</p>
<p>To explain what goes on inside a GLR parser.  We&#8217;ll consider what happens when processing a single input token.  As a prerequisite, you need an LR table, produced from a suitable grammar, such as the wormhole parser generator.</p>
<p><span id="more-236"></span></p>
<p>Before we dive into the algorithm, we need to understand the structure of the data. GLR carries it&#8217;s stacks in something called the graph structured stack.   The parser represents it&#8217;s &#8220;stacks&#8221; as a series of graph nodes, each node represents a state, and is connected to it&#8217;s stack predecessors, and each association between nodes carries along the parse tree for that particular part of the stack.  The structure of the parse tree isn&#8217;t important, but it is just important to know that it is carried there.</p>
<blockquote><p>struct node</p>
<p>{</p>
<p>int state;</p>
<p>link links[];</p>
<p>};</p>
<p>struct link</p>
<p>{</p>
<p>node *prev;</p>
<p>parsetree *t;</p>
<p>int reject;</p>
<p>};</p></blockquote>
<p>For instance, the stack (C, B, A) with states (1, 2, 3) respectively could be declared by hand with these structures.</p>
<blockquote><p>node C = {1, { {0, 0} } };</p>
<p>node B = {2, { {&amp;C, 0} } };</p>
<p>node A = {3, { {&amp;B,0} } };</p></blockquote>
<p>The A is on the top of the stack, while C is on the bottom.  Now, if we wanted to declare an additional stack (C, B, D) with states (1, 2, 3) respectively, we need only add a node D and hook it to B, like this.</p>
<blockquote><p>node D = {4, { {&amp;B,0} } };</p></blockquote>
<p>The stacks are in compact form, with the shared parts (C, B) requiring only one set of data.  Similarly, we could simultaneously declare stacks (C, E, A), (C, B, A), (C, E, D), (C, B, D) like this:</p>
<blockquote><p>node C = {1, { {0, 0} } };</p>
<p>node B = {2, { {&amp;C, 0} } };</p>
<p>node E = {5, { {&amp;C,0} } };</p>
<p>node A = {3, { {&amp;B,0}, {&amp;E, 0} } };</p>
<p>node D = {4, { {&amp;B,0}, {&amp;E, 0} } };</p></blockquote>
<p>Hopefully now it is clear how graph structured stacks work.  When we want to record the fact that we have these four stacks, we need to have an array with the tops of the stacks, in this case D and A.  In the GLR algorithm, we call this the <em>activestacks</em>.</p>
<blockquote><p>node *activestacks[];</p></blockquote>
<p>Like we said earlier, let&#8217;s consider what happens when we add a single token.  Now, if this were just a plain LR algorithm, we would have a single stack, and adding a token would either cause us to <em>shift</em> the token onto the stack (push a new state on the stack) a <em>reduce</em> (pop some elements from the stack, combine them into a single element, and  push the combined element onto the stack, followed by possible additional <em>reduce</em> actions until finally we <em>shift</em> the token onto the stack) or if the token is the end-of-input token (designated <em>$</em>) we might <em>accept</em> the input as complete or if that token is unexpected at that point in the parse, we might get an <em>error</em> in which case we simply drop the stack with the error from the <em>activestacks</em> list.  We&#8217;ll ignore errors, and only consider valid input tokens.</p>
<p>Well, GLR works the same way, except we need to handle all of the stacks at the same time, and sometimes, we need to handle multiple actions at the same time, where we have to <em>shift</em> and <em>reduce</em> at the same time, a situation which is considered a grammar construction error to LR algorithms, but which instead produces an additional stack in GLR.  GLR differs in another way too, since it handles ambiguous grammars as well, it necessarily had <em>reject</em> productions as well.   That is, in addition to the usual production definition, a production can be marked as a reject production, in which case if the production&#8217;s non-terminal is reduced in a regular production as well, it will be ignored instead.</p>
<blockquote><p>floating : [number] &#8216;.&#8217; [number] ;</p>
<p>floating: &#8216;.&#8217;                                              %reject ;</p></blockquote>
<p>In this example, (1.1, 1., .1, .) are all valid reductions of floating.  However &#8216;.&#8217; by itself makes a poor floating point number, so we added the reject production, and so &#8216;.&#8217; will be ignored as a floating value.</p>
<p>Let&#8217;s not get ahead of ourselves.  Our entry point where we start our update is a function, <em>update_states</em>.</p>
<blockquote><p>void update_states(int nexttoken);</p></blockquote>
<p><em>update_states</em> first job is to try to<em> </em>process the<em> activestates</em> and figure out if the <em>nexttoken</em> calls for us to <em>shift</em> or <em>reduce</em>.  Since we&#8217;ll be updating <em>activestates</em>, we immediately take a copy of them, this copy is called the <em>for_actor</em>.  Once those states are updated, we want to evaluate reject productions (if we evaluate reject productions before all states are evaluated, some reductions can go without being rejected).</p>
<blockquote><p>node *for_actor[];</p>
<p><em>copy the contents of activestates into for_actor</em></p>
<p>node *for_actor_delayed[];</p>
<p><em>for_actor_delayed is initially empty<br />
</em></p></blockquote>
<p>For each stack head in <em>for_actor</em>, we look up the actions for the state on the top of the stack(s).  <em>shift</em> actions call for us to &#8220;push&#8221; onto the stack.  In this context though, push means create a new node, with a link that points to the previous top of the stack, and to propagate that new node such that it appears within <em>activestates</em> during the next call to <em>update_states</em>.  Simply adding the new node to <em>activestates</em> is problematic.  If we try to mix new states in with old, we can&#8217;t say which are meant to survive into the next round.  Moreover, we can potentially end up adding duplicate states from other stacks in this same round.  If we try to fix this by doing a find-or-create solution, we may find nodes with matching states from the previous round, in which case we can corrupt the stack by having nodes which link to themselves.    Of course, the solution is very simple, just delay the shift until the reductions are complete, and then use a find-or-create style lookup on a seperate list.  Until we are ready to act on the shifts, we simply hold them in a list called the <em>for_shifter</em> list.</p>
<blockquote><p>struct lrshift<br />
{<br />
pnode *prevstate;<br />
int shiftto;<br />
};</p>
<p>lrshift for_shifter[];</p></blockquote>
<p>Moving on.  When the action is to <em>reduce</em>, we pull N states off the top of the stack (N = the number of symbols in the production we are reducing), combine them into a single node, and then push them onto the stack (<em>shift</em> the production) awaiting further action based on the current input token.  Since we are already processing the <em>for_actor</em> list, we simply put any new states at the end of that list, and things work out naturally.  However, this is where things start to get a little messy.  To &#8220;pull N states off the top of the stack&#8221; we have to deal with the fact that any one stack can represent a multitude of stacks.  To handle this properly, we need to make a list of all possible states of the proper length preceding the top of the stack.  We keep a <em>reducepath</em> to hold onto that list, and then call a function recursively, exploring the alternative paths N nodes deep.  For each of the accumulated paths, we merge <em>reducepath</em> into any previously accumulated paths, and call <em>reducer</em> to do the real reduction action.  We&#8217;ll explain <em>reducer</em> shortly, but for the moment the code looks roughly like this:</p>
<blockquote><p>void update_states(int nexttoken)</p>
<p>{</p>
<p>node *for_actor[] = activestates;</p>
<p>node *for_actor_delayed[]</p>
<p>lrshif for_shifter[]</p>
<p>while for_actor or for_actor_delayed are not empty</p>
<p>{</p>
<p>if  for_actor is empty, remove a node from for_actor_delayed and put into for_actor</p>
<p>for each node in for_actor</p>
<p>{</p>
<p>for each action from the node.state on input nexttoken</p>
<p>{</p>
<p>if the action is shift, add to for_shifter</p>
<p>if the action is reduce (accept, reject), call do_reductions(node, N = the number of symbols in the production being reduced, p = the production we are reducing)</p>
<p>}</p>
<p>}</p>
<p>empty for_actor</p>
<p>}</p>
<p><em>(Apply the shifts in for_shifter)</em></p>
<p><em>(Drop rejected paths)<br />
</em></p>
<p>}</p>
<p>void do_reductions(node *state, int count, production p)</p>
<p>{</p>
<p>if( count &gt; 0 )</p>
<p>{</p>
<p>(<em>This is the part that expands all the possible stacks preceding state</em>)</p>
<p>add state.t to the end of <em>reducepath</em></p>
<p>do_reductions(state.prev, count-1)</p>
<p><em> </em>remove the last element of <em>reducepath<br />
</em></p>
<p>}</p>
<p>(<em>We fall through to this once we accumulate a stack of size N</em>)</p>
<p>create a new parsenode, containing the contents of <em>reducepath</em>, but do not alter <em>reducepath</em>.</p>
<p>if we are handling an accept action, store and/or merge the parsenode into the <em>accept parse tree</em> and return.</p>
<p><em>(Remember, state is actually N nodes deep in the stack now)</em></p>
<p>int next_state = the number of the state to shift to from state state.state on the input p</p>
<p>call reducer<em>(</em>state, next_state)<em><br />
</em>}</p>
<p>void reducer(node *state, int next_state);</p></blockquote>
<p>So now that we see how to get all of the possible stacks from a given state and put them into <em>reducepath</em> we need to talk about what we do with it.  If the action is in fact an <em>accept</em> action, we can stop.  Generally, we&#8217;ll store the <em>reducepath</em> as the accept parse tree.  In GLR, there are potentially more than one way to <em>accept</em> a parse string.  If there is already an accept parse tree, we merge the tree with the previous accept parse tree.  Otherwise, we call <em>reducer</em>.</p>
<p><em>reducer</em> tries to create a new node and link path from state (which is N deep on the stack) to a node with node.state = next_state.  reducer will do a find-or-create for such a state <em>activestacks</em> and a find-or-create in the node for a link to the state.  If we are rejecting, then new nodes receive a reject flag, links (new and old alike) receive a reject flag.  We search for them in <em>activestacks</em> because we might have already created such a state.  When new nodes are created from reduction, they are put into for_actor for evaluation, and when they are created from rejection they are put into <em>for_actor_delayed</em> so they can be sure to reject at the right time.</p>
<p>When the link to state is found to exist already, it means that we have found an ambiguity, and the previously existing parsetree link.t must be merged with the new parsetree formed by the merged <em>reducepath</em>.  On the other hand, adding a link to state can create a number of new stacks that simply didn&#8217;t exist before.  This simply means that there are new ways to <em>reduce</em> previously inspected stacks.  It also means that we must run <em>reducer</em> on all of the new stacks created by the addition of the new link.  To do this, we simply create a new function <em>do_new_reductions</em> that behaves much like <em>do_reductions</em>, except will only bother to call <em>reducer</em> if the stack contains the new link, otherwise it is ignored, and we run this new function on all of the nodes in <em>activestacks</em>.</p>
<p>If there are new ways to <em>reduce</em> previously inspected stacks, then there are new ways to <em>reject</em> them also, and I have found that under some circumstances, despite the presence of <em>for_actor_delayed</em>, the standard GLR algorithm does not apply all <em>rejects</em> (or, depending on how you look at it, sometimes my reject rules are malformed).  Right now, I wish I could remember the exact circumstances.  I believe it was something like this using the SGLR variant:</p>
<blockquote><p>id : /[a-z]+/</p>
<p>keyword : &#8216;keyword&#8217;</p>
<p>id : keyword    %reject</p></blockquote>
<p>To correct for this, an equivalent of <em>do_new_reductions</em>, more aptly named <em>do_new_rejections</em> must be created.  Like <em>do_new_reductions</em>, <em>do_new_rejections</em> cares only about new stacks that contain the new link.  However, <em>do_new_rejections</em> is only interested in finding previous reductions, so it will do find, to find states and links, but not create.  When it finds a link, and it will try to remove any previously merged <em>parsetree</em> from ambiguous parsetrees, and mark clearly rejected reduction links as rejected.  Finally, it will call <em>do_new_rejections</em> instead of <em>reducer</em>, since it is not interested in creating new states.</p>
<p>Finally, after all of this, we get back to <em>update_states</em>.  One we are done processing <em>for_actor</em>, we need to process the shifts (just find-or-create the state in the lrshift structure, and link it to the lrshift.prevnode), and put the results somewhere.  Since it&#8217;s handily empty, we put the resulting states into <em>for_actor_delayed</em>.  After adding each state, we drop rejected stacks.  This just means we walk the graph structured stack and remove links with either the rejected flag or that link to rejected states, and since <em>reducer</em> has been keeping the flags updated for us, this is a cinch.</p>
<p>As the final detail, we must assign for_actor_delayed back onto <em>activestacks</em>, and the cycle is complete.  Repeat calling <em>update_stacks</em> until out of input, and you&#8217;ve got a parser.</p>
<p>The source for this algorithm can be found by perusing the wormhole parser generator.  Feel free to leave comments and ask questions.</p>]]></content:encoded>
			<wfw:commentRss>http://www.mightyheave.com/blog/?feed=rss2&amp;p=236</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
