Re: Parser combinator problem from Wolfgang Lux on 2008-02-08 (Curry mailing list)

From: Wolfgang Lux <wlux_at_uni-muenster.de>
Date: Fri, 08 Feb 2008 17:52:50 +0100

Hello Steffen!

> Hello,
>
> I have a question regarding the <*> operator (successive
> application) known from parser combinator libraries,
>
> Normally, I would define it as follows:
>
> (<*>)::Parser (res1->res2)->Parser res1->Parser res2
> (p1 <*> p2) g = (pv qv, g'')
> where
> (pv, g') = p1 g
> (qv, g'') = p2 g'
>
> However, applied to real input this results in stack overflow
> in the Münster Curry Compiler and non- or really late
> termination in PACKS.
>
> If I use this formulation, no problems occur:
>
> (p1 <*> p2) g | p1 g =:= (pv,g') & p2 g' =:= (qv, g'') = (pv qv, g'')
> where g',g'',pv,qv free
>
> However, I think, that the latter one is not as intuitive as the
> first one, so I wonder why it performs so much better.

Thank you for this interesting example.

It is somewhat hard to tell what is going wrong in your real parser
without knowing that parser, but given the fact that you get a stack
overflow in MCC it may well be the case that you are using
non-deterministic parsing combinators that perform a choice before
consuming any input and that therefore the search space of the parser
is explored in the wrong order.

To become a little more concrete, consider the following simple
parser which recognizes a sequence of three binary digits.

   end [] = ([],[])
   char x (c:cs) | x=:=c = ((x:), cs)
   digit = char '0' ? char '1'
   threeDigits = digit <*> (digit <*> (digit <*> end))

Now consider this parser is applied to the string "100". The order
in which the search space of the parser is explored with the lazy
definition of <*> depends on whether the first or the second component
of the parser's result tuple is evaluated first. If the second
component is evaluated first, the lazy combinator <*> will evaluate
the application p2 g' first. Given the above definition of digit
this means that the parser will perform a non-deterministic choice
for the two alternatives char '0' and char '1'. Only then will it
enter the evaluation of the first argument of <*>. In effect, this
means that the parser needlessly tries to match the input string
against the sequences 000, 001, 010, and 011.

Your non-lazy definition of (<*>) as well as Bernd's definition
using case expressions simply ensures that the parser's search
space is explored in the intended order, viz. from left to right.

Regards
Wolfgang

_______________________________________________
curry mailing list
curry_at_lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/curry
Received on Fr Feb 08 2008 - 18:28:35 CET

This archive was generated by hypermail 2.3.0 : Do Jun 20 2024 - 07:15:09 CEST