This seems like a good thread to ask: are there more featureful languages that d...

a3n · on Jan 15, 2015

awk applies actions to lines matching patterns. The design seems concise and limited on purpose.

Not that you shouldn't want more features, but in the context that awk is typically used, what sort of features would you want to add? I confess I'm not able to understand what "can work in a data driven mode" might mean.

To answer your question a little more directly (and yet still be almost a non-answer), additional features can be found on the other side of the pipe into which you direct awk's output.

proveanegative · on Jan 15, 2015

>awk applies actions to lines matching patterns. The design seems concise and limited on purpose.

That is what I meant by a "data driven mode".

>what sort of features would you want to add?

An extended awk could at least add new types of patterns, e.g., binary, stateful and nested patterns or formal grammars, regex patterns with match groups, etc., meaning you could correctly process "real" CSV files or log files with complex structure. I believe these features could be made to fit in with the design of the language making them all feel like they belong to the same kind. It could also add new functions such as ones for Unicode text normalization.

Perl5 does much of this and Perl6 even introduces grammars but because of some of its design decisions for me Perl is not a joy to use like Awk. Both of its versions are just plain too big.

lsiebert · on Jan 16, 2015

You can use multiple file options, but I am not sure if you can include files to load a library file from within awk itself, nor am I aware of any options for conditional linking. Those would be nice, it would make it easy to write awk libraries.

kazinator · on Jan 16, 2015

I made a programming language called TXR which has a "data driven mode", consciously based on the concept, but not derived in any way, and with very different syntax.

The data driven mode isn't based on applying patterns to records, but rather patterns to entire text streams.

Here is a TXR script I use that transforms the output of the Linux kernel "checkpatch.pl" script into a conventional compiler-like error report that Vim understands:

    #!/usr/local/bin/txr
    @(next *stdin*)
    @(repeat)
    @type: @message
    #@code: FILE: @path:@lineno:
    @(output)
    @path:@lineno:@type (#@code):@message
    @(end)
    @(end)

This is "pure TXR": there are no traces of the embedded programming language TXR Lisp. Stuff that isn't @ident or @(syntax) is literal text which is matched. So the block

    @type: @message
    #@code: FILE: @path:@lineno:

matches a two-line pattern in the checkpatch output, extracting a type, followed by a space and colon, the message, then on the next line a code value prceded by a hash mark, followed by ": FILE: " and then a path delimited by a colon, and a line number terminated by a colon.

The following more complicated script scans diff output (usually the output of `git show -p` or `git diff`) and produces an "errors.err" file for Vim such that "vim -q" will navigate over the diffs as a quickfix list, and the messages have some information, like how many lines were added or removed at that point by the diff.

Here, there is an overall pattern matching logic for scanning the sections of a diff output: parsing out multiple diffs, and the "hunks" within each diff, with the line number info from the hunk header and such.

Some stateful Lisp logic calculates what is needed out of the extracted pieces and produces the output as it goes.

    #!/usr/local/bin/txr
    @(next *stdin*)
    @(do (set *stdout* (open-file "errors.err" "w")))
    @(repeat)
    +++ @nil/@path
    @  (repeat)
    @@@@ -@line0,@len0 +@line1,@len1 @@@@@(skip)
    @    (bind (line start minuses pluses old) (0 nil 0 0 nil))
    @    (set line1 @(toint line1))
    @    (repeat)
    @      (cases)
    -@text
    @        (do (inc minuses)
                 (set start line)
                 (push text old))
    @      (or)
     @nil
    @        (do (inc line)
                 (when start
                   (let ((wording (cond
                                    ((zerop minuses) `@pluses lines added`)
                                    ((zerop pluses) `@minuses lines deleted`)
                                    (t `@minuses lines edited to @pluses lines`))))
                     (put-line `@path:@(+ line1 start):@wording`)
                     (put-lines (nreverse old))
                     (set minuses 0)
                     (set pluses 0)
                     (set start nil)
                     (set old nil))))
    @      (or)
    +@nil
    @        (do (if (eql 1 (inc pluses))
                   (set start line))
                 (inc line))
    @      (or)
    @        (accept)
    @      (end)
    @    (until)
    @/[^+\- ]/@(skip)
    @    (end)
    @  (until)
    --- @(skip)
    @  (end)
    @(end)

Example:

    $ git show -p
    [ ... ]
    diff --git a/Makefile b/Makefile
    index a68fd84..f97b77b 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -163,14 +163,14 @@ enforce:
     # Installation macro.
     #
     # $1 - chmod perms
    -# $2 - source file
    +# $2 - source file(s)
     # $3 - dest directory
     #
     define INSTALL
            mkdir -p $(3)
            cp -f $(2) $(3)
            chmod $(1) $(3)/$(notdir $(2))
    -       touch -r $(2) $(3)/$(notdir $(2))
    +       for x in $(2) ; do touch -r $$x $(3)/$$(basename $$x) ; done
     endef
     
     PREINSTALL := :


    $ git show -p | diff2err.txr 
    $ cat errors.err 
    Makefile:166:1 lines edited to 1 lines
    # $2 - source file
    Makefile:173:1 lines edited to 1 lines
    	touch -r $(2) $(3)/$(notdir $(2))

In "vim -q" you are taken to where the changes are, and in the ":cope" window you can see the additional lines that give the original text that was modified, if applicable. For instance, the first item navigates to the line "# $2 - source file(s)" and you know that one line was edited at that point, and the original text was " $2 - source file".