How far should a programming-language-aware diff go?

xg15 · on July 22, 2024

I think I'd appreciate some sort of "semantic grouping" of individual changes more than drawing some random line and classifying all changes below it as "trivial".

The problem is that even a lot of the changes that normally constitute clutter can become relevant in certain situations or even introduce bugs.

One example would be the ordering of Python imports: changing the order of imports should have no effect on program behaviour if all your packages are well-behaved, and in 99.99% of cases it indeed doesn't. But the fact remains that imports are statements that are executed and can have side effects. If a package does something nontrivial during load, changing the import order can have effects. Hiding such a change could mask the introduction of a bug.
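A single-file simulation of that load-order problem, assuming two hypothetical packages (`logger` and `debugPatch`, not real libraries). Each "module body" runs once, on first import, and one of them mutates shared configuration as a side effect. This is a sketch of the principle, not a real module loader.

```javascript
// Simulates module loading: each module body runs exactly once, on first import.
// Both package names are hypothetical; only the load-time side effect matters.
function runImports(importOrder) {
  const config = { logLevel: "info" }; // fresh shared state for each run
  const moduleBodies = {
    // Reads the configuration once, at load time.
    logger: () => ({ level: config.logLevel }),
    // Mutates shared configuration as a side effect of being loaded.
    debugPatch: () => {
      config.logLevel = "debug";
      return {};
    },
  };
  const cache = {};
  const importModule = (name) => {
    if (!(name in cache)) cache[name] = moduleBodies[name]();
    return cache[name];
  };
  for (const name of importOrder) importModule(name);
  return importModule("logger").level;
}

console.log(runImports(["logger", "debugPatch"])); // prints info
console.log(runImports(["debugPatch", "logger"])); // prints debug
```

Swapping two "imports" changes the logger's level even though no line of either package changed, which is exactly the kind of diff that would be risky to hide.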

Hiding changes can also lead to confusion if you are trying to understand a series of changes that build on each other, or if all changes of a commit are hidden. I've had the latter situation with IntelliJ, where the working tree was shown as "unclean" but the diff was completely empty. The explanation: the diff wasn't actually empty; IntelliJ was just set to hide the changes.

I think a more interesting solution would be to build a sort of "tree of changes": At the bottom, you'd have the individual changes in the file; one level up, the changes would be grouped into higher-level operations, such as "change formatting", "rename identifier", "remove field", "move function", etc. If possible, those could be grouped into even higher-level changes, such as "implement new class" or "extract expression into function", etc.
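A minimal sketch of the "tree of changes" as a data structure, assuming a made-up node shape (`label` and `children` for grouped operations, `edit` for raw changes). The hard part, deciding that some raw edits form an "extract expression into function", is not shown; this only illustrates how such a tree could be displayed at different levels of detail.

```javascript
// Hypothetical node shape: inner nodes are grouped operations, leaves are raw edits.
const tree = {
  label: "extract expression into function",
  children: [
    {
      label: "add function computeTotal",
      children: [
        { edit: "+ function computeTotal(items) {" },
        { edit: "+   return items.reduce((sum, x) => sum + x.price, 0);" },
        { edit: "+ }" },
      ],
    },
    {
      label: "replace expression with call",
      children: [
        { edit: "- const total = items.reduce((sum, x) => sum + x.price, 0);" },
        { edit: "+ const total = computeTotal(items);" },
      ],
    },
  ],
};

// Counts the raw edits below a node.
const countEdits = (node) =>
  node.children ? node.children.reduce((n, child) => n + countEdits(child), 0) : 1;

// Renders the tree down to maxDepth; deeper levels are collapsed into an edit count.
function render(node, maxDepth, depth = 0) {
  const indent = "  ".repeat(depth);
  if (!node.children) return [indent + node.edit];
  const n = countEdits(node);
  const lines = [`${indent}${node.label} (${n} edit${n === 1 ? "" : "s"})`];
  if (depth < maxDepth) {
    for (const child of node.children) lines.push(...render(child, maxDepth, depth + 1));
  }
  return lines;
}

console.log(render(tree, 1).join("\n"));
```

A reviewer could start at depth 0 or 1 and expand down to the raw edits only where the summary looks suspicious, so nothing is hidden, just folded.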

tsimionescu · on July 22, 2024

Agreed, I don't think the value of a semantic diff would be in hiding changes. Instead, the value should be in generating more useful diffs.

Normal diff often gets "confused" compared to how you'd logically describe the change. For example, if you extract part of a larger function into a smaller function, instead of showing that a piece of code was moved, it will show that you changed a header, deleted some lines, added others below, etc. A semantic diff should be able to present these changes in a better way, but shouldn't hide them. Even for whitespace changes, I'd like it to show the diff, but with an overlay explaining that only whitespace is different, so I know I don't need to look at it carefully.
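One way to drive such an overlay, sketched with a crude regex tokenizer rather than a real language grammar: two versions count as "whitespace-only" different when their token sequences match. String literals stay single tokens so whitespace inside them still counts, and runs of punctuation stay together so `a--b` and `a - -b` are not confused. It deliberately ignores languages where whitespace itself is significant (Python indentation, or newlines under JavaScript's automatic semicolon insertion).

```javascript
// Crude tokenizer: quoted strings, words, runs of punctuation, then any other lone character.
const TOKEN = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\w+|[^\w\s"']+|\S/g;
const tokenize = (text) => text.match(TOKEN) ?? [];

// Classifies a change so a viewer can still show it but label whitespace-only edits.
function classifyChange(oldText, newText) {
  if (oldText === newText) return "identical";
  const a = tokenize(oldText);
  const b = tokenize(newText);
  const sameTokens = a.length === b.length && a.every((tok, i) => tok === b[i]);
  return sameTokens ? "whitespace-only" : "changed";
}

console.log(classifyChange("foo(a,b)", "foo( a, b )")); // whitespace-only
console.log(classifyChange('s = "a b"', 's = "a  b"')); // changed
console.log(classifyChange("x = a--b", "x = a - -b")); // changed
```

Grouping punctuation runs errs on the side of "changed" (e.g. `f(a)+g` vs `f(a) + g`), which seems the safer failure mode for a tool that is only supposed to label changes.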

blackenedgem · on July 22, 2024

I think the problem you'll eventually run into is figuring out intent from the diff. It seems like an easier version of reverse compiling.

When it comes down to semantic diffs I'm more interested in something like the Semantic Patch Language by Coccinelle. Being able to represent mundane refactorings across an entire codebase in a few lines seems great. And it unifies intent with the diff.

golergka · on July 22, 2024

And just like that, another GPT-4 wrapper startup was born.

shaftway · on July 22, 2024

Personally, I really don't care about these cases. What really grinds my gears is when a diff plucks out some odd line in the middle of a block of code, one that contains only a closing curly brace, decides that's the line that stayed the same, and marks everything around it as changed.

If you're going to call yourself a semantic diff-ing company, fix that before you worry about the order of my imports.
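One known mitigation for the lone-brace problem is the anchoring step of patience diff (available in git as `git diff --patience`): only lines that occur exactly once in both versions are used as anchors, so a `}` that appears several times never qualifies. The sketch below shows only that step plus the longest-increasing-subsequence pass that keeps anchors in order; a full patience diff then recurses between anchors, which is omitted here.

```javascript
// Returns [indexInA, indexInB] pairs for lines occurring exactly once in each version,
// keeping the largest subset whose order agrees in both (the patience diff anchors).
function patienceAnchors(a, b) {
  const countLines = (lines) => {
    const m = new Map();
    lines.forEach((line, i) => {
      const entry = m.get(line) ?? { count: 0, index: i };
      entry.count++;
      m.set(line, entry);
    });
    return m;
  };
  const ca = countLines(a);
  const cb = countLines(b);
  // Candidate pairs, in order of position in `a`.
  const pairs = [];
  a.forEach((line, i) => {
    const eb = cb.get(line);
    if (ca.get(line).count === 1 && eb?.count === 1) pairs.push([i, eb.index]);
  });
  // Longest increasing subsequence on the `b` positions (O(n^2) for brevity;
  // patience sorting does this in O(n log n)).
  const len = pairs.map(() => 1);
  const prev = pairs.map(() => -1);
  for (let i = 0; i < pairs.length; i++) {
    for (let j = 0; j < i; j++) {
      if (pairs[j][1] < pairs[i][1] && len[j] + 1 > len[i]) {
        len[i] = len[j] + 1;
        prev[i] = j;
      }
    }
  }
  const result = [];
  for (let k = len.indexOf(Math.max(...len)); k !== -1; k = prev[k]) result.unshift(pairs[k]);
  return result;
}

const before = ["function f() {", "  if (x) {", "    g();", "  }", "}", "function h() {", "}"];
const after = ["function f() {", "  if (y) {", "    g();", "  }", "  log();", "}", "function h() {", "}"];
console.log(patienceAnchors(before, after)); // anchors [0,0], [2,2], [3,3], [5,6]; no "}" line
```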

rob74 · on July 22, 2024

I think I have heard of their product before, and reading the blog post intrigued me, so I wanted to try it, but... VS Code Integration? GitHub Integration? No standalone version which you could actually use as a diff tool for git locally? Ok, I guess only having a "cloud" version makes licensing easier, and you can call me old fashioned, but seeing an eminently "offline" task such as diff being turned into "online-only" seems a bit strange to me.

DarkPlayer · on July 22, 2024

The VS Code extension works offline. The diff calculation is performed on the host where the VS Code GUI is running (makes a difference in case of SSH/Docker/WSL).

kmoser · on July 22, 2024

> - const foo = function(a, b) { ... }

> + const foo = (a, b) => { ... }

Assuming this is JS code, these differences should not be ignored, as an arrow function can behave differently than a traditional function.

culi · on July 22, 2024

More specifically, the `function` keyword version of an anonymous function gets its own `this` binding (determined by how it is called), whilst the arrow syntax version does not and instead uses the `this` of the enclosing scope. Arrow functions also cannot use the `yield` keyword or be used as constructors.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
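A small demonstration of two of those differences (the `this` binding and constructor use), assuming Node.js or a browser console; `counter`, `Classic` and `Arrow` are made-up names.

```javascript
const counter = {
  // A `function` gets its own `this`; called as counter.viaFunction(), that is `counter`.
  viaFunction: function () {
    return this === counter;
  },
  // An arrow function has no `this` of its own; it uses the `this` of the
  // enclosing scope (here the top level of the script), which is not `counter`.
  viaArrow: () => this === counter,
};

console.log(counter.viaFunction()); // true
console.log(counter.viaArrow()); // false

const Classic = function () {};
const Arrow = () => {};
console.log(new Classic() instanceof Classic); // true
try {
  new Arrow();
} catch (e) {
  console.log(e instanceof TypeError); // true: arrow functions are not constructors
}
```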

egnehots · on July 22, 2024

There is also the function scope vs block scope...

var x = 3 will escape the latter.

jolmg · on July 22, 2024

If you mean the following, it actually doesn't:

  (function () {
    var x = 0;
    const foo1 = function(a, b) { var x = 2; }
    const foo2 = (a, b) => { var x = 3; }
    foo1(); console.log(x); // prints 0
    foo2(); console.log(x); // prints 0
  })()

However, there is a difference in how implicit semicolons are inserted:

  const foo1 = function(a, b) { return a + b; }
  (2, 3)
  console.log(foo1) // prints 5

  const foo2 = (a, b) => { return a + b; }
  (2, 3)
  console.log(foo2) // prints [Function: foo2]

olliej · on July 22, 2024

haha, oh ASI - I was very confused by your example as I read this as if it was

  > const foo1 = function(a, b) { return a + b; }
  (2, 3)
  > console.log(foo1) // prints 5

  > const foo2 = (a, b) => { return a + b; }
  (2, 3)
  > console.log(foo2) // prints [Function: foo2]

Even though it makes no sense for (2, 3) to be a result in those cases, that was just how I ended up reading it, and I was exceptionally confused about how the printed output could possibly happen.

A super nice example of how subtle differences can really change things though.

As a side note, ASI for JS is actually super easy to implement, and the rules are really simple (leaving aside whether the feature itself is good :D ): it's just "these specific statements can have a newline instead of a semicolon", so in the parser, instead of consume(semicolon), you can just accept "semicolon or newline". (You can check the logic in JSC in https://github.com/WebKit/WebKit/blob/main/Source/JavaScript... ; just look for autoSemicolon() or autoSemi(), I can't recall which off the top of my head.)
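A toy sketch of that "semicolon or newline" check, assuming a made-up grammar of identifiers, integers and calls (none of these function names come from JSC). The ECMAScript rule also allows an implicit semicolon before a closing brace or at end of input, so the sketch accepts end of input too. Note that the expression is parsed greedily first and the check only runs where a statement could end, which is why a parenthesised line after `f` becomes a call, the same effect as the `function` example above.

```javascript
// Toy tokenizer: identifiers, integers and the punctuation ( ) ;
// Each token records whether a line break came before it.
function tokenize(src) {
  const re = /\s+|[A-Za-z_]\w*|\d+|[();]/y;
  const tokens = [];
  let newlineBefore = false;
  while (re.lastIndex < src.length) {
    const start = re.lastIndex;
    const m = re.exec(src);
    if (m === null) throw new SyntaxError(`Unexpected character at ${start}`);
    if (/^\s+$/.test(m[0])) {
      if (m[0].includes("\n")) newlineBefore = true;
      continue;
    }
    tokens.push({ text: m[0], newlineBefore });
    newlineBefore = false;
  }
  tokens.push({ text: "<EOF>", newlineBefore });
  return tokens;
}

// Toy grammar: program := (expr terminator)*, expr := primary ("(" expr? ")")*,
// primary := identifier | integer | "(" expr ")"
function parseProgram(src) {
  const toks = tokenize(src);
  let pos = 0;
  const peek = () => toks[pos];
  const next = () => toks[pos++];
  const expect = (text) => {
    if (peek().text !== text) throw new SyntaxError(`Expected ${text}, got ${peek().text}`);
    return next();
  };
  function primary() {
    const tok = next();
    if (tok.text === "(") {
      const inner = expr();
      expect(")");
      return inner;
    }
    if (/^(?:[A-Za-z_]\w*|\d+)$/.test(tok.text)) return tok.text;
    throw new SyntaxError(`Unexpected token ${tok.text}`);
  }
  // Calls are parsed greedily: a "(" continues the expression even after a newline.
  function expr() {
    let node = primary();
    while (peek().text === "(") {
      next();
      const arg = peek().text === ")" ? "" : expr();
      expect(")");
      node = `${node}(${arg})`;
    }
    return node;
  }
  // Only reached where a statement could end: accept ";" or insert one implicitly.
  function autoSemicolon() {
    const tok = peek();
    if (tok.text === ";") {
      next();
      return;
    }
    // ECMAScript also allows "}" here; this toy grammar has no blocks.
    if (tok.text === "<EOF>" || tok.newlineBefore) return;
    throw new SyntaxError(`Expected ; before ${tok.text}`);
  }
  const statements = [];
  while (peek().text !== "<EOF>") {
    statements.push(expr());
    autoSemicolon();
  }
  return statements;
}

console.log(parseProgram("f\n(2)")); // [ 'f(2)' ]: the newline does not end the statement
console.log(parseProgram("a\nb")); // [ 'a', 'b' ]: semicolon inserted at the line break
```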

Am4TIfIsER0ppos · on July 23, 2024

> implicit semicolons are inserted

lmao what a p

culi · on July 22, 2024

I believe this is actually a difference between named and anonymous functions. The named function syntax is

  function foo1() { ... }

Both of the below examples are anonymous functions

  const foo2 = function() { ... }

  const foo3 = () => { ... }
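A quick check suggests that for this particular trap the distinction is declaration versus expression rather than the presence of a name: a function declaration ends at its closing brace, while any function expression, named or not, gets called by a parenthesised line that follows it. The variable names below are made up.

```javascript
// Declaration: the statement ends at "}", so "(2, 3)" is a separate expression statement.
function declared(a, b) { return a + b; }
(2, 3);

// Anonymous function expression: no semicolon is inserted, so it is called with (2, 3).
const anonymous = function (a, b) { return a + b; }
(2, 3);

// Named function expression: the name makes no difference, it is still called.
const named = function add(a, b) { return a + b; }
(2, 3);

// Arrow function: a call cannot follow its block body, so a semicolon is inserted.
const arrow = (a, b) => { return a + b; }
(2, 3);

console.log(typeof declared, anonymous, named, typeof arrow); // function 5 5 function
```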

glhaynes · on July 22, 2024

My guess would be that quite a large portion of changes we'd expect at a glance to be equivalent aren't, especially for unexpected inputs. I'd also guess this is much more likely in languages in which valid code commonly produces undefined behavior.

If the tool could show you, for example, "this change is functionally identical except for when the sum of the two inputs overflows a UInt64", that'd be pretty cool.
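As an illustration of "functionally identical except when the sum overflows a UInt64", here is a sketch using an illustrative pair of midpoint formulas (not from the thread), with UInt64 wrap-around emulated via `BigInt.asUintN(64, x)`. It only tests sampled inputs, so it can find a counterexample but cannot prove equivalence.

```javascript
// Emulates UInt64 wrap-around using BigInt.
const u64 = (x) => BigInt.asUintN(64, x);

// Two "obviously equivalent" midpoint computations for unsigned 64-bit inputs.
const midpointNaive = (a, b) => u64(a + b) >> 1n; // a + b may wrap around
const midpointSafe = (a, b) => (a >> 1n) + (b >> 1n) + (a & b & 1n); // never exceeds 2^64 - 1

// Returns the first sampled input pair on which the two versions disagree, or null.
function findCounterexample(samples) {
  for (const [a, b] of samples) {
    if (midpointNaive(a, b) !== midpointSafe(a, b)) return [a, b];
  }
  return null;
}

const max = 2n ** 64n - 1n;
console.log(findCounterexample([[1n, 2n], [10n, 20n], [7n, 9n]])); // null
console.log(findCounterexample([[1n, 2n], [max, 1n]])); // the pair [max, 1n]: a + b wraps to 0
```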

kmoser · on July 22, 2024

That would be neat, although I suspect most compilers/linters should already be able to warn you about potential overflows.

If you want to boil down what devs are looking for in a diff tool to one thing, it would be "which change(s) between these two versions of code result in a different binary (or AST/opcodes/bytecode, depending on the language)?" All other changes, while certainly sometimes useful to know about, are just syntactic sugar.

tsimionescu · on July 22, 2024

Literally every time you add/subtract/multiply two variables there is a potential overflow. In relatively rare cases, the compiler might be able to prove that they can't overflow, but in the general case it can't, and I doubt any actually do.

culi · on July 22, 2024

I think you've answered the question posed by the title here. That feels too far.

persnickety · on July 23, 2024

Same goes for the loop example, where the i variable has a different scope.

I think those are meant to be illustrative of an idea rather than exact examples.

kqr · on July 23, 2024

But if they have worked in this field for a few months, shouldn't they have a crapton of exact examples to draw from? If not, isn't that worrying?

persnickety · on July 23, 2024

Depends on how technical the audience of this post should be. Putting an accurate example which is only understandable to someone with years of experience might make newbies think it's a made up concern.

kqr · on July 23, 2024

Yeah, that was a bit of an unfortunate example for a blogvertising post. Even I as a non-frontend developer knew to watch out for that one. A company working with semantic diffs should really know better, and such mistakes do not inspire confidence in me!

olliej · on July 22, 2024

Yeah I came to say that these are not semantically equivalent (I guess you _could_ verify equivalence if you ensured it did not use this or eval)

MathMonkeyMan · on July 22, 2024

I haven't actually checked the source, but I've heard that clang-format works by assigning "badness" weights to each choice of whitespace between tokens, and then runs Dijkstra's (or some other DP) to find the least bad set of choices. A recent Tom7 video said that Knuth did the same thing for text justification.

How about we do a similar thing for ASTs? Like a peephole optimizer looking for runs of instructions that could be replaced with simpler alternatives, a tree diff could identify diff patterns that "might be trivial." You'd have a whole catalog of these patterns and assign each one a weight. The displayed diff would then be the optimal set of answers to "consider different or not?" for each match.

You would need some additional ingredient, though; some boundary condition. Otherwise "everything is the same" would always minimize badness.
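A minimal sketch of the "least total badness" idea, using a simplified text-justification variant (not Knuth-Plass, and not clang-format's actual penalty model): each line costs the square of its unused width, the last line is free, and dynamic programming picks the break points with the smallest total. A tree-diff version would swap words for diff patterns and widths for per-pattern weights, and would still need the boundary condition described above.

```javascript
// Splits words into lines of at most `width` characters, minimising total badness,
// where a line's badness is the square of its unused width and the last line is free.
function breakLines(words, width) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity); // best[i]: least badness for words[i..]
  const breakAt = new Array(n + 1).fill(n); // breakAt[i]: index of the next line's first word
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let length = -1; // running length of words[i..j] joined by single spaces
    for (let j = i; j < n; j++) {
      length += words[j].length + 1;
      if (length > width && j > i) break; // a single overlong word still gets its own line
      const slack = Math.max(0, width - length);
      const badness = j === n - 1 ? 0 : slack * slack;
      if (badness + best[j + 1] < best[i]) {
        best[i] = badness + best[j + 1];
        breakAt[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = breakAt[i]) lines.push(words.slice(i, breakAt[i]).join(" "));
  return lines;
}

console.log(breakLines(["aaa", "bb", "cc", "ddddd"], 6)); // [ 'aaa', 'bb cc', 'ddddd' ]
```

A greedy filler would produce "aaa bb" / "cc" / "ddddd" (badness 16), whereas the DP finds "aaa" / "bb cc" / "ddddd" (badness 9 + 1 = 10): locally good choices are not always globally best, which is the point of scoring the whole set.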

amelius · on July 22, 2024

https://en.wikipedia.org/wiki/Edit_distance
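For reference, the standard Levenshtein distance as a short sketch. It works the same on characters or on whole lines or tokens; the unit costs here are the textbook ones, and per-pattern weights as in the idea above would replace them.

```javascript
// Levenshtein distance between two sequences (strings, or arrays such as lists of lines),
// using a single rolling row of the dynamic-programming table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, substitution); // delete, insert, substitute
    }
    prev = cur;
  }
  return prev[b.length];
}

console.log(editDistance("kitten", "sitting")); // 3
console.log(editDistance(["a", "b", "c"], ["a", "c"])); // 1
```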

ckdot2 · on July 22, 2024

Not far. Just show all changes. Like the blog article already states, for many projects you already have code formatters, so formatting changes usually don’t happen a lot, and if they do, there might be a reason you don’t want to hide them (like… you changed your code formatting rules). For all the other examples, I don't see why you would want to hide them either. If you don’t want to see commas added to a list, make it a rule that a comma always has to be appended after the last element. Most languages allow that. Semantic equivalence? The JS example isn’t even equivalent because `this` may have a different context. I’d prefer a "dumb" diff that simply shows all the changes instead of adding this kind of complexity. Just keep your MRs small and there’s no real issue.

HelloNurse · on July 23, 2024

The healthy workflow is: notice formatting discrepancies -> reformat -> reopen the diff, now containing only intentional, substantial changes.

Of course, the edited source files should have been reformatted automatically, on save or on build, before anyone opens a diff. Formatting discrepancies should never show up except as a symptom of inadequate reformatting (e.g. I decide to adopt redundant commas at the end of comma-separated lists) or of abnormal operations (e.g. non-reformatted code was accidentally committed to version control).

firethief · on July 22, 2024

Interesting idea. I've just tried it with a couple of languages:

- TS with Vue: SFCs don't really work (it shows a style change as if the whole stylesheet had been replaced with a mostly identical stylesheet).

- Rust: It doesn't seem semantic at all. It's showing a lot of character-level insertions and deletions that seem worse than how git-diff or GitHub would break down the changes.

It doesn't seem ready yet for what I'd like to use it for.

DarkPlayer · on July 22, 2024

Hi, author of SemanticDiff here.

I'm sorry you didn't have a good experience testing the tool. If it doesn't work / makes things worse than a standard diff, that's definitely considered a bug. It is probably something specific to your code and not a general issue. It would therefore be great if you could open an issue [1] or support ticket [2], ideally with some sample code, so we can take a look. Thanks in advance!

[1] https://github.com/Sysmagine/SemanticDiff/issues
[2] support@semanticdiff.com

whirlwin · on July 22, 2024

After switching to difftastic for semantic diff, I have never looked back. (https://github.com/Wilfred/difftastic)

How does semanticdiff compare to that? Anyone got experience?

DarkPlayer · on July 22, 2024

You can find a comparison of the two tools here: https://semanticdiff.com/blog/semanticdiff-vs-difftastic/

As author of SemanticDiff, I am obviously a bit biased. But Wilfred, the author of difftastic, found the analysis to be "pretty even-handed" [1], so I think it should be somewhat fair.

[1]: https://x.com/_wilfredh/status/1764424652611318146

rty32 · on July 22, 2024

In theory semantic diff is useful, but based on my code review experience, it hardly matters. For a language like Python or JavaScript, developers fluent in these languages don't really pay much attention to these things anyway, just like you don't normally pay much attention to commas and periods in a sentence unless they cause confusion. Personally I wouldn't pay $5/month out of pocket for this functionality.

emporas · on July 22, 2024

There is also diffsitter. I was testing it a month ago, it works fine. Not sure what language-aware diffing exactly means, but diffsitter uses tree-sitter and it is comparing ASTs and CSTs of the files.

[1] https://github.com/afnanenayet/diffsitter

sureglymop · on July 23, 2024

Seems to be the best choice, since tree-sitter is already generic and supports many languages. Will try this out first.

emporas · on July 23, 2024

tree-sitter is very generic and supports so many languages, it is really great. The first use case of the article, "Level 1: Irrelevant Whitespace" is covered by diffsitter.

At some point I wanted to diff files while ignoring comments in Rust source code. I wrote a small program to remove the two comment node types the Rust grammar defines, `line_comment` and `block_comment`. Then I diffed the resulting uncommented code using diffsitter.

From start to finish, writing the program and testing it on many different files took 5 hours.
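A rough equivalent without tree-sitter, written as a hand-rolled scanner rather than the program described above: it drops `//` comments (including `///` doc comments) and nested `/* */` comments while leaving ordinary string literals alone. It does not understand raw strings (`r#"..."#`) or character literals such as `'"'`, which is exactly why a real grammar like tree-sitter's is the safer tool.

```javascript
// Removes Rust comments from source text. Assumption: no raw strings or '"' char literals.
function stripRustComments(src) {
  let out = "";
  let i = 0;
  while (i < src.length) {
    if (src[i] === '"') {
      // Copy a string literal verbatim, honouring backslash escapes.
      let j = i + 1;
      while (j < src.length && src[j] !== '"') j += src[j] === "\\" ? 2 : 1;
      out += src.slice(i, j + 1);
      i = j + 1;
    } else if (src.startsWith("//", i)) {
      // Line comment: drop everything up to (not including) the newline.
      while (i < src.length && src[i] !== "\n") i++;
    } else if (src.startsWith("/*", i)) {
      // Block comment: Rust block comments nest, so track the depth.
      let depth = 0;
      do {
        if (src.startsWith("/*", i)) {
          depth++;
          i += 2;
        } else if (src.startsWith("*/", i)) {
          depth--;
          i += 2;
        } else {
          i++;
        }
      } while (depth > 0 && i < src.length);
    } else {
      out += src[i++];
    }
  }
  return out;
}

console.log(stripRustComments('let s = "// kept"; /* a /* nested */ b */ let y = 2; // gone'));
```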

diffxx · on July 22, 2024

  - def foo(): int | None
  + def foo(): None | int

Whether or not this makes a semantic difference depends on the language implementation. I think that is why this kind of tool is not especially appealing to me: I would need almost complete knowledge of both the compiler and the diff tool to truly trust that there is no semantic difference. Moreover, I would like to know why text changes that have no semantic effect are being mixed with those that do.

For me, text is king and that is the level at which I want to evaluate diffs 99% of the time, but I do recognize that others have different goals and preferences.

PullJosh · on July 22, 2024

I was expecting this to refer to different ways to represent the same diff. (For example, you could represent a change from `console.log("hello")` to `console.log('hello')` as +' -" … +' -" or as +'hello' -"hello".)

I don’t have a specific example in mind, but it seems reasonable that different languages could benefit from different ways of representing the same diff.
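To make the two representations concrete, here is a small LCS-based diff run once over characters and once over tokens, assuming a crude regex tokenizer used only for this example: the character-level run reports two separate quote swaps, the token-level run a single replaced string literal.

```javascript
// Minimal LCS diff over any two arrays; returns ["=", x], ["-", x] or ["+", x] operations.
function diff(a, b) {
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push(["=", a[i]]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push(["-", a[i++]]);
    } else {
      ops.push(["+", b[j++]]);
    }
  }
  while (i < a.length) ops.push(["-", a[i++]]);
  while (j < b.length) ops.push(["+", b[j++]]);
  return ops;
}

// Crude tokenizer for this example only: quoted strings, words, single other characters.
const tokens = (s) => s.match(/"[^"]*"|'[^']*'|\w+|\S/g) ?? [];
const changesOnly = (ops) => ops.filter(([op]) => op !== "=");

const before = 'console.log("hello")';
const after = "console.log('hello')";
console.log(changesOnly(diff([...before], [...after]))); // four ops: -" +' -" +'
console.log(changesOnly(diff(tokens(before), tokens(after)))); // two ops: -"hello" +'hello'
```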

Jabbles · on July 22, 2024

nit: the order of Go's imports makes no difference: https://go.dev/ref/spec#Program_initialization_and_execution...

jmull · on July 22, 2024

I think the general answer is, it depends.

Hopefully this tool gives a dev ready control of what kinds of differences to hide/show.

I'm actually not convinced of the concept of semantic diff (not talking just about this tool specifically)... when we talk about code that is different but equivalent, I think we're talking about elements of style.

It seems to me that it would pretty much always be better to normalize the elements of style considered insignificant, rather than hide them just in the diff tool. That covers diffing as well as viewing/reading the code.

If you don't care about a particular element of style then either it shouldn't be coming up much or I think you'd be better off using some kind of enforcing/fixing linter.

bmitc · on July 23, 2024

As far as possible. Git's line-based diffing is ridiculously primitive and gets in the way of software development. I wonder how many bugs are introduced because of Git's diffing system.

g4zj · on July 22, 2024

I've never knowingly used a language-aware diff tool before, but I wouldn't mind the option. I think it would come in handy on occasion.

Smaug123 · on July 22, 2024

Personally I have `git difft` aliased to `difft --display side-by-side`, so it's one extra character for a semantic diffing tool (for me, Difftastic).

1-more · on July 22, 2024

Because I am silly I have

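    dit () {
        verb="$1"
        shift 1
        # --ext-diff goes right after the first word, so `dit stash show`
        # becomes `git stash --ext-diff show ...`, probably why that case fails
        GIT_EXTERNAL_DIFF=difft git "$verb" --ext-diff "$@"
    }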

it works for `dit d` and `dit show HEAD` but it fails on `dit stash show -p stash@{0}`

g4zj · on July 22, 2024

Thanks for mentioning Difftastic. It looks very interesting! I'll give it a try.

gumby · on July 22, 2024

Never even used `diff -p`? That's been in diff for many decades.

joe_the_user · on July 22, 2024

It's a very interesting question. One idea I've toyed with over the years is a language specifically designed to facilitate effective diffs.

Anyway, it seems the "Level 3: semantic diff" actually could be divided into different levels. But "Level 4: Mostly identical" seems quite problematic.

zokier · on July 22, 2024

I think this question has already largely been answered by automatic style (etc.) tools. Such tools generally should not make semantic changes to programs, so they (implicitly) define which changes are semantically meaningful and which are meaningless.

philipwhiuk · on July 22, 2024

I think a slider on the code review would be nice.

That way I could start at the 'definitely a change' stuff and then slide down towards L2 until I decided it was fine.

fragmede · on July 23, 2024

I'm not sure how they wrote up that whole article without bringing up looking at the AST representation of the code being diffed.

twic · on July 22, 2024

Accurately identifying whether any change is a semantic difference involves solving the Halting Problem, right?

Smaug123 · on July 22, 2024

Fortunately the article already says that hiding all semantically identical changes is "probably going too far", so they can just not try and solve the halting problem.