Show HN: Wild Moose – Autonomous agent for production debugging (wildmoose.ai)
55 points by yasmind on Oct 3, 2023 | 19 comments
Hi Hacker News! We launched an autonomous agent that helps debug production issues, and we’re curious to get your feedback.

Today's GenAI devtools, such as Copilot, are limited: they are great for writing code, but we all know that programming is only 20% coding and 80% debugging.

So how can GenAI be used for debugging? As opposed to code completion or test automation, production debugging is not about generating text. Debugging is mostly about root-cause analysis. We realized two things:

1) Generative AI is drastically changing the way we work with data, thanks to its ability to not only generate queries, but also run code and analyze unstructured data. This enables building better data-exploration experiences with far more intuitive interfaces.

2) RCA is all about exploring different types of data. When you don’t know why a transaction was dropped or which customers are affected – you explore metrics, logs, your code, other people’s code, old Slack messages, and whatnot, to figure out what’s broken.

Putting those two together, we built an autonomous agent that helps debug production issues. Our LLM "moose" (ok, it's corny but we like it) connects to your logs, metrics, and other observability data, and lets you extract and analyze them by conversing with it. Once it gets a task, it will start reasoning, invoking APIs, and running code, until it comes back with an answer.
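The loop described here (get a task, reason, invoke APIs or run code, repeat until an answer emerges) can be sketched abstractly like this. Everything below is illustrative: the LLM and the Datadog call are mocked, and none of these names are Wild Moose's actual API.

```python
# Minimal agent-loop sketch: the model picks a tool, we run it,
# and feed the result back until it produces a final answer.

def fake_llm(history):
    # Stand-in for a real LLM call: decide the next step from history.
    if not any(step["role"] == "tool" for step in history):
        return {"action": "call_tool", "tool": "fetch_logs",
                "args": {"query": "duration:>60s"}}
    return {"action": "answer", "text": "3 slow transactions found"}

TOOLS = {
    # In a real system this would hit the Datadog API; here it's mocked.
    "fetch_logs": lambda query: ["txn-1", "txn-7", "txn-9"],
}

def run_agent(task, llm=fake_llm, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = llm(history)
        if step["action"] == "answer":
            return step["text"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")
```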

For example, when requested to “show me IDs of transactions that took over 1 minute today”, it will fetch those transactions from Datadog for you. You might then ask it if long-running transactions correlate with a metric such as DB CPU load. It will fetch the metric values, visualize them on a graph alongside the long transaction frequency, and give you the answer.
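The correlation question in this example boils down to something like the following (plain-Python Pearson correlation over made-up numbers; the real agent would fetch both series from Datadog):

```python
# Does slow-transaction frequency move with DB CPU load?
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

slow_txn_per_hour = [2, 5, 9, 4, 12, 3]       # fetched transaction counts
db_cpu_load       = [35, 48, 80, 45, 95, 40]  # metric values (%)

r = pearson(slow_txn_per_hour, db_cpu_load)
print(f"correlation: {r:.2f}")  # strongly positive -> likely related
```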

Our software both runs code and invokes API calls; the interplay between these is nontrivial to design and a fertile ground for innovation. There are “textbook” solutions to let agents write and run code (open sourced by, for example, Open Interpreter), and also to invoke tools/APIs (for example, Gorilla). But doing both together is harder, and yet required. We welcome your thoughts on this!
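One simple way to make the two compose, purely as our illustration of the design space (not Wild Moose's implementation): API calls deposit results into a shared namespace, and generated code then executes against that same namespace.

```python
# API results and generated code share one namespace, so code the agent
# writes can analyze whatever earlier API calls fetched.
namespace = {}

def call_api(name, save_as, **kwargs):
    apis = {"datadog_metric": lambda metric: [35, 48, 80]}  # mocked API
    namespace[save_as] = apis[name](**kwargs)
    return namespace[save_as]

def run_code(source):
    exec(source, namespace)          # generated code sees prior API results
    return namespace.get("result")

call_api("datadog_metric", save_as="cpu", metric="db.cpu.load")
answer = run_code("result = max(cpu)")
```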

Try our tool with your Datadog logs and metrics >> https://app.wildmoose.ai/slack/install

Setup demo >> https://www.loom.com/share/9a4adc39806742c48d14cdd39da6e560?...

If you want to see other integrations, have ideas for features, or have spotted behaviors that seem off - we’d love to hear about it. Hit us up in the comments!



One of the issues with AI products is that they often require a good amount of input to use. It seems to me that a great AI product would save me time and do things for me without being asked to.

For example, at our company we have quite a few alerts set up. Datadog also automatically detects anomalies. It would be neat if this (or something else) could automatically do an initial triage without being prompted and give me a free head start on issues that come in.

Otherwise, it feels like it's "work" to learn how to use the product, which seems to miss the promise of AI (doing things for us!).


100%. We already have some of those more-automated features, e.g. giving you context about recent changes that might be related to an alert: deployments, feature flags, etc. But it's definitely interesting to do more around triage. Are you using anything today for triage, or is it all manual?


It sounds neat. I'd need to see how productive it is in a real-world production environment to figure out the value. The splash page design is an absolute delight for me, though scrolling feels slow.


Thanks, we'll fix the scrolling issue!

Do you have any examples for recent prod issues you faced? Would be interesting to see how we could help speed up the investigation there.


Pretty wild. I can see these kinds of AI-based tools becoming more widely used as they connect to more systems.

With that said, in my experience, coming up with a good AI question can sometimes be harder than looking for the data itself; i.e., asking "what were the last 3 errors yesterday" is not something I have ever done.


Those are good points, but as you said, the more systems connected, the deeper the value the tool can bring. It goes beyond a better interface for querying logs and metrics: since our agent can run code, it can give you the bottom line on questions based on information aggregated from multiple data sources. You don't need to think up an "AI question"; just ask directly for the thing you want to know, and our agent will query multiple data sources and analyze them to give a direct answer. For example, if you're trying to understand which users are impacted by a certain issue - you can just ask exactly that.


True, there is a lot to be said for and against AI-based interfaces for working with data. But in practice, following the advent of products such as Code Interpreter, we're seeing users begin to expect their data to be accessible not just via baroque query languages and multi-hop UXs. Finally, as Yasmin noted above, "last 3 errors yesterday" undersells it :) you might ask where transactions are dropped, correlate errors with metrics, create dashboards on the fly, ...

Please feel very free to continue this discussion here -- it's an important one for us!


Support for the Grafana stack (especially Loki & Tempo) would be great.


Great, thanks for the input! Do you mind sharing how many engineers are at your company? We’re trying to get a better sense of what tools are used per R&D size so we can create solutions that fit the needs of each :)


It's around 15 SWEs


Got it, thanks, that's helpful to know.


Clicking links in your (wildmoose.ai) stack menu on the site doesn't close the menu on Android Chrome. The link location loads underneath though. Just a friendly heads up.


Thanks so much!


Being able to go through massive amounts of production logs quickly is super interesting - how can we quickly pinpoint and trace requests going through multiple microservices?


Roei the CTO here :) During onboarding we preliminarily index some data about your microservice architecture so we have some "common language" with your Datadog instance. Couple that with run-of-the-mill distributed tracing deployed at the user's end, and our moose can start querying/reasoning about multi-service transactions, and/or at the finer granularity of individual requests and log lines.
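Concretely, once log lines carry trace IDs (which standard distributed tracing provides), stitching a multi-service request back together is essentially a group-by. A toy sketch with made-up log lines, just to illustrate the idea:

```python
from collections import defaultdict

logs = [
    {"trace_id": "abc", "service": "api",     "msg": "request received"},
    {"trace_id": "abc", "service": "billing", "msg": "charge failed"},
    {"trace_id": "def", "service": "api",     "msg": "request received"},
]

def by_trace(logs):
    # Group log lines by trace ID to reconstruct each request's path
    # across services.
    traces = defaultdict(list)
    for line in logs:
        traces[line["trace_id"]].append((line["service"], line["msg"]))
    return dict(traces)

print(by_trace(logs)["abc"])  # the full path of one request across services
```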

Does this answer your question?


please don't use link shorteners, they're a 404 waiting to happen and we're all professionals here and not scared of URLs with UUIDs in them: https://www.loom.com/share/9a4adc39806742c48d14cdd39da6e560?...


Fixed. Thanks for the catch!


Neat idea. Looking forward to support for more integrations.


Thanks! Let us know if you have any specific ones in mind that would be helpful for you.



