Show HN: Wild Moose – Autonomous agent for production debugging (wildmoose.ai)
55 points by yasmind on Oct 3, 2023 | 19 comments
Hi Hacker News! We launched an autonomous agent that helps debug production issues, and we’re curious to get your feedback.

Today's GenAI devtools, such as Copilot, are limited: they are great for writing code, but we all know that programming is only 20% coding and 80% debugging.

So how can GenAI be used for debugging? As opposed to code completion or test automation, production debugging is not about generating text. Debugging is mostly about root-cause analysis. We realized two things:

1) Generative AI is drastically changing the way we work with data, thanks to its ability to not only generate queries, but also run code and analyze unstructured data. This enables building better data-exploration experiences with far more intuitive interfaces.

2) RCA is all about exploring different types of data. When you don’t know why a transaction was dropped or which customers are affected – you explore metrics, logs, your code, other people’s code, old Slack messages, and whatnot, to figure out what’s broken.

Putting those two together, we built an autonomous agent that helps debug production issues. Our LLM "moose" (ok, it's corny but we like it) connects to your logs, metrics, and other observability data, and lets you extract and analyze them by conversing with it. Once it gets a task, it will start reasoning, invoking APIs, and running code, until it comes back with an answer.
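The loop described here (get a task, reason, invoke APIs or run code, repeat until an answer emerges) can be sketched abstractly like this. Everything below is illustrative: the LLM and the Datadog call are mocked, and none of these names are Wild Moose's actual API.

```python
# Minimal agent-loop sketch: the model picks a tool, we run it,
# and feed the result back until it produces a final answer.

def fake_llm(history):
    # Stand-in for a real LLM call: decide the next step from history.
    if not any(step["role"] == "tool" for step in history):
        return {"action": "call_tool", "tool": "fetch_logs",
                "args": {"query": "duration:>60s"}}
    return {"action": "answer", "text": "3 slow transactions found"}

TOOLS = {
    # In a real system this would hit the Datadog API; here it's mocked.
    "fetch_logs": lambda query: ["txn-1", "txn-7", "txn-9"],
}

def run_agent(task, llm=fake_llm, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = llm(history)
        if step["action"] == "answer":
            return step["text"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")
```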

For example, when requested to “show me IDs of transactions that took over 1 minute today”, it will fetch those transactions from Datadog for you. You might then ask it if long-running transactions correlate with a metric such as DB CPU load. It will fetch the metric values, visualize them on a graph alongside the long transaction frequency, and give you the answer.
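The correlation question in this example boils down to something like the following (plain-Python Pearson correlation over made-up numbers; the real agent would fetch both series from Datadog):

```python
# Does slow-transaction frequency move with DB CPU load?
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

slow_txn_per_hour = [2, 5, 9, 4, 12, 3]       # fetched transaction counts
db_cpu_load       = [35, 48, 80, 45, 95, 40]  # metric values (%)

r = pearson(slow_txn_per_hour, db_cpu_load)
print(f"correlation: {r:.2f}")  # strongly positive -> likely related
```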

Our software both runs code and invokes API calls; the interplay between these is nontrivial to design and a fertile ground for innovation. There are “textbook” solutions to let agents write and run code (open sourced by, for example, Open Interpreter), and also to invoke tools/APIs (for example, Gorilla). But doing both together is harder, and yet required. We welcome your thoughts on this!
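One simple way to make the two compose, purely as our illustration of the design space (not Wild Moose's implementation): API calls deposit results into a shared namespace, and generated code then executes against that same namespace.

```python
# API results and generated code share one namespace, so code the agent
# writes can analyze whatever earlier API calls fetched.
namespace = {}

def call_api(name, save_as, **kwargs):
    apis = {"datadog_metric": lambda metric: [35, 48, 80]}  # mocked API
    namespace[save_as] = apis[name](**kwargs)
    return namespace[save_as]

def run_code(source):
    exec(source, namespace)          # generated code sees prior API results
    return namespace.get("result")

call_api("datadog_metric", save_as="cpu", metric="db.cpu.load")
answer = run_code("result = max(cpu)")
```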

Try our tool with your Datadog logs and metrics >> https://app.wildmoose.ai/slack/install

Setup demo >> https://www.loom.com/share/9a4adc39806742c48d14cdd39da6e560?...

If you want to see other integrations, have ideas for features, or have spotted behaviors that seem off - we’d love to hear about it. Hit us up in the comments!



One of the issues with AI products is that they often require a good amount of input to use. It seems to me that a great AI product would save me time and do things for me without being asked to.

For example, at our company we have quite a few alerts set up. Datadog also automatically detects anomalies. It would be neat if this (or something else) could automatically do an initial triage without being prompted and give me a free head start on issues that come in.

Otherwise, it feels like it's "work" to learn how to use the product, which seems to miss the promise of AI (doing things for us!).


100%. We already have some of those more-automated features, e.g. giving you context about recent changes that might be related to an alert: deployments, feature flags, etc. But it's definitely interesting to do more around triage. Are you using anything today for triage, or is it all manual?


It sounds neat. I'd need to see how productive it is in a real-world production environment to figure out the value. The splash page design is an absolute delight for me, though scrolling feels slow.


Thanks, we'll fix the scrolling issue!

Do you have any examples for recent prod issues you faced? Would be interesting to see how we could help speed up the investigation there.


Pretty wild. I can see these kinds of AI-based tools becoming more widely used as they connect to more systems.

With that said, in my experience, coming up with a good AI question can sometimes be harder than looking for the data itself; i.e., asking "what were the last 3 errors yesterday" is not something I have ever done.


Those are good points, but as you said, the more systems connected, the deeper the value the tool can bring. It goes beyond a better interface for querying logs and metrics: since our agent can run code, it can give you the bottom line on questions based on information aggregated from multiple data sources. You don't need to think up an "AI question"; just ask directly for the thing you want to know, and our agent will query multiple data sources and analyze them to give a direct answer. For example, if you're trying to understand which users are impacted by a certain issue - you can just ask exactly that.


True, there is a lot to be said for and against AI-based interfaces for working with data. But in practice, following the advent of products such as Code Interpreter, we're seeing users begin to expect their data to be accessible not just via baroque query languages and multi-hop UXs. Finally, as Yasmin noted above, "last 3 errors yesterday" undersells it :) you might ask where transactions are dropped, correlate errors with metrics, create dashboards on the fly, ...

Please feel very free to continue this discussion here -- it's an important one for us!


Support for the Grafana stack (especially Loki & Tempo) would be great.


Great, thanks for the input! Do you mind sharing how many engineers are at your company? We’re trying to get a better sense of what tools are used per R&D size so we can create solutions that fit the needs of each :)


It's around 15 SWEs


Got it, thanks, that's helpful to know.


Clicking links in your (wildmoose.ai) stack menu on the site doesn't close the menu on Android Chrome. The link location loads underneath though. Just a friendly heads up.


Thanks so much!


Being able to go through massive amounts of production logs quickly is super interesting - how can we quickly pinpoint and trace requests going through multiple microservices?


Roei the CTO here :) During onboarding we preliminarily index some data about your microservice architecture so we have some "common language" with your Datadog instance. Couple that with run-of-the-mill distributed tracing deployed at the user's end, and our moose can start querying/reasoning about multi-service transactions, and/or at the finer granularity of individual requests and log lines.
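Concretely, once log lines carry trace IDs (which standard distributed tracing provides), stitching a multi-service request back together is essentially a group-by. A toy sketch with made-up log lines, just to illustrate the idea:

```python
from collections import defaultdict

logs = [
    {"trace_id": "abc", "service": "api",     "msg": "request received"},
    {"trace_id": "abc", "service": "billing", "msg": "charge failed"},
    {"trace_id": "def", "service": "api",     "msg": "request received"},
]

def by_trace(logs):
    # Group log lines by trace ID to reconstruct each request's path
    # across services.
    traces = defaultdict(list)
    for line in logs:
        traces[line["trace_id"]].append((line["service"], line["msg"]))
    return dict(traces)

print(by_trace(logs)["abc"])  # the full path of one request across services
```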

Does this answer your question?


please don't use link shorteners, they're a 404 waiting to happen and we're all professionals here and not scared of URLs with UUIDs in them: https://www.loom.com/share/9a4adc39806742c48d14cdd39da6e560?...


Fixed. Thanks for the catch!


Neat idea. Looking forward to support for more integrations.


Thanks! Let us know if you have any specific ones in mind that would be helpful for you.



