I'm still not 100% sure I understand what a symlink in a git repository actually does, especially across different operating systems. Maybe it's fine?
Anthropic say "put @AGENTS.md in your CLAUDE.md" file and my own experiments confirmed that this dumps the content into the system prompt in the same way as if you had copied it to CLAUDE.md manually, so I'm happy with that solution - at least until Anthropic give in and support AGENTS.md directly.
It just creates the same symlink on any other checkout. (On Linux/macOS at least, Windows I believe requires local settings changes.)
Only sane (guaranteed portable) option is for it to be a relative symlink to another file within the same repo, of course. i.e. CLAUDE.md would be -> 'AGENTS.md', not '/home/simonw/projects/pelicans-on-bicycles/AGENTS.md' or whatever.
On windows, it depends on the local git configuration. It’s not something I’ve been happy with, especially since symlinks also behave differently again when you’re running a docker container to get your windows usable for development.
I use symbolic links, and Claude Code often gets confused, requiring several iterations to understand that the CLAUDE.md file is actually a symbolic link to AGENTS.md, and that these are not two different, duplicate files
The recommended approach has the advantage of separating information specific to Claude Code, but I think that in the long run, Anthropic will have to adopt the AGENTS.md format
Also, when using separate files, memories will be written to CLAUDE.md, and periodic triaging will be required: deciding what to leave there and what to move to AGENTS.md
> Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools [...] In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way.
Agreed. My only MCP is a code interpreter. I also recently started experimenting with making an MCP “proxy” which acts a better harness that lets the agent call MCP from within a code interpreter [1]
But in general I still don’t really use MCP. Agents are just so good at solving problems themselves. I wish MCP would mostly focus at the auth part instead of the tool part. Getting an agent access to an API with credentials usually gives them enough power to solve problems on their own.
Thanks! I def don't think I would have guessed this use case when MCP first came out, but more and more it seems Claude just yearns for scripting on data rather than a bunch of "tools". My/MCPs job has become just getting it that data.
Have you tried using light CLIs rather than MCP? I’ve found that CLIs are just easier for Claude, especially if you write them with Claude and during planning instruct it to think about adding guidance to users who get confused.
Our auth, log diving, infra state, etc, is all usable via cli, and it feels pretty good when pointing Claude at it.
Yeah if that's possible or you are willing to build it, that's the right solution. Today pretty much all of my integrations are pure CLIs like that rather than MCPs.
You can do anything you want via a CLI but MCP still exists as a standard that folks and platforms might want to adopt as a common interface.
This is how MCP works if you use it for as essential an internal tool API gateway (stateless http) instead of a client facing service that end users are connecting directly to. It's basically just OpenAPI but slightly more tuned for LLM inference.
> If you’re not already using a CLI-based agent like Claude Code or Codex CLI, you probably should be.
Are the CLI-based agents better (much better?) than the Cursor app? Why?
I like how easy it is to get Cursor to focus a particular piece of code. I select the text and Cmd-L, saying "fix this part, it's broken like this ____."
I haven't really tried a CLI agent; sending snippets of code by CLI sounds really annoying. "Fix login.ts lines 148-160, it's broken like this ___"
Yeah I started with Cursor, went hybrid, and then in the last month or so I've totally swapped over.
Part of it is the snappy more minimal UX but also just pure efficacy seems consistently better. Claude does its best work in CC. I'm sure the same is true of Codex.
All I can say is when I switched from Cursor to Claude it took me less than 24 hours to realise I wouldn’t go back. The extra UI Cursor slaps on to VS Code is just bloat, which I found quite buggy (might be better now though), and the output was nowhere near as good. Maybe things have improved since I switched but Claude CLI with VS Code is giving me no reasons to want to try anything else. Cursor seemed like a promising and impressive toy, Claude CLI is just a great product that’s delivering value for me every day.
That particular part is the same, roughly. The bigger issue is just that CC's a better agent than Cursor, last I checked.
There's even an official Anthropic VS Code extension to run CC in VS Code. The biggest advantage is being able to use VS Code's diff views, which I like more than in the terminal. But the VS Code CC extension doesn't support all the latest features of the terminal CC, so I'm usually still in the terminal.
As fascinating as these tools can be - are we (the industry) once again finding something other than our “customer” to focus our brains on (see Paul Graham’s “Top idea in your mind” essay)?
Nothing crazy about it, judging by how much CPU and memory it uses. Now, if it managed to grow features without bringing my M4 Mac with 64GB of ram to a crawl... that's be magic.
> The Takeaway: Skills are the right abstraction. They formalize the “scripting”-based agent model, which is more robust and flexible than the rigid, API-like model that MCP represents.
Just to not confuse, MCP is like an api but the underlying api can execute an Skill. So, its not MCP vs Skill as a contest. It's just the broad concept of a "flexible" skill vs "parameter" based Api. And again parameter based APIs can also be flexible depending on how we write it except that it lacks SKILL.md in case of Skills which guides llm to be more generic than a pure API.
By the way, if you are a Mac user, you can execute Skills locally via OpenSkills[1] that I have created using apple contianers.
Skills are also a convenient way for writing self-documenting packages. They solve the problem of teaching the LLM how to use a library.
I have started experimenting with a skills/ directory in my open source software, and then made a plugin marketplace that just pulls them in. It works well, but I don't know how scalable it will be.
I use claude code every day, and havent had a chance to dig super deep into skills, but even though ive read a lot of people describe them and say they're the best thing so far, I still dont get them. Theyre things the agent chooses to call right? They have different permissions? is it a tool call with different permissions and more context? I have yet to see a single post give an actual real-world concrete example of how theyre supposed to be used or a compare and contrast with other approaches.
The prerequisite thought here is that you're using CC to invoke CLI tools.
So now you need to get CC to understand _how_ to do that for various tools in a way that's context efficient, because otherwise you're relying on either potentially outdated knowledge that Claude has built in (leading to errors b/c CC doesn't know about recent versions) or chucking the entirety of a man page into your default context (inefficent).
What the Skill files do is then separate the when from the how.
Consider the git cli.
The skill file has a couple of sentences on when to use the git cli and then a much longer section on how it's supposed to be used, and the "how" section isn't loaded until you actually need it.
I've got skills for stuff like invoking the native screenshot CLI tool on the Mac, for calling a custom shell script that uses the github API to download and pull in screenshots from issues (b/c the cli doesn't know how to do this), for accessing separate APIs for data, etc.
I think if it literally as a collection of .md files and scripts to help perform some set of actions. I'm excited for it not really as a "new thing" (as mentioned in the post) but as effectively an endorsement for this pattern of agent-data interaction.
So if youre building your own agent, this would be a directory of markdown documents with headers that you tell the agent to scan so that its aware of them, and then if it thinks they could be useful it can choose to read all the instructions into its context? Is it any more than that?
I guess I dont understand how this isnt just RAG with an index you make the agent aware of?
It also looks a lot like a tool that has a description mentioning it has a more detailed MD file the LLM can read for instructions on complex workflows, doesn’t it? MCP has the concept of resources for this sort of thing. I don’t see any difference between calling a tool and calling a CLI otherwise.
I mean it is technically RAG as the LLM is deciding to retrieve a document. But it’s very constrained.
The skills that I use all direct a next action and how to do it. Most of them instruct to use Tasks to isolate context. Some of them provide abstraction specific context (when working with framework code, find all consumers before making changes. add integration tests for the desired state if it’s missing, then run tests to see…) and others just inject only the correct company specific approach to solving only this problem into Task context.
They are composable and you can build the logic table of when an instance is “skilled” enough. I found them worse than hooks with subagents when I started, but now I see them as the coolest thing in Claude code.
The last benefit is nobody on your team even had to know they exist. You can just have them as part of onboarding and everyone can take advantage of what you’ve learned even when working on greenfield projects that don’t have a CLAUDE.md.
I don't understand how people use the `git worktree` workflow. I get that you want to isolate your work, but how do you deal with dev servers, port conflicts and npm installs? When I tried it, it was way more hassle than it was worth.
I have a bash script that creates the worktree, copies env over and changes the ports of containers and the services. I then can proxy the "real" port to any worktree, it's common I'll have 3 worktrees active to switch back and forth
Agree, depending on the repo and changes it’s hard with local dev servers. It sometimes works well if you don’t need local dockers and want to outsource git workflow to CC as well. Then it can do on that branch whatever it wants and main work is in another worktree with more steering and or docker env.
I generally like to use it. But I one project in the org which simply can’t work because the internal built system expects a normal .git directory at the root.
Means I have to rewrite some of the build code that isn’t aware of this git feature. And yes we use a library to read from git but not the git cli or a more recent compatible one that understands that the current work tree is not the main one.
Just my curiosity: Why are you producing so much code? Is it because it is now possible to do so with AI, or because you have a genuine need (solid business usecase) that requires a lot of code?
I just started developing self-hosted services largely with AI.
It wasn't possible before for me to do any of this at this kind of scale. Before, getting stuck on a bug could mean hours, days, or maybe even weeks of debugging. I never made the kind of progress I wanted before.
Many of the things I want, do already exist, but are often older, not as efficient or flexible as they could be, or just plain _look_ dated.
But now I can pump out react/shadcn frontends easily, generate apis, and get going relatively quickly. It's still not pure magic. I'm still hitting issues and such, but they are not these demotivating, project-ending, roadblocks anymore.
I can now move at a speed that matches the ideas I have.
I am giving up something to achieve that, by allowing AI to take control so much, but it's a trade that seems worth it.
Often code in SaaS companies like ours is indeed how we solve customer problems. It's not so much the amount of code but the rate (code per time) we can effectively use to solve problems/build solutions. AI, when tuned correctly, lets us do this faster than ever possible before.
If you don't want to think and offload it to llm they burn through a lot of tokens to implement in a non-efficient way something you could often do in 10 lines if you though about it for a few minutes.
I’ve just implemented a proof of concept that involved an API, a MCP server, an Authorization Server, a React frontend, token validation and proof of possession on the client, a CIBA flow for authentication… took a week , and I don’t even know the technologies used very well, it was all TypeScript but I work on JVM languages normally. This was a one off for a customer and I was able to show a fairly complex workflow end to end and what each part involves. I let the LLM write most of it but I understand every line and did have to make manual adjustments (though to be honest, I could easily explain to the LLM what I needed changed and given my experience it would eventually get there.
If you tell me I didn’t really need a LLM to be able to do all that in a week and just some thought and 10 lines of code would do, I suspect you are not really familiar with the latest developments in AI and just vastly underestimates the capabilities they have to do tricky stuff.
In a large project with decent code structure there can be quite a bit of boilerplate, convention, testing required. Also we are not talking about a 10-line change. More like 10k line feature.
Before LLMs we simply wouldn't implement many of those features since they were not exactly critical and required a lot of time, but now when the required development time is cut signifficantly, they suddenly make sense to implement.
Are there any of those CLI clients (coded in plain and simple C, or basic python/perl without 1 billion of expensive dependencies) able to access those 'coding AI' prompt anonymously then rate limited?
If no anonymous access is provided, is there a way to create an account with a noscript/basic (x)html/classic web browsers in order to get an API key secret?
Because I do not use web engines from the "whatng" cartel.
To add insult to injury, my email is self-hosted with IP literals to avoid funding the DNS people which are mostly now in strong partnership with the "whatng" cartel (email with IP literals are "stronger" than SPF since it does the same and more). An email is often required for account registration.
Blog posts like this would really benefit from specific examples. While I can get some mileage out of these tools for greenfield projects, I'm actually shocked that this has proven useful with projects of any substantial size or complexity. I'm very curious to understand the context where such tools are paying off.
Makes sense. I work for a growth stage startup and most of these apply to our internal mono repo so hard to share specifics. We use this for both new and legacy code each with their own unique AI coding challenges.
If theres enough interest, I might replicate some examples in an open source project.
> Generally my goal is to “shoot and forget”—to delegate, set the context, and let it work. Judging the tool by the final PR and not how it gets there.
This feels like a false economy to me for real sized changes, but maybe I’m just a weak code reviewer. For code I really don’t care about, I’m happy to do this, but if I ever need to understand that code I have an uphill battle. OTOH reading intermediate diffs and treating the process like actual pair programming has worked well for me, left me with changes I’m happy with, and codebases I understand well enough to debug.
I treat everything I find in code review as something to integrate into the prompts. Eventually, on a given project, you end up getting correct PRs without manual intervention. That's what they mean. You still have to review your code of course!
I feel like these posts are interesting, but become irrelevant quickly. Does anyone actually follow these as guides, or just consume them as feedback for how we wish we could interface with LLMs and the workarounds we currently use?
Right now these are reading like a guide to prolog in the 1980s.
Given that this space is so rapidly evolving, these kinds of posts are helpful just to make sure you aren't missing anything big. I've caught myself doing something the hard way after reading one of these. In this case, the framing is basically man pages for CLIs was a helpful description of sills that gives me some ideas about how to improve interaction with an in-house CLI my co. uses.
Yeah I like to think not everyone can spend their day exploring/tinkering with all these features so it's handy to just snapshot what exists and what works/doesn't.
I wouldn't say I follow them as guides, but I think the field is changing quickly enough that it's good, or at least interesting, to read what's working well for other people.
This one is already out of date. The bit on the top about allocating space in CLAUDE.md for each tool is largely a waste of tokens these days. Use the skills feature.
The skill lets you compress the amount loaded to just the briefest description, with the “where do I go to get more info” being implicit. You should use a SKILL.md for evry added tool. At which point, putting instructions in CLAIDE.md becomes redundant and confusing to the LLM.
When touch typing and talking to someone, I accidentally typed something to claude with my fingers off the home row, e.g. ttoubg kuje tgus ubti tge ckayde cide ternubak. Claude understood it just fine. Didn't even remark on it.
It truly is an idiot savant. It's absurdly good, and then if there is any unaccounted complexity, ugh let's just make the tests stubs.... tests pass now.
Does anyone have any suggestions on making Claude prefer to use project internal abstractions and utility functions? My C++ project has a lot of them. If I just say something like "for I/O and networking code, check IOUtils.h for helpers" then it often doesn't do that. But mentioning all helper functions and classes in the context also seems like a bad idea. What's the best way? Are the new Skills a solution?
Hooks can also be useful for this. If it's using the wrong APIs then can hint on write or block on commit with some lint function that checks for this.
I use both at the same time. CC seems to have better access to web and researching capabilities compared to Codex. Maybe I'm not using Codex right or missing something, but it has frequent troubles browsing internet. Also Claude Code is faster. So I use it when I know it can handle the task.
If you have no modifications or customization of Claude code then it comes down to a preference for proactivity (codex) or a bit more restraint.
If you are using literally any of Claude Code’s features the experience isn’t close, and regardless of model preference (Claude is my least favorite model by far) you should probably use Claude code. It’s just a much more extensible product for teams.
CC has better agent tools and is faster. The ability to switch from plan mode to execution mode and back is huge. Toggling thinking also. And of course they are innovating all of these agentic features like MCP, sub-agents, skills, etc...
Codex writes higher quality code, but is slower and less feature rich. I imagine this will change within months. The jury is still out. Exciting times!
No thanks. I rather write the code myself that use generated slop. I actually like to code and see little benefit in other peoples copypaste code (thats essentially what ai slop is really)
I feel or have the fear that the world will tumble and crack under the sheer amount of code we produce and can’t be maintained because at one point no one human can understand all the stuff that was written.
At the moment though I also code on and off with an agent. I’m not ready or willing to only vibe code my projects. For one is the fact that I had tons of examples where the agent gaslighted me only to turn around at the last stage. And in some cases the code output was to result focused and didn’t think about the broader general usage. And sure that’s in part because I hold it wrong. Don’t specify 10million markdown files etc. But it’s a feedback loop system. If I don’t trust the results I don’t jump in deeper. And I feel a lot of developers have no issue with jumping ever deeper. Write MCPs now CLIs and describe projects with custom markdown files.
But I think we really need both camps. Otherwise we don’t move forward.
read the document at https://blog.sshh.io/p/how-i-use-every-claude-code-feature and tell me how to improve my Claude code setup
I researched this the other day, the recommended (by Anthropic) way to do this is to have a CLAUDE.md with a single line in it:
Then keep your actual content in the other file: https://docs.claude.com/en/docs/claude-code/claude-code-on-t...Anthropic say "put @AGENTS.md in your CLAUDE.md" file and my own experiments confirmed that this dumps the content into the system prompt in the same way as if you had copied it to CLAUDE.md manually, so I'm happy with that solution - at least until Anthropic give in and support AGENTS.md directly.
Only sane (guaranteed portable) option is for it to be a relative symlink to another file within the same repo, of course. i.e. CLAUDE.md would be -> 'AGENTS.md', not '/home/simonw/projects/pelicans-on-bicycles/AGENTS.md' or whatever.
The recommended approach has the advantage of separating information specific to Claude Code, but I think that in the long run, Anthropic will have to adopt the AGENTS.md format
Also, when using separate files, memories will be written to CLAUDE.md, and periodic triaging will be required: deciding what to leave there and what to move to AGENTS.md
But I can’t speak to it working across OS.
I thought git by default treats symlinks simply as file copies when cloning new.
Ie git may not be aware of the symlink.
> Instead of a bloated API, an MCP should be a simple, secure gateway that provides a few powerful, high-level tools [...] In this model, MCP’s job isn’t to abstract reality for the agent; its job is to manage the auth, networking, and security boundaries and then get out of the way.
But in general I still don’t really use MCP. Agents are just so good at solving problems themselves. I wish MCP would mostly focus at the auth part instead of the tool part. Getting an agent access to an API with credentials usually gives them enough power to solve problems on their own.
[1]: https://x.com/mitsuhiko/status/1984756813850374578?s=46
Our auth, log diving, infra state, etc, is all usable via cli, and it feels pretty good when pointing Claude at it.
You can do anything you want via a CLI but MCP still exists as a standard that folks and platforms might want to adopt as a common interface.
Are the CLI-based agents better (much better?) than the Cursor app? Why?
I like how easy it is to get Cursor to focus a particular piece of code. I select the text and Cmd-L, saying "fix this part, it's broken like this ____."
I haven't really tried a CLI agent; sending snippets of code by CLI sounds really annoying. "Fix login.ts lines 148-160, it's broken like this ___"
Part of it is the snappy more minimal UX but also just pure efficacy seems consistently better. Claude does its best work in CC. I'm sure the same is true of Codex.
There's even an official Anthropic VS Code extension to run CC in VS Code. The biggest advantage is being able to use VS Code's diff views, which I like more than in the terminal. But the VS Code CC extension doesn't support all the latest features of the terminal CC, so I'm usually still in the terminal.
Just to not confuse, MCP is like an api but the underlying api can execute an Skill. So, its not MCP vs Skill as a contest. It's just the broad concept of a "flexible" skill vs "parameter" based Api. And again parameter based APIs can also be flexible depending on how we write it except that it lacks SKILL.md in case of Skills which guides llm to be more generic than a pure API.
By the way, if you are a Mac user, you can execute Skills locally via OpenSkills[1] that I have created using apple contianers.
1. OpenSkills -https://github.com/BandarLabs/open-skills
I have started experimenting with a skills/ directory in my open source software, and then made a plugin marketplace that just pulls them in. It works well, but I don't know how scalable it will be.
https://github.com/juanre/ai-tools
So now you need to get CC to understand _how_ to do that for various tools in a way that's context efficient, because otherwise you're relying on either potentially outdated knowledge that Claude has built in (leading to errors b/c CC doesn't know about recent versions) or chucking the entirety of a man page into your default context (inefficent).
What the Skill files do is then separate the when from the how.
Consider the git cli.
The skill file has a couple of sentences on when to use the git cli and then a much longer section on how it's supposed to be used, and the "how" section isn't loaded until you actually need it.
I've got skills for stuff like invoking the native screenshot CLI tool on the Mac, for calling a custom shell script that uses the github API to download and pull in screenshots from issues (b/c the cli doesn't know how to do this), for accessing separate APIs for data, etc.
I think if it literally as a collection of .md files and scripts to help perform some set of actions. I'm excited for it not really as a "new thing" (as mentioned in the post) but as effectively an endorsement for this pattern of agent-data interaction.
So if youre building your own agent, this would be a directory of markdown documents with headers that you tell the agent to scan so that its aware of them, and then if it thinks they could be useful it can choose to read all the instructions into its context? Is it any more than that?
I guess I dont understand how this isnt just RAG with an index you make the agent aware of?
The skills that I use all direct a next action and how to do it. Most of them instruct to use Tasks to isolate context. Some of them provide abstraction specific context (when working with framework code, find all consumers before making changes. add integration tests for the desired state if it’s missing, then run tests to see…) and others just inject only the correct company specific approach to solving only this problem into Task context.
They are composable and you can build the logic table of when an instance is “skilled” enough. I found them worse than hooks with subagents when I started, but now I see them as the coolest thing in Claude code.
The last benefit is nobody on your team even had to know they exist. You can just have them as part of onboarding and everyone can take advantage of what you’ve learned even when working on greenfield projects that don’t have a CLAUDE.md.
Em dash and "it's not X, it's Y" in one sentence. Tired of reading posts written by AI. Feels disrespectful to your readers
Latest version from 2 momths ago, >4700 stars on GitHub
0: https://github.com/whyisdifficult/jiratui
It wasn't possible before for me to do any of this at this kind of scale. Before, getting stuck on a bug could mean hours, days, or maybe even weeks of debugging. I never made the kind of progress I wanted before.
Many of the things I want, do already exist, but are often older, not as efficient or flexible as they could be, or just plain _look_ dated.
But now I can pump out react/shadcn frontends easily, generate apis, and get going relatively quickly. It's still not pure magic. I'm still hitting issues and such, but they are not these demotivating, project-ending, roadblocks anymore.
I can now move at a speed that matches the ideas I have.
I am giving up something to achieve that, by allowing AI to take control so much, but it's a trade that seems worth it.
This is basically a "thinking tax".
If you don't want to think and offload it to llm they burn through a lot of tokens to implement in a non-efficient way something you could often do in 10 lines if you though about it for a few minutes.
If you tell me I didn’t really need a LLM to be able to do all that in a week and just some thought and 10 lines of code would do, I suspect you are not really familiar with the latest developments in AI and just vastly underestimates the capabilities they have to do tricky stuff.
Before LLMs we simply wouldn't implement many of those features since they were not exactly critical and required a lot of time, but now when the required development time is cut signifficantly, they suddenly make sense to implement.
If no anonymous access is provided, is there a way to create an account with a noscript/basic (x)html/classic web browsers in order to get an API key secret?
Because I do not use web engines from the "whatng" cartel.
To add insult to injury, my email is self-hosted with IP literals to avoid funding the DNS people which are mostly now in strong partnership with the "whatng" cartel (email with IP literals are "stronger" than SPF since it does the same and more). An email is often required for account registration.
If theres enough interest, I might replicate some examples in an open source project.
To see if it is easy to digest, no repeated code etc or is it just slop that should be consumed by another agent and never by human.
This feels like a false economy to me for real sized changes, but maybe I’m just a weak code reviewer. For code I really don’t care about, I’m happy to do this, but if I ever need to understand that code I have an uphill battle. OTOH reading intermediate diffs and treating the process like actual pair programming has worked well for me, left me with changes I’m happy with, and codebases I understand well enough to debug.
It's much easier to review larger changes when you've aligned on a Claude generated plan up front.
Right now these are reading like a guide to prolog in the 1980s.
Skills doesn't totally deprecate documenting things in CLAUDE.md but agree that a lot of these can be defined as skills instead.
Skill frontmatter also still sits in the global context so it's not really a token optimization either.
If you are using literally any of Claude Code’s features the experience isn’t close, and regardless of model preference (Claude is my least favorite model by far) you should probably use Claude code. It’s just a much more extensible product for teams.
Codex writes higher quality code, but is slower and less feature rich. I imagine this will change within months. The jury is still out. Exciting times!
At the moment though I also code on and off with an agent. I’m not ready or willing to only vibe code my projects. For one is the fact that I had tons of examples where the agent gaslighted me only to turn around at the last stage. And in some cases the code output was to result focused and didn’t think about the broader general usage. And sure that’s in part because I hold it wrong. Don’t specify 10million markdown files etc. But it’s a feedback loop system. If I don’t trust the results I don’t jump in deeper. And I feel a lot of developers have no issue with jumping ever deeper. Write MCPs now CLIs and describe projects with custom markdown files. But I think we really need both camps. Otherwise we don’t move forward.