
If you are a developer, you have surely heard of Copilot, an AI assistant that autocompletes your code as you write, directly in your IDE (Visual Studio Code). It is a marvelous tool, even if it makes you highly dependent very fast (because you get used to not coding), and it will definitely boost your productivity, all for only 10€ a month. But in case you didn't know, Copilot is a MASSIVE loss for Microsoft: it costs far more to run than the subscription price can cover. And just like Netflix, the good (almost free) stuff will not last; I strongly believe that Microsoft will raise prices in the near future, when the AI bubble bursts!
So, since I recently got an RTX 3060 12GB, I can run 7B and 14B models at decent speed (~30 t/s for the 7B and ~20 t/s for the 14B), which got me thinking that I could integrate more self-hosted LLMs into my daily life, making my purchase worth it in no time. One possible application is a Copilot alternative for open LLMs: Continue.dev.
What is Continue.dev?
Like I said, Continue.dev is a coding assistant that behaves like Copilot but works with open LLMs, or with any LLM provider such as OpenAI, Anthropic, Gemini (Google), Ollama, LM Studio, and more. It is an extension that you can add directly from the Visual Studio Code IDE, via the extension panel, and it will autocomplete your code as you write, provide a chatbot, let you ask the model to perform actions such as editing specific parts of the code, and more!
How to install Continue?
Installing Continue is straightforward: simply search for Continue in the extension panel on the side menu and install it.

Configuring it to work with LM Studio/Ollama
Note: This part assumes that you already know Ollama/LM Studio and have it running on your PC.
Making it work with Ollama/LM Studio is easy, but not entirely obvious. First, you need to have Ollama or LM Studio running, along with the model you want to use. Here, I will use LM Studio with Qwen2.5-Coder-7B-Instruct for both chat and completion.
The tool operates in two main modes:
- Chat Mode – A set of tools that work in a chatbot-like manner, allowing you to interact with the LLM. You can chat, ask it to edit code, execute commands, and more.
- Autocompletion – It suggests the next few tokens of code to add to your current line.
From my limited experience, Qwen 2.5 7B starts to work well enough for both chat and autocompletion. However, if you can run higher-parameter models, go for it! Just keep in mind that completion needs to process the prompt quickly—otherwise, it will always lag behind if you type fast.
Running multiple models at once can also be impossible depending on your hardware (which is why I wish to upgrade my GPU to run 32B at high speed!).
Once you have everything installed, click the Continue icon in the side bar and:
- On the left panel, you should see an input field for accessing the chatbot. Below the input, you can see the model currently in use. By default (as I write this article), it is Claude 3.5 Sonnet. If you click on the model name, you should be able to add a new one. Do so and choose LM Studio or Ollama as the provider. Leave the model set to autodetect so you can choose from your available models.


- If it doesn’t open, click on the underlined config file link to open config.json. (You can also press “Add Model” again to make the pop-up appear.)
At this point, your config should look something like this:
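Here is a minimal sketch of that state, assuming you picked the LM Studio provider with autodetect (the title is arbitrary, and the exact keys can vary slightly between Continue versions, so treat this as a rough guide rather than the canonical format):

```json
{
  "models": [
    {
      "title": "LM Studio",
      "provider": "lmstudio",
      "model": "AUTODETECT"
    }
  ]
}
```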

- Now, to make it work with your local instance of LM Studio/Ollama, add or edit the apiBase field to point at your server:
  - If everything is running on the same machine, you can use localhost.
  - Otherwise, enter the address of the server hosting the model.
  - If you want to use a specific model for chat by default, set the model field accordingly.
  Note: Don’t forget to add /v1/ at the end of the URL, because Continue won’t add it automatically.
- You now have your chat functionality set up, but you need to do the same for the tabAutocompleteModel entry, which configures the autocompletion. Because completion needs to run very fast (more than 25 tokens/s, since you will also be typing fast and it only suggests a few lines), you will have to use the fastest model available for your hardware.
- Once everything is set up (see the full example config below), you should be able to start doing stuff right away!
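For reference, here is a rough sketch of what a finished config.json could look like, assuming a single LM Studio server at 192.168.1.50 (LM Studio's default port is 1234) serving Qwen2.5-Coder-7B-Instruct for both roles; the address and model identifier are placeholders, so use whatever your own server reports:

```json
{
  "models": [
    {
      "title": "Qwen2.5 Coder 7B (chat)",
      "provider": "lmstudio",
      "model": "qwen2.5-coder-7b-instruct",
      "apiBase": "http://192.168.1.50:1234/v1/"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder 7B (autocomplete)",
    "provider": "lmstudio",
    "model": "qwen2.5-coder-7b-instruct",
    "apiBase": "http://192.168.1.50:1234/v1/"
  }
}
```

Note that models is a list (you can register several chat models and switch between them), while tabAutocompleteModel takes a single model, at least in the versions I have used.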
My feedback on Continue
After trying it, I found it to be very efficient and a real boost to productivity. Being self-hosted means you can leverage AI without the risk of data leaks, which is crucial if you work in a restricted environment like FinTech or Big Pharma.
The autocompletion isn’t perfect (it can’t read my mind), but it’s incredibly useful. Being able to complete an obvious piece of code in two seconds just by pressing Tab is a game-changer.
Also, the edit functionality (triggered with Ctrl + I) is extremely convenient. You can simply ask your model to edit code for you, making refactoring and bug fixing super easy.
Overall, I’m very surprised and happy with the results, and I highly recommend this tool!
Gabriel