OpenRouter ▷ #announcements (1 message):
Kimi K2, Crashloop, Issue Resolution
- Kimi K2 Thinking Crashloop Crisis Averted: Kimi K2 Thinking experienced issues with both providers due to a crashloop triggered by a specific prompt.
- The issue has been resolved after collaborative efforts.
- Prompt-Induced Crashloop Plagues Kimi K2: A crashloop, induced by a problematic prompt, caused issues for Kimi K2 across multiple providers.
- Teams collaborated to pinpoint and squash the gremlin causing the outage.
OpenRouter ▷ #app-showcase (7 messages):
Orchid AI Assistant, Release Date Estimation, The nature of work
- Orchid AI Assistant ETA: 2-48 Months!: The release date for Orchid AI Assistant is estimated to fall anywhere within the next 2 to 48 months.
- A member reacted to this long and vague estimate with the word “crazy”.
- Contemplating the nature of ‘work’: A member expressed a dislike for “working,” suggesting that AI development aims to address this sentiment.
- The statement implies a desire to automate or alleviate the burdens associated with traditional labor through AI technologies.
OpenRouter ▷ #general (569 messages🔥🔥🔥):
OpenRouter video support, Polaris Alpha mini model, OpenAI adult content handling, Kimi K2 leaderboard rankings, Gemini 2.5 token usage
- OR may support videos in the future: A user expressed a wish for OpenRouter to support videos and text-to-speech (TTS) functionality, as shared in this tweet.
- Polaris Alpha possibly not a mini model: There is speculation that Polaris Alpha might not be a mini model, contrasting with the approach OpenAI took with GPT-5 as outlined in the GPT-5 System Card.
- OpenAI going adult - impacts OpenRouter: There is a question of how OpenRouter will handle OpenAI allowing adult content for users over 18, and whether users will need to bring their own API keys.
- Gemini 2.5 Flash chews through tokens: A user found that a 24-second, 900x600 video uploaded to Gemini 2.5 Flash consumed over 800k input tokens, contrary to Google’s token documentation, which states a fixed rate of 263 tokens per second of video.
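For scale, a quick back-of-the-envelope check (assuming Google’s documented fixed rate of 263 tokens per second of video is what should apply) shows how far the observed usage diverges:

```python
# Compare Google's documented video token rate against the usage the user reported.
DOCUMENTED_TOKENS_PER_SECOND = 263   # per Google's token documentation
video_duration_s = 24                # the 24-second, 900x600 clip

expected_tokens = DOCUMENTED_TOKENS_PER_SECOND * video_duration_s
observed_tokens = 800_000            # reported input-token usage

print(f"expected ~ {expected_tokens:,} tokens")   # ~ 6,312
print(f"observed ~ {observed_tokens:,} tokens "
      f"(~{observed_tokens / expected_tokens:.0f}x the documented rate)")
```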
- Cerebras Mandatory Reasoning: Users reported issues with the Cerebras model, where disabling reasoning caused errors; documentation confirms reasoning is mandatory.
- One workaround suggested was to omit the reasoning parameter altogether, after finding that `enable` should be `enabled` in the parameters.
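A minimal sketch of that workaround, assuming OpenRouter’s standard chat-completions endpoint and a placeholder model slug: leave the reasoning object out of the request body entirely rather than trying to switch it off (and note the field is `enabled`, not `enable`, if you do send it).

```python
import requests

API_KEY = "sk-or-..."  # your OpenRouter API key
MODEL = "provider/some-cerebras-served-model"  # placeholder slug; substitute the model you use

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
    # Workaround: omit the "reasoning" object entirely instead of disabling it.
    # If you do include it, the field is "enabled", not "enable":
    # "reasoning": {"enabled": True},
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```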
OpenRouter ▷ #new-models (2 messages):
- No New Models Discussion: The two messages in this channel contained no discussion of new models.
- No details or links related to model updates or technical topics were found, so there is nothing to summarize.
OpenRouter ▷ #discussion (29 messages🔥):
OpenRouter Model Node on n8n, OR Show Technical Segment, GPT-4 Regression, Chatroom Memory Setting, Automated Capabilities Scanning
- OpenRouter Model Node Inquiries Spark Curiosity: Members inquired whether the OpenRouter model node on n8n was created by the OpenRouter team or by an external entity.
- Another member suggested including a brief technical segment in the OR Show, such as a screen recording with a short discussion.
- GPT-4 Regression Troubles Users: Users reported a regression from GPT-4, with one noting that they were surprised to see the issue, and another saying Claude found two other discrepancies.
- The thread included attached images documenting discrepancies between different models on the platform.
- Chatroom ‘Memory’ Setting Misunderstood: A user asked what happened to the Chatroom’s ‘chat history’ setting, now renamed ‘Memory’, and why its default value is 8.
- Another user clarified that it now sits at the bottom of the interface (it was previously in the top-left tab button), while a third said they had assumed the setting would somehow limit the $120/mtok output.
- Automated Capabilities Scanning Proposed: A member suggested implementing some kind of automated capabilities scanning to detect changes in models/providers over time.
- They linked to an article on Cursor as an example, describing how a basic getWeather tool call could be used to check for functionality changes.
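A rough sketch of what such a probe could look like, assuming OpenRouter’s standard chat-completions endpoint; the getWeather tool definition, the prompt, and the model slug here are illustrative placeholders:

```python
import requests

API_KEY = "sk-or-..."  # your OpenRouter API key

# Throwaway tool definition used purely as a capability probe.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def supports_tool_calls(model: str) -> bool:
    """Return True if the model answered a weather question with a tool call."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
            "tools": [GET_WEATHER_TOOL],
        },
        timeout=60,
    )
    message = resp.json()["choices"][0]["message"]
    return bool(message.get("tool_calls"))

# Run the probe periodically across models/providers and diff the results over time.
print(supports_tool_calls("openai/gpt-4o"))  # example slug
```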
- GPT-5 Excels, Gemini Falls Flat: A user shared their positive experience using GPT-5 for creating schedules with nomenclature and filename structures, while noting their negative experience being locked out of Gemini code assist due to quota issues.
- They also mentioned needing to use DS3.1 for john-the-ripper help because Kimi refused, praising Meta’s under-the-radar AI projects and linking to a post on X to illustrate their point.