DeepSeek R1 vs OpenAI o1 vs Sonnet 3.5: Battle of Greatest LLMs

The day OpenAI launched the o1 mannequin, there was chatter in every single place that we at the moment are nearer to AGI than ever. Whereas AGI (Synthetic Normal Intelligence) nonetheless looms someplace sooner or later, we do have the o1 mannequin. Nevertheless, it isn’t actually accessible to many, because of its whopping ticket value of $200 per thirty days. Now what if I instructed you you could get entry to o1 degree reasoning and computational capabilities, utterly freed from value? Sure, its true – with DeepSeek’s new R1 mannequin, you may! The Chinese language AI startup, DeepSeek, has been raining presents for the reason that New Yr, beginning with the launch of DeepSeek V3 – a mannequin that competes with GPT 4o – and its cell app. Their newest reward to the AI neighborhood is DeepSeek R1 – a big language mannequin (LLM) that provides o1 a run for its cash for actually a fraction of the associated fee! On this weblog, we’ll evaluate DeepSeek R1 Vs OpenAI o1 Vs Sonnet 3.5 and see if the promising metrics stand true or not.

What’s DeepSeek R1?

DeepSeek R1 is a complicated reasoning-focused, open supply, LLM designed to revolutionize reasoning capabilities in AI methods. It introduces a novel strategy to coaching LLMs, leveraging reinforcement studying (RL) as its cornerstone, whereas minimizing using conventional supervised fine-tuning (SFT).

The mannequin emphasizes logic, problem-solving, and interpretability, making it apt for duties involving deep logical reasoning, resembling STEM duties, coding, and superior Chain-of-Thought (CoT) reasoning. This makes it a direct competitor to OpenAI’s o1 and Claude’s Sonnet 3.5.

What’s even higher is that DeepSeek R1’s API is a whopping 97% cheaper in comparison with Claude’s Sonnet 3.5 and nearly 93% cheaper in comparison with OpenAI’s o1.

Study Extra: DeepSeek R1- OpenAI’s o1 Largest Competitor is HERE!

How you can Entry DeepSeek R1?

You may discover DeepSeek R1 by way of the DeepSeek Chat interface. Right here’s what to do:

  1. Head to: https://chat.deepseek.com/.
  2. Check in to your account or join one.
  3. In the midst of the display screen, click on on: “Deepthink“.

The platform will already work with its DeepSeek R1 model.

Now, for those who want to use the API as an alternative:

  1. Get hold of your API key from the DeepSeek Developer Portal: https://api-docs.deepseek.com/
  2. Arrange your growth atmosphere with vital libraries resembling Python’s requests or OpenAI bundle.
  3. Configure your API shopper with the bottom URL: https://api.deepseek.com

DeepSeek R1 Vs OpenAI o1 Vs Claude Sonnet 3.5: Mannequin Comparability

Function DeepSeek-R1 OpenAI o1 Sequence Claude Sonnet 3.5
Coaching Strategy Reinforcement studying (RL) with minimal supervised information Supervised fine-tuning (SFT) + RLHF Supervised fine-tuning + RLHF
Particular Strategies Chilly-start information, rejection sampling, and pure RL Combines SFT and RL for normal versatility Centered on alignment and security
Core Focus Reasoning-intensive duties (math, coding, CoT) Normal-purpose LLM Moral and secure AI, balanced reasoning
Enter Token Value $0.14 (cache hit), $0.55 (cache miss) per million $1.50–$60 per million tokens $1.45–$8.60 per million tokens
Output Token Value $2.19 per million tokens $60 per million tokens $6–$10 per million tokens
Affordability Extraordinarily cost-effective, particularly for frequent use Excessive value for superior fashions Reasonably priced for security functions
Accessibility Totally open-source (free for internet hosting/customization) Proprietary, pay-per-use API Proprietary, pay-per-use API

DeepSeek R1 Vs o1 Vs Sonnet 3.5: Duties

I’m now going to check DeepSeek R1, OpenAI o1, and Sonnet 3.5 for a number of logical and coding associated duties, utilizing their chat interfaces. I’ll rank them from 1-3 based mostly on the responses they generate.

Right here,
1 – means the most effective response.
2 – means the second finest response.
3 – the final one.

On the finish, the mannequin with the lowest complete can be the winner!

Activity 1: Logical Reasoning

Immediate: “You stroll right into a room and see a mattress. On the mattress there are two canines, 4 cats, a giraffe, 5 cows, and a duck. There are additionally three chairs and a desk. What number of legs are on the ground?”

Outcome by DeepSeek R1

DeepSeek R1 vs OpenAI o1 vs Sonnet 3.5: Battle of Greatest LLMs

Outcome by OpenAI o1

OpenAI o1 - logical reasoning

Outcome by Sonnet 3.5

Sonnet 3.5 - logical reasoning

Assessment:

DeepSeek R1: This mannequin takes a while to generate its response. Whereas the calculations have been appropriate, the mannequin didn’t depend the legs of the desk and chairs. Though surprisingly, it did depend the human legs that the opposite two fashions didn’t.

OpenAI o1: This mannequin too takes  time to generate the response. Once more, whereas the calculations have been appropriate and there was an in depth rationalization for a similar, it fails to incorporate the human legs on the ground. Thus, its consequence can also be incorrect.

Sonnet 3.5: This mannequin is fast to generate the response and its calculations are appropriate. Nevertheless, it fails to account for the human legs which might have been current on the ground within the room. So its remaining reply is wrong.

General, I didnt’ get an accurate response from any mannequin! However DeepSeek R1’s logical strategy did impress me.

Outcome: DeepSeek R1: 1 | OpenAI o1: 3 | Sonnet 3.5: 2

Activity 2: Out of the Field Pondering

Immediate: “Create a secret language between two pals who’re nice at maths however very poor at english.”

Outcome by DeepSeek R1

DeepSeek R1 Vs OpenAI o1 Vs Sonnet 3.5 - Application output 1

Outcome by OpenAI o1

DeepSeek R1 Vs OpenAI o1 Vs Sonnet 3.5 - Application output 2

Outcome by Sonnet 3.5

DeepSeek R1 Vs OpenAI o1 Vs Sonnet 3.5 - Application output 3

Assessment:

DeepSeek R1: The mannequin thought by way of the duty and gave me three doable methods to create a secret language, all of which have been distinctive and equally secretive.

OpenAI o1: This mannequin gave an in depth perception into the language, guiding me by way of all the small print. However I discovered that the key language was a bit too pretty easy to decode, contemplating it needed to be a “secret”.

Sonnet 3.5: The mannequin was fast to generate the reply. Though its strategy was barely tedious, the consequence was a language that absolutely was tremendous secretive.

General, Deepseek R1 and Sonnet 3.5 stood out for me however R1 wins due to the alternatives it offered in its response.

Outcome: DeepSeek R1: 1 | OpenAI o1: 3 | Sonnet 3.5: 2

Activity 3: Scientific Reasoning

Immediate: “You might have a robust laser and a wonderfully reflective mirror. How are you going to purpose the laser on the mirror in such a manner that the mirrored beam by no means comes again to you?”

Outcome by DeepSeek R1

DeepSeek R1 - Scientific Reasoning

Outcome by OpenAI o1

OpenAI o1 - Scientific Reasoning

Outcome by Sonnet 3.5

Sonnet 3.5 - Scientific Reasoning

Assessment:

DeepSeek R1: The mannequin generates a really well-written consequence with a small visualization to boost the general expertise. It particulars down the logic in easy phrases and offers a properly rounded answer.

OpenAI o1: The mannequin generates a great consequence with clear rationalization however lacks the visible component that would additional improve the training expertise.

Sonnet 3.5: The mannequin offers a transparent response itemizing out the reply and the explanation behind it. Whereas the reply is well-written, it lacks the visible component that would take it a step additional.

General, DeepSeek R1 inches the victory dwelling with its small visible flowchart.

Outcome: DeepSeek R1: 1 | OpenAI o1: 3 | Sonnet 3.5: 2

Activity 4: Coding Expertise

Immediate: “Create a easy HTML web page with a button that modifications the background colour to Pink when clicked with some confetti”

Outcome by DeepSeek R1

Outcome by OpenAI o1

Outcome by Sonnet 3.5

Assessment:

DeepSeek R1: The mannequin writes the code and likewise offers an inbuilt interface to run the HTML code, making the complete course of fairly easy.

OpenAI o1: The mannequin rapidly writes the code however to test it you’ll have to run it in a separate interface.

Sonnet 3.5: The mannequin writes the code and likewise offers an interface to run the code inside it. Nevertheless, whereas operating the code, it encountered an error.

General, DeepSeek R1 and Sonnet3.5 take the lead with their inbuilt characteristic to run HTML code however since Sonnet’s code runs into an error, it’s DeepSeek R1 that presides right here.

Outcome: DeepSeek R1: 1 | OpenAI o1: 2 | Sonnet 3.5: 3

Activity 5: Drawback-Fixing Expertise

Immediate: “Write a program that may resolve a Sudoku puzzle”

Outcome by DeepSeek R1

def solve_sudoku(board):
    # Discover the following empty cell (denoted by 0)
    empty = find_empty(board)
    if not empty:
        return True  # Puzzle solved
    row, col = empty

    for num in vary(1, 10):
        if is_valid(board, row, col, num):
            board[row][col] = num

            if solve_sudoku(board):
                return True

            # Backtrack if no legitimate continuation
            board[row][col] = 0

    return False
def is_valid(board, row, col, num):

    # Examine row
    if num in board[row]:
        return False

    # Examine column
    if num in [board[i][col] for i in vary(9)]:
        return False

    # Examine 3x3 subgrid
    subgrid_row = (row // 3) * 3
    subgrid_col = (col // 3) * 3
    for i in vary(subgrid_row, subgrid_row + 3):
        for j in vary(subgrid_col, subgrid_col + 3):
            if board[i][j] == num:
                return False

    return True

def find_empty(board):
    for i in vary(9):
        for j in vary(9):
            if board[i][j] == 0:
                return (i, j)
    return None

def print_board(board):
    for i in vary(9):
        if i % 3 == 0 and that i != 0:
            print("-" * 21)
        for j in vary(9):
            if j % 3 == 0 and j != 0:
                print("|", finish=" ")
            print(f"{board[i][j] or '. '}", finish=" ")
        print()

# Instance utilization
if __name__ == "__main__":
    # Pattern Sudoku puzzle (0 = empty cell)
    puzzle = [
        [5,3,0,0,7,0,0,0,0],
        [6,0,0,1,9,5,0,0,0],
        [0,9,8,0,0,0,0,6,0],
        [8,0,0,0,6,0,0,0,3],
        [4,0,0,8,0,3,0,0,1],
        [7,0,0,0,2,0,0,0,6],
        [0,6,0,0,0,0,2,8,0],
        [0,0,0,4,1,9,0,0,5],
        [0,0,0,0,8,0,0,7,9]
    ]

    print("Authentic puzzle:")
    print_board(puzzle)

    if solve_sudoku(puzzle):
        print("nSolved puzzle:")
        print_board(puzzle)
    else:
        print("nNo answer exists")

Outcome by OpenAI o1

def solve_sudoku(board):
    """
    Clear up a 9x9 Sudoku puzzle in-place utilizing backtracking.
    board: A 9x9 2D checklist of integers, the place 0 denotes an empty cell.
    Returns True if the puzzle might be solved, in any other case False.
    """
    # Discover an empty cell
    empty = find_empty_cell(board)
    if not empty:
        # No empty cells left, puzzle is solved
        return True
    else:
        row, col = empty

    # Attempt digits 1 by way of 9
    for num in vary(1, 10):
        if is_valid(board, row, col, num):
            board[row][col] = num  # Place num

            # Recursively try to resolve
            if solve_sudoku(board):
                return True

            # Undo transfer (backtrack)
            board[row][col] = 0

    return False

def find_empty_cell(board):
    """
    Discover an empty cell within the board, indicated by 0.
    Returns (row, col) tuple or None if no empty cell is discovered.
    """
    for r in vary(9):
        for c in vary(9):
            if board[r][c] == 0:
                return (r, c)  # row, col
    return None

def is_valid(board, row, col, num):
    """
    Examine if inserting num at board[row][col] is legitimate
    based on Sudoku guidelines:
      - num not in present row
      - num not in present column
      - num not in present 3x3 field
    """
    # Examine row
    if num in board[row]:
        return False

    # Examine column
    for r in vary(9):
        if board[r][col] == num:
            return False

    # Examine 3x3 sub-box
    # Establish top-left nook of the 3x3 field
    box_row = (row // 3) * 3
    box_col = (col // 3) * 3
    for r in vary(box_row, box_row + 3):
        for c in vary(box_col, box_col + 3):
            if board[r][c] == num:
                return False

    return True

if __name__ == "__main__":
    # Instance Sudoku puzzle (0 = empty)
    puzzle = [
        [7, 8, 0, 4, 0, 0, 1, 2, 0],
        [6, 0, 0, 0, 7, 5, 0, 0, 9],
        [0, 0, 0, 6, 0, 1, 0, 7, 8],
        [0, 0, 7, 0, 4, 0, 2, 6, 0],
        [0, 0, 1, 0, 5, 0, 9, 3, 0],
        [9, 0, 4, 0, 6, 0, 0, 0, 5],
        [0, 7, 0, 3, 0, 0, 0, 1, 2],
        [1, 2, 0, 0, 0, 7, 4, 0, 0],
        [0, 4, 9, 2, 0, 6, 0, 0, 7]
    ]

    print("Preliminary puzzle:")
    for row in puzzle:
        print(row)

    if solve_sudoku(puzzle):
        print("nSolved puzzle:")
        for row in puzzle:
            print(row)
    else:
        print("No answer discovered.")

Outcome by Sonnet 3.5

def print_board(board):
    """Print the Sudoku board in a readable format."""
    for i in vary(len(board)):
        if i % 3 == 0 and that i != 0:
            print("- - - - - - - - - - - -")
        for j in vary(len(board[0])):
            if j % 3 == 0 and j != 0:
                print("|", finish=" ")
            if j == 8:
                print(board[i][j])
            else:
                print(str(board[i][j]) + " ", finish="")

def find_empty(board):
    """Discover an empty cell within the board (represented by 0)."""
    for i in vary(len(board)):
        for j in vary(len(board[0])):
            if board[i][j] == 0:
                return (i, j)  # row, column
    return None
def is_valid(board, num, pos):
    """Examine if the quantity might be positioned within the given place."""
    # Examine row
    for j in vary(len(board[0])):
        if board[pos[0]][j] == num and pos[1] != j:
            return False

    # Examine column
    for i in vary(len(board)):
        if board[i][pos[1]] == num and pos[0] != i:
            return False

    # Examine 3x3 field
    box_x = pos[1] // 3
    box_y = pos[0] // 3
    for i in vary(box_y * 3, box_y * 3 + 3):
        for j in vary(box_x * 3, box_x * 3 + 3):
            if board[i][j] == num and (i, j) != pos:
                return False

    return True

def resolve(board):
    """Clear up the Sudoku puzzle utilizing backtracking."""
    empty = find_empty(board)
    if not empty:
        return True

    row, col = empty

    for num in vary(1, 10):
        if is_valid(board, num, (row, col)):
            board[row][col] = num

            if resolve(board):
                return True

            board[row][col] = 0
    return False

# Instance utilization
if __name__ == "__main__":
    # 0 represents empty cells
    puzzle = [
    [0, 2, 0, 6, 0, 8, 0, 0, 0],
    [5, 8, 0, 0, 0, 9, 7, 0, 0],
    [0, 0, 0, 0, 4, 0, 0, 0, 0],
    [3, 7, 0, 0, 0, 0, 5, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 0, 4],
    [0, 0, 8, 0, 0, 0, 0, 1, 3],
    [0, 0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 9, 8, 0, 0, 0, 3, 6],
    [0, 0, 0, 3, 0, 6, 0, 9, 0]
]

    print("Authentic Puzzle:")
    print_board(puzzle)
    print("nSolving...n")

    if resolve(puzzle):
        print("Solved Puzzle:")
        print_board(puzzle)
    else:
        print("No answer exists.")

Assessment:

DeepSeek R1: The mannequin generates the code which is kind of is extra user-friendly as a result of its well-structured output formatting with a devoted print_board operate. This makes it simpler to visualise the puzzle earlier than and after fixing.

OpenAI o1: The mannequin generates the code which has barely cleaner strategy with higher naming conventions within the find_empty_cell operate and the primary fixing logic, which might be extra readable for some builders.

Sonnet 3.5: The mannequin generates a really written, concise code which comes with class based mostly divisions that makes it simpler to know.

General, Sonnet3.5 takes the lead with its clear and concise code.

Outcome: DeepSeek R1: 2 | OpenAI o1: 3 | Sonnet: 1

Ultimate Outcome

Ultimate Rating: DeepSeek R1: 6 | OpenAI o1: 15 | Sonnet: 9

DeepSeek R1 emerges because the winner with an in depth chase by Sonnet 3.5. However every mannequin has its personal key options that make them particular. o1 offers out detailed explanations that may actually be useful for individuals who want to perceive a subject intimately. However, Sonnet 3.5 may be very fast in its responses. Itt generates responses in half the time as in comparison with the opposite two fashions. Its responses are concise and all the time to the purpose. In the meantime, DeepSeek R1, though takes time to generate its response, comes up with nice outcomes. Nevertheless, the solutions can have syntax errors which could let down the general expertise.

Conclusion

DeepSeek R1 stands out as a game-changer within the LLM area. It affords reasoning capabilities akin to OpenAI’s o1 sequence and Claude’s Sonnet 3.5, at a fraction of the associated fee. Its reinforcement learning-based strategy and give attention to logic-intensive duties make it a powerful competitor for customers needing assist with deep reasoning, math, or coding duties. Whereas its output is spectacular, occasional syntax errors and slower response occasions present that there’s room for enchancment.

Thus, the selection between DeepSeek R1, o1, and Sonnet 3.5 is determined by your particular job necessities—whether or not it’s value, pace, detailed insights, or reasoning-focused outputs.

Steadily Requested Questions

Q1. What’s DeepSeek R1?

A. DeepSeek R1 is an open-source giant language mannequin designed for logic-intensive duties like math, coding, and Chain-of-Thought (CoT) reasoning, utilizing reinforcement studying as its core coaching methodology.

Q2. How does DeepSeek R1 evaluate to OpenAI’s o1 and Claude’s Sonnet 3.5?

A. DeepSeek R1 affords reasoning and computational capabilities on par with o1 and Sonnet 3.5 however at a a lot decrease value, making it an inexpensive various for customers.

Q3. What’s is the associated fee for utilizing DeepSeek R1?

A. DeepSeek R1 is considerably cheaper, with token enter prices beginning at $0.14 per million for cache hits and $2.19 per million for output tokens.

This autumn. How can I entry DeepSeek R1?

A. You may entry DeepSeek R1 by way of the DeepSeek Chat interface at chat.deepseek.com or by way of its API by signing up at api-docs.deepseek.com.

Q5. What are some use instances of DeepSeek R1?

A. DeepSeek R1 is right for superior reasoning, math problem-solving, coding, creating complicated logic workflows, and Chain-of-Thought reasoning.

Q6. Does DeepSeek R1 assist API integration?

A. Sure, you may combine DeepSeek R1 utilizing its API, which helps real-time requests and job execution with minimal setup.

Q7. Is DeepSeek R1 really open supply?

A. Sure, DeepSeek R1 is absolutely open-source, permitting customers to host and customise it for particular functions.

Anu Madan has 5+ years of expertise in content material creation and administration. Having labored as a content material creator, reviewer, and supervisor, she has created a number of programs and blogs. At the moment, she engaged on creating and strategizing the content material curation and design round Generative AI and different upcoming know-how.