Learn About AI
Sigmoid Function
The sigmoid function is defined as:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

It maps any real-valued number into the range (0, 1).
Why is "e"? https://www.youtube.com/watch?v=1GqYpmLjTRQ&t=1s (opens in a new tab)
- We could choose any base bigger than 1, but e makes the sigmoid simpler, especially when differentiating.
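A quick way to see why e is convenient: the derivative then takes the tidy closed form σ'(x) = σ(x)(1 − σ(x)). A minimal sketch in plain Python (the check value is just one sample point):

```python
import math

def sigmoid(x: float) -> float:
    """Map any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """With base e, the derivative simplifies to sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Check the closed form against a numerical derivative at one point.
x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_derivative(x), numeric)  # both ~0.2350
```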
Deep Learning and Neural Networks
https://www.youtube.com/watch?v=BR9h47Jtqyw
- Minimize error
- Neural Network
- Sigmoid function to calculate probability.
- Product of the probabilities gives the likelihood to maximize.
- Turn the product into a sum to avoid tiny numbers and make calculation easier (see the sketch after this list). Multiplication is an algorithm built on top of addition.
- Logistic regression
- A real neuron includes: Dendrites (multiple inputs), Nucleus (processing), Axon (single output)
- Non-linear regions are obtained by combining linear regions with weights.
- Constants vs. coefficients vs. variables in ax + by = c
- A Deep Neural Network is a Neural Network with many hidden layers.
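Here is the sketch referenced above, showing why we turn the product into a sum of logs. The per-example probabilities are made up for illustration; the point is that a long product underflows to 0 while the log-sum stays well-behaved:

```python
import math

# Hypothetical per-example probabilities assigned by a model (made up for illustration).
probs = [0.9, 0.8, 0.95, 0.7] * 2000  # 8000 factors

# Direct product of probabilities: underflows toward 0 as factors accumulate.
likelihood = 1.0
for p in probs:
    likelihood *= p

# Sum of logs: the same quantity in log space, numerically stable.
log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)       # 0.0 (underflow)
print(log_likelihood)   # about -1472.9, easy to work with
```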
Recurrent Neural Networks
https://www.youtube.com/watch?v=UNmqTiOnRfg
- Matrix multiplication
- The number of columns in the 1st matrix must equal the number of rows in the 2nd matrix.
- A×B ≠ B×A (matrix multiplication is not commutative).
- Size: The result will be (# rows in 1st) × (# cols in 2nd).
- Calculate: To find the element in position [i, j] of the result, take the dot product of row i from the first matrix and column j from the second matrix (sketched below).
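A minimal pure-Python sketch of the rules above (the shapes and entries are arbitrary examples):

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows.

    Requires: number of columns in A == number of rows in B.
    Result shape: (rows of A) x (cols of B).
    """
    assert len(A[0]) == len(B), "cols of A must equal rows of B"
    rows, cols, inner = len(A), len(B[0]), len(B)
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

A = [[1, 2, 3],
     [4, 5, 6]]      # 2x3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3x2

print(matmul(A, B))  # [[58, 64], [139, 154]] -- a 2x2 result
# Note that B x A would have a different shape entirely (3x3), so AxB != BxA.
```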
How Transformer LLMs Work
https://learn.deeplearning.ai/courses/how-transformer-llms-work
- Bag-of-words
- Word2Vec
- RNNs capture a text's context sequentially.
- Attention solves the issue with long sequences.
- Tokenizer
- Vocabulary
- Greedy decoding
- KV cache
- Self-attention
- Relevance scoring
- Combining information (see the sketch after this list)
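To tie the last three bullets together, here is a minimal single-head self-attention sketch in NumPy: relevance scoring via scaled dot products and softmax, then combining information as a weighted sum of values. All shapes and weights here are made-up toy values, not taken from any real model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]

    # Relevance scoring: how strongly each token attends to every other token.
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax

    # Combining information: weighted sum of the value vectors.
    return weights @ V                              # (seq, d_model)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                             # toy sizes
X = rng.normal(size=(seq_len, d_model))             # fake token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (4, 8)
```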
"Attention Is All You Need"
Hands-on
https://chatgpt.com/share/68ceb9e0-3378-8006-81ff-d056700b5e87
https://grok.com/share/c2hhcmQtNA%3D%3D_e98b1933-5a44-4044-ba4b-013b4ac1c6af
https://github.com/minhxuvi/llm-gateway-qa
"Attention Is All You Need" Part 2
https://jalammar.github.io/illustrated-transformer/
https://arxiv.org/abs/1706.03762
Review https://learn.deeplearning.ai/courses/how-transformer-llms-work
- From self-attention lecture
- Feed Forward Neural Network Dimension?
- Mixture of Experts
- Model dimension?
- Attention head?
- Key/Value head?
- Need to watch https://www.youtube.com/watch?v=UPtG_38Oq8o
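Until then, a rough sketch of how these quantities usually relate in a standard Transformer; every number below is illustrative, not taken from any particular model:

```python
# Illustrative Transformer dimension bookkeeping (all values are made up).
d_model = 4096               # model dimension: width of each token's hidden vector
n_heads = 32                 # attention (query) heads
n_kv_heads = 8               # key/value heads; fewer than query heads under grouped-query attention
d_head = d_model // n_heads  # per-head dimension: 4096 / 32 = 128

# The feed-forward inner dimension is commonly a multiple of d_model (e.g. 4x).
d_ff = 4 * d_model           # 16384

print(f"d_head={d_head}, queries per kv head={n_heads // n_kv_heads}, d_ff={d_ff}")
```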