Understanding Artificial Intelligence

A first-principles journey through the foundations of AI


Foundations

What is Computation?

To understand artificial intelligence, we must first understand what computers do at their most fundamental level. A computer is a machine that performs computation – the systematic execution of instructions to transform input data into output data.

An algorithm is simply a precise sequence of steps to solve a problem. Consider a recipe: it takes ingredients (input) and produces a cake (output) through a series of instructions. Algorithms work the same way, but with data instead of ingredients.

Simple Algorithm Example

Problem: Find the largest number in a list

Algorithm:

  1. Set "largest" to the first number
  2. For each remaining number:
  3.     If it's bigger than "largest", update "largest"
  4. Return "largest"
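
In Python, the same algorithm is only a few lines (the function name find_largest is our own choice):

  def find_largest(numbers):
      # Step 1: assume the first number is the largest so far
      largest = numbers[0]
      # Steps 2-3: scan the remaining numbers, updating when a bigger one appears
      for number in numbers[1:]:
          if number > largest:
              largest = number
      # Step 4: return the result
      return largest

  print(find_largest([3, 41, 12, 9, 74, 15]))  # prints 74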

Traditional Programming vs. AI

In traditional programming, a human programmer writes explicit rules for every scenario. If you want a program to identify spam emails, you might write rules like:

  • If the subject contains "FREE MONEY", mark as spam
  • If there are more than 5 exclamation marks, mark as spam
  • If the sender is unknown and includes links, mark as spam
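
Written out in Python, such a rule-based filter might look like this sketch (the thresholds and argument names are illustrative, not taken from any real spam filter):

  def is_spam(subject, body, sender_known):
      # Every rule must be anticipated and written by hand
      if "FREE MONEY" in subject.upper():
          return True
      if body.count("!") > 5:
          return True
      if not sender_known and "http" in body.lower():
          return True
      return False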

This approach has a fundamental limitation: the programmer must anticipate every possible pattern. What about misspellings? New spam tactics? Legitimate emails that trigger these rules?

Artificial Intelligence represents a paradigm shift: instead of programming explicit rules, we create systems that learn rules from data.

What is Artificial Intelligence?

At its core, Artificial Intelligence (AI) is the science of creating computer systems that can perform tasks typically requiring human intelligence. These tasks include:

  • Recognizing patterns (faces, voices, objects)
  • Understanding language
  • Making decisions based on complex information
  • Learning from experience
  • Adapting to new situations

The key insight is that many intelligent behaviors emerge from the ability to recognize patterns in data and make predictions based on those patterns.

Machine Learning

The Core Concept

Machine Learning (ML) is a subset of AI focused on creating algorithms that improve automatically through experience. Instead of being explicitly programmed, these systems learn patterns from data.

Think of it like learning to ride a bicycle: no one gives you a formula. You try, fall, adjust, and gradually develop an intuitive understanding through repeated experience.

The Three Types of Learning

Supervised Learning

The system learns from labeled examples – data where we already know the correct answer.

Example: Teaching a system to recognize cats by showing it thousands of images labeled "cat" or "not cat". The system learns to identify patterns that distinguish cats from other objects.

Applications: Email spam detection, medical diagnosis, price prediction
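
As a minimal sketch of supervised learning in Python – assuming the scikit-learn library is installed, and using made-up features (exclamation-mark count, presence of a link) and labels:

  from sklearn.tree import DecisionTreeClassifier

  # Labeled examples: [exclamation_marks, contains_link] -> 1 = spam, 0 = not spam
  X = [[0, 0], [7, 1], [1, 0], [9, 1], [2, 0], [8, 1]]
  y = [0, 1, 0, 1, 0, 1]

  model = DecisionTreeClassifier()
  model.fit(X, y)                  # learn patterns from the labeled data
  print(model.predict([[6, 1]]))   # predict a label for a new, unseen email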

Unsupervised Learning

The system finds patterns in data without labels, discovering hidden structure on its own.

Example: Given customer shopping data, the system might discover that customers naturally group into segments like "budget shoppers," "luxury buyers," and "discount hunters" – without being told these categories exist.

Applications: Customer segmentation, anomaly detection, data compression
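
Here is a comparable sketch for unsupervised learning, again assuming scikit-learn and using invented shopping data (the choice of three clusters is ours):

  from sklearn.cluster import KMeans

  # Customer data: [average purchase in $, visits per month] -- no labels at all
  X = [[20, 8], [25, 7], [300, 1], [280, 2], [60, 4], [55, 5]]

  model = KMeans(n_clusters=3, n_init=10)
  model.fit(X)             # discover groupings without being told they exist
  print(model.labels_)     # which cluster each customer was assigned to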

Reinforcement Learning

The system learns through trial and error, receiving rewards for good actions and penalties for bad ones.

Example: Learning to play chess by playing millions of games, receiving positive feedback for wins and negative feedback for losses, gradually improving strategy.

Applications: Game playing, robotics, autonomous vehicles
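
One classic algorithm of this kind (not named above, but a standard example) is Q-learning, which keeps a table of expected rewards for each state-action pair. This is a minimal sketch of just its core update step, where Q is a nested dict mapping state -> action -> value, and alpha and gamma are hyperparameters we chose:

  def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
      # Value of the best action available from the next state
      best_next = max(Q[next_state].values())
      # Target: the reward received now, plus discounted future reward
      target = reward + gamma * best_next
      # Nudge the current estimate toward the target
      Q[state][action] += alpha * (target - Q[state][action])

Repeated over millions of moves, these small nudges gradually shape a strategy that maximizes long-term reward.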

How Does Learning Actually Work?

At its heart, machine learning is about finding functions – mathematical relationships between inputs and outputs.

Imagine you want to predict house prices based on size. You collect data:

  • 1,000 sq ft → $200,000
  • 1,500 sq ft → $280,000
  • 2,000 sq ft → $350,000

A machine learning algorithm searches for a function (like "price = 150 × square_feet + 50,000") that best fits this data. When a new house appears, we can use this function to predict its price.

The algorithm finds this function through optimization: it starts with a random guess, measures how wrong it is (using a "loss function"), and adjusts the function to reduce the error. This process repeats thousands or millions of times until the predictions are accurate.

The Learning Process

  1. Initialize: Start with a random function (model)
  2. Predict: Use the function to make predictions on training data
  3. Measure Error: Calculate how far off the predictions are
  4. Adjust: Modify the function to reduce error
  5. Repeat: Steps 2-4 until error is minimized
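
Here is that loop in miniature, fitting the house-price line from above with gradient descent (units are scaled to thousands to keep the learning rate simple; the rate and iteration count are arbitrary choices):

  # Fit "price = w * size + b", with size in thousands of square feet
  # and price in thousands of dollars
  data = [(1.0, 200.0), (1.5, 280.0), (2.0, 350.0)]

  w, b = 0.0, 0.0                     # 1. Initialize with an arbitrary guess
  lr = 0.01                           # learning rate: how big each adjustment is
  for step in range(10000):
      grad_w = grad_b = 0.0
      for size, price in data:
          pred = w * size + b         # 2. Predict
          error = pred - price        # 3. Measure error
          grad_w += 2 * error * size  # how the squared error changes with w
          grad_b += 2 * error         # ... and with b
      w -= lr * grad_w                # 4. Adjust to reduce the error
      b -= lr * grad_b                # 5. Repeat
  print(w, b)                         # converges near 150 and 51.7

The result, price ≈ 150 × square_feet + 51,700, is close to the illustrative function quoted earlier – found from the data alone, with no rule written by hand.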

Neural Networks

The Biological Inspiration

Your brain contains roughly 86 billion neurons – cells that process and transmit information. Each neuron:

  • Receives signals from thousands of other neurons through connections called synapses
  • Processes these signals (some excitatory, some inhibitory)
  • Fires its own signal if the combined input exceeds a threshold
  • Passes this signal to downstream neurons

Learning happens by strengthening or weakening connections between neurons. When you practice piano, you're not adding neurons – you're adjusting connection strengths so the right neurons fire together.

Artificial Neurons

An artificial neuron is a simplified mathematical model of its biological counterpart (the earliest and simplest version is known as the perceptron):

How an Artificial Neuron Works

  1. Receive inputs: Multiple numerical values (x₁, x₂, x₃, ...)
  2. Weight them: Each input is multiplied by a weight (w₁, w₂, w₃, ...) representing connection strength
  3. Sum them: Add all weighted inputs plus a bias term
  4. Activate: Pass the sum through an activation function
  5. Output: Send the result to the next layer

Output = Activation(w₁x₁ + w₂x₂ + w₃x₃ + ... + bias)

The activation function determines whether and how strongly the neuron "fires." Common examples include:

  • Sigmoid: Outputs values between 0 and 1 (smooth on/off)
  • ReLU (Rectified Linear Unit): Outputs the input if positive, otherwise zero
  • Tanh: Outputs values between -1 and 1
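
Putting the weighted sum and an activation function together, a single neuron fits in a few lines of Python (the inputs, weights, and bias here are made-up numbers):

  import math

  def neuron(inputs, weights, bias):
      # Weighted sum of the inputs, plus the bias term
      total = sum(w * x for w, x in zip(weights, inputs)) + bias
      # Sigmoid activation: squashes any sum into the range (0, 1)
      return 1 / (1 + math.exp(-total))

  print(neuron([0.5, 0.8], [0.4, -0.2], 0.1))  # about 0.53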

From Neurons to Networks

A single neuron can only learn simple linear patterns. The magic happens when we connect many neurons into a neural network:

  • Input Layer: Receives the raw data (pixel values, sensor readings, text, etc.)
  • Hidden Layers: Intermediate layers that transform the data, extracting increasingly abstract features
  • Output Layer: Produces the final prediction or classification

Information flows forward through the network (forward propagation). Each layer transforms the data, allowing the network to learn complex, non-linear patterns.
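
A toy forward pass, chaining two of the neuron layers described above (the weights are invented; real networks have thousands to billions of them):

  import math

  def layer(inputs, weights, biases):
      # A fully connected layer: every output neuron sees every input
      sigmoid = lambda z: 1 / (1 + math.exp(-z))
      return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(weights, biases)]

  x = [0.7, 0.1]                                        # input layer: raw data
  h = layer(x, [[0.5, -0.3], [0.8, 0.2]], [0.0, -0.1])  # hidden layer
  y = layer(h, [[1.2, -0.7]], [0.05])                   # output layer: prediction
  print(y)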

Example: Image Recognition

When recognizing a face in a photo:

  • Layer 1: Detects edges and simple shapes
  • Layer 2: Combines edges into facial features (eyes, noses, mouths)
  • Layer 3: Recognizes facial patterns and arrangements
  • Output: Identifies the specific person

Training Neural Networks

The network starts with random weights. Training adjusts these weights to minimize prediction errors through an algorithm called backpropagation:

  1. Forward Pass: Input data flows through the network, producing a prediction
  2. Calculate Error: Compare the prediction to the correct answer using a loss function
  3. Backward Pass: Work backwards through the network, calculating how much each weight contributed to the error
  4. Update Weights: Adjust weights in the direction that reduces error (using gradient descent)
  5. Repeat: Process thousands or millions of examples

The name reflects how error information propagates backward through the network, layer by layer, guiding each weight adjustment.
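
As a compact illustration, here is a tiny two-layer network trained by backpropagation to compute XOR, using numpy. The layer sizes, learning rate, and iteration count are all arbitrary choices:

  import numpy as np

  rng = np.random.default_rng(0)
  X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
  y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

  W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # random initial weights
  W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
  sigmoid = lambda z: 1 / (1 + np.exp(-z))

  for step in range(20000):
      # Forward pass: input -> hidden -> output
      h = sigmoid(X @ W1 + b1)
      out = sigmoid(h @ W2 + b2)
      # Backward pass: gradients of the squared error, layer by layer
      d_out = (out - y) * out * (1 - out)
      d_h = (d_out @ W2.T) * h * (1 - h)
      # Update weights in the error-reducing direction (learning rate 0.5)
      W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
      W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

  print(out.round(2))  # should approach [[0], [1], [1], [0]]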

Deep Learning

What Makes It "Deep"?

Deep Learning refers to neural networks with many hidden layers (hence "deep"). While a traditional neural network might have 1-2 hidden layers, deep networks can have dozens or even hundreds.

Why does depth matter? Each layer learns to represent the data at different levels of abstraction, enabling the network to understand incredibly complex patterns.

The Deep Learning Revolution

For decades, neural networks were theoretical curiosities. Three factors converged around 2012 to trigger a revolution:

Big Data

Deep networks need massive amounts of training data. The internet provided billions of labeled images, text documents, videos, and more – fuel for training.

Computational Power

Training deep networks requires enormous computation. Graphics Processing Units (GPUs), originally designed for video games, turned out to be perfect for the parallel calculations neural networks need.

Algorithmic Improvements

Better activation functions (like ReLU), improved initialization techniques, and advanced optimization algorithms made training deep networks practical.

Specialized Architectures

Different problems require different network architectures:

Convolutional Neural Networks (CNNs)

Purpose: Image and video processing

Key Insight: Images have spatial structure – nearby pixels are related. CNNs use "convolutional layers" that scan across the image with filters, detecting local patterns like edges and textures. These patterns combine into higher-level features.

Applications: Face recognition, medical imaging, autonomous vehicles
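
To see the core idea in code, here is a hand-rolled 2D convolution in numpy, applied with a tiny edge-detecting filter to an invented 4×4 "image" (real CNNs learn their filters rather than having them written by hand):

  import numpy as np

  def convolve2d(image, kernel):
      # Slide the filter across the image, taking a weighted sum at each position
      kh, kw = kernel.shape
      out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
      return out

  image = np.array([[0, 0, 1, 1]] * 4, dtype=float)  # dark left half, bright right
  edge_filter = np.array([[-1.0, 1.0]])              # responds to brightness jumps
  print(convolve2d(image, edge_filter))              # nonzero exactly at the edge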

Recurrent Neural Networks (RNNs)

Purpose: Sequential data like text, speech, and time series

Key Insight: Language and time-based data have temporal dependencies – the meaning of a word depends on previous words. RNNs maintain "memory" of previous inputs, allowing them to process sequences.

Applications: Language translation, speech recognition, music generation
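
The "memory" idea reduces to a single recurring computation. A minimal sketch with random weights (the sizes are arbitrary):

  import numpy as np

  def rnn_step(x, h, Wx, Wh, b):
      # The new hidden state mixes the current input with the previous memory
      return np.tanh(x @ Wx + h @ Wh + b)

  rng = np.random.default_rng(1)
  Wx, Wh, b = rng.normal(size=(3, 5)), rng.normal(size=(5, 5)), np.zeros(5)

  h = np.zeros(5)                    # the memory starts empty
  for x in rng.normal(size=(4, 3)):  # a sequence of four 3-dimensional inputs
      h = rnn_step(x, h, Wx, Wh, b)  # each step folds one more input into memory
  print(h)                           # a summary of the whole sequence so far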

Transformers

Purpose: Natural language processing and beyond

Key Insight: Instead of processing sequences one element at a time, transformers use "attention mechanisms" to weigh the importance of different parts of the input simultaneously. This allows parallel processing and better handling of long-range dependencies.

Applications: ChatGPT, language translation, code generation
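
The attention computation itself is remarkably short. This numpy sketch implements the standard scaled dot-product form; the queries, keys, and values (Q, K, V) are random stand-ins for what a real model would compute from its input:

  import numpy as np

  def attention(Q, K, V):
      # Each query scores every key; a softmax turns scores into weights over values
      scores = Q @ K.T / np.sqrt(K.shape[-1])
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights = weights / weights.sum(axis=-1, keepdims=True)
      return weights @ V

  rng = np.random.default_rng(2)
  # 4 sequence positions, 8-dimensional vectors
  Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
  print(attention(Q, K, V).shape)  # (4, 8): one attended vector per position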

Generative Models

Traditional AI focuses on analysis – classifying images, predicting values. Recent advances enable generation – creating new content:

  • Generative Adversarial Networks (GANs): Two networks compete – one generates fake images, the other tries to detect fakes. Competition drives both to improve, resulting in photorealistic synthetic images.
  • Variational Autoencoders (VAEs): Learn compressed representations of data, then generate new examples by sampling from this learned space.
  • Diffusion Models: Learn to gradually denoise random noise into coherent images, enabling high-quality image generation.

Modern AI

Large Language Models

Language models predict the next word in a sequence. When trained on massive text datasets (billions of words from books, websites, articles), they develop remarkable capabilities:

  • Understanding context and nuance in language
  • Generating coherent, contextually appropriate text
  • Answering questions by synthesizing information
  • Translating between languages
  • Writing code, poetry, essays, and more
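
To make "predicting the next word" concrete, here is a toy predictor that simply counts which word follows which – a far cry from a real language model, but the same underlying task:

  from collections import Counter, defaultdict

  text = "the cat sat on the mat and the cat slept".split()
  following = defaultdict(Counter)
  for word, nxt in zip(text, text[1:]):
      following[word][nxt] += 1             # count each observed word pair

  print(following["the"].most_common(1))    # [('cat', 2)]: most likely next word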

Models like GPT (Generative Pre-trained Transformer) use transformer architecture with billions of parameters (weights). They're trained in two phases:

  1. Pre-training: Learn general language patterns from massive unlabeled text
  2. Fine-tuning: Specialize for specific tasks using smaller labeled datasets

The surprising finding: language models trained simply to predict the next word develop emergent capabilities that were never explicitly programmed – such as apparent reasoning, problem-solving, and a degree of common-sense knowledge.

Computer Vision

AI vision systems now rival or exceed human performance on specific tasks:

  • Object Detection: Identifying and locating multiple objects in images
  • Semantic Segmentation: Labeling every pixel (crucial for autonomous vehicles)
  • Facial Recognition: Identifying individuals with high accuracy
  • Medical Imaging: Detecting tumors, analyzing X-rays, predicting disease progression
  • Image Generation: Creating photorealistic images from text descriptions

Multimodal AI

Recent systems combine multiple types of data:

  • Vision + Language: Models that understand both images and text can answer questions about images, generate descriptions, or create images from text
  • Audio + Language: Systems that transcribe speech, understand spoken commands, or generate realistic speech
  • Video Understanding: Analyzing motion, events, and context across time

These multimodal systems reflect how humans naturally perceive the world – through multiple senses simultaneously.

Real-World Applications

Healthcare

Diagnostic assistance, drug discovery, personalized treatment plans, medical imaging analysis

Finance

Fraud detection, algorithmic trading, credit scoring, risk assessment

Transportation

Autonomous vehicles, traffic optimization, predictive maintenance

Communication

Language translation, speech recognition, sentiment analysis

Science

Protein folding, climate modeling, particle physics, astronomy

Entertainment

Recommendation systems, content generation, game AI

Current Limitations

Despite impressive progress, current AI has fundamental limitations:

  • True Understanding: AI systems pattern-match extraordinarily well, but lack genuine comprehension of meaning
  • Common Sense: Humans effortlessly navigate everyday situations; AI struggles with obvious inferences
  • Generalization: AI trained on one task rarely transfers knowledge to different contexts
  • Causation: AI finds correlations but doesn't understand cause-and-effect relationships
  • Creativity: While AI generates novel combinations, true original creative insight remains elusive
  • Consciousness: We have no evidence AI systems are conscious or have subjective experience

Ethics & Society

Bias and Fairness

AI systems learn from data created by humans – and human data contains human biases. If training data reflects historical discrimination, the AI will perpetuate it:

  • Facial recognition systems that perform worse on darker skin tones
  • Hiring algorithms that discriminate based on gender or ethnicity
  • Criminal justice systems that disproportionately flag certain demographics
  • Language models that generate stereotypical or offensive content

Addressing bias requires diverse training data, careful evaluation, and ongoing monitoring. Technical solutions exist, but they require acknowledging that AI systems encode societal values.

Privacy and Surveillance

AI enables unprecedented data collection and analysis:

  • Facial recognition in public spaces
  • Behavioral tracking and profiling
  • Predictive policing
  • Mass surveillance capabilities

Balancing innovation with privacy rights remains an ongoing challenge. Questions include: Who owns your data? How should it be used? What consent is required?

Transparency and Accountability

Deep neural networks are often "black boxes" – even their creators can't fully explain individual decisions. This creates challenges:

  • How do we audit AI decisions in critical domains (healthcare, criminal justice)?
  • Who is liable when AI systems cause harm?
  • Can we trust AI we don't understand?

Research in "explainable AI" aims to create interpretable models, but fundamental tensions exist between performance and interpretability.

Economic Impact

AI will transform the job market:

  • Automation: Routine cognitive tasks increasingly automated
  • Augmentation: AI tools enhance human capabilities
  • New Jobs: Emerging roles in AI development, deployment, and oversight
  • Disruption: Entire industries may transform or disappear

History shows technology creates new opportunities while eliminating others. The challenge is ensuring smooth transitions and equitable outcomes.

Looking Forward

Artificial General Intelligence

Current AI excels at narrow tasks. Artificial General Intelligence (AGI) would match human-level ability across all domains. Predictions vary wildly, but many experts believe AGI is decades away – if achievable at all.

Alignment

How do we ensure advanced AI systems pursue goals aligned with human values? Specifying "good" behavior is surprisingly difficult – simple objectives can lead to unexpected, harmful outcomes.

Governance

Who controls powerful AI systems? How should AI development be regulated? International cooperation, safety research, and ethical frameworks are critical.

AI is a tool – powerful, transformative, but ultimately shaped by human choices. Understanding AI empowers informed participation in decisions about its development and deployment. You don't need to be a technical expert to engage with the societal implications of this technology.