# LLM-1B-Lab

Educational implementation of a 1.1B parameter LLaMA-style Decoder-Only Transformer, trained from scratch on FineWeb-Edu.

## Model Details

| Attribute | Value |
|---|---|
| Parameters | ~1.1B |
| Architecture | LLaMA-style (RMSNorm, RoPE, GQA, SwiGLU, weight tying) |
| Hidden dim | 2048 |
| Layers | 22 |
| Attention heads | 16 (Q) / 4 (KV) |
| Max sequence length | 2048 |
| Vocab size | 32,000 |
| Training steps | 20,000 |
| Best val loss | 2.3653 (perplexity 10.65) |
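The perplexity in the table is exp(val loss), and the listed dimensions roughly account for the ~1.1B budget. A quick sanity check (a sketch: the SwiGLU intermediate dimension is not stated on this card, so 5632, a common choice at this scale, is assumed):

```python
import math

# Perplexity is exp(cross-entropy loss): exp(2.3653) ≈ 10.65.
print(f"perplexity ≈ {math.exp(2.3653):.2f}")

# Rough parameter count from the table.
# ASSUMPTION: SwiGLU intermediate dim d_ffn = 5632 (not stated in the card).
d_model, n_layers, vocab = 2048, 22, 32_000
n_q, n_kv = 16, 4
head_dim = d_model // n_q               # 128
d_ffn = 5632                            # assumed

embed = vocab * d_model                 # tied with the LM head, so counted once
attn = d_model * n_q * head_dim         # Q projection
attn += 2 * d_model * n_kv * head_dim   # K and V (GQA uses fewer KV heads)
attn += n_q * head_dim * d_model        # output projection
mlp = 3 * d_model * d_ffn               # gate, up, down projections in SwiGLU
norms = 2 * d_model                     # two RMSNorm scales per block

total = embed + n_layers * (attn + mlp + norms) + d_model  # + final norm
print(f"~{total / 1e9:.2f}B parameters")  # → ~1.06B
```

The result lands close to the quoted ~1.1B, with the gap attributable to the assumed intermediate dimension and omitted small tensors.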

## Training

- Dataset: FineWeb-Edu (sample-10BT)
- Tokenizer: Pretrained LLaMA 2 (NousResearch/Llama-2-7b-hf, 32K vocab)
- Hardware: Google Colab Pro+ (A100 40GB)
- Precision: bfloat16 mixed precision
- Optimizer: AdamW (lr=3e-4, weight_decay=0.1, beta2=0.95)
- Scheduler: Cosine decay with 2,000-step linear warmup
- Effective batch size: 128
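The schedule above can be sketched as follows (a minimal sketch: the card does not state a minimum-LR floor, so 0 is assumed):

```python
import math

def lr_at(step, max_lr=3e-4, warmup=2000, total=20_000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr.

    ASSUMPTION: min_lr floor of 0 (not stated in the card).
    """
    if step < warmup:
        return max_lr * (step + 1) / warmup  # linear warmup
    progress = (step - warmup) / (total - warmup)  # 0 → 1 over the decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(1999))   # end of warmup: 3e-4
print(lr_at(11000))  # midpoint of decay: 1.5e-4
```

Assuming the effective batch counts full-length 2,048-token sequences, the run sees roughly 20,000 × 128 × 2,048 ≈ 5.2B tokens, about half of the 10BT sample.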

## Usage

```python
import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# 1. Load config and rebuild model
from llm_lab.config import ModelConfig
from llm_lab.model import LLMModel

model = LLMModel(ModelConfig.base_1b())
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict, strict=False)  # strict=False for weight tying
model.eval()

# 2. Load tokenizer (pretrained LLaMA 2)
tokenizer = AutoTokenizer.from_pretrained("Vjeong/LLM-1B-Lab")

# 3. Generate text
prompt = "The future of AI is"
input_ids = torch.tensor([tokenizer.encode(prompt)])
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
print(tokenizer.decode(output[0].tolist()))
```
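For reference, a single decoding step of the kind `generate` performs with `temperature` and `top_p` can be sketched in plain PyTorch (the actual logic inside `llm_lab.model` may differ):

```python
import torch

def sample_top_p(logits, temperature=0.8, top_p=0.9):
    # Scale logits by temperature, then keep the smallest set of tokens
    # whose cumulative probability exceeds top_p (nucleus sampling).
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Mask tokens outside the nucleus (the top-1 token is always kept).
    mask = cumulative - sorted_probs > top_p
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()  # renormalize over the nucleus
    # Sample within the nucleus and map back to the original vocab index.
    return sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]
```

Lower `temperature` sharpens the distribution before the nucleus cutoff is applied, so the two knobs interact: at `temperature=0.8` fewer tokens survive a given `top_p` than at 1.0.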

## License

Apache 2.0
