Each week, I'll bring you the most relevant and insightful tech stories, saving you time and keeping you informed.
Here is the system I'm currently using for taking notes:
My note-taking setup revolves around four elements: type, folder, tags, and links.
Type
First is the note type:
Fleeting Notes are raw notes that I can jot down at any time.
Permanent Notes capture ideas, written up in a clear, readable format.
Index Notes organize notes within the same topic; they exist purely for indexing.
Project Notes track project-related material: ideas, progress, and so on.
Concept Notes define a single concept, smaller in scope than an index note.
Question Notes gather material around a single question, e.g. "How to ....?"
Folder
Folders are used purely for workflow, e.g. moving raw notes through processing into permanent notes.
00_Inbox # raw notes
01_Journal # daily notes
10_LiteratureNotes
20_PermanentNotes
21_IndexNotes
22_Weekly # folder for weekly newsletter
30_Projects # tracking project notes
40_Resources # images etc
50_Archive # notes not used any more
Tags
I maintain a pre-generated set of 20+ topic tags and apply the relevant ones to each note.
Links
When two notes are related, I simply add a link between them, making sure the knowledge stays connected.
Once links are in place, a graph of the notes can be generated automatically.
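Assuming Obsidian-style [[wikilink]] syntax (the note app isn't named in this post), building that graph from the links is a one-liner per note, a minimal sketch:

```python
import re

def note_graph(notes: dict[str, str]) -> dict[str, list[str]]:
    """Build an adjacency list from [[wikilink]] references in note bodies."""
    return {name: re.findall(r"\[\[([^\]]+)\]\]", body)
            for name, body in notes.items()}
```

Each note maps to the list of notes it links out to; graph tools can then render this adjacency list directly.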
AI…
A bird's-eye view of RAG techniques:
Foundational RAG Techniques (🌱):
Basic Implementation: Setting up the fundamental RAG pipeline using LangChain and LlamaIndex.
Specific Data Sources: How to use CSV files as a data source for building RAG systems.
Reliability Enhancement (Reliable RAG): Adding validation and refinement steps to basic RAG to ensure information accuracy.
Chunk Size Optimization: Exploring how to choose appropriate text chunk sizes to balance context preservation and retrieval efficiency.
Proposition Chunking: Breaking down text into smaller, factual propositional units for more precise query matching.
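As a toy illustration of that fundamental pipeline, here is a pure-Python sketch: fixed-size chunking, a word-overlap retriever standing in for embedding search, and a stubbed prompt-building step. LangChain or LlamaIndex would replace all three pieces in practice:

```python
def chunk_text(text: str, chunk_size: int = 50) -> list[str]:
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def answer(query: str, chunks: list[str]) -> str:
    """Build the prompt an LLM would receive; the LLM call itself is stubbed."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The chunk size parameter is exactly the knob the chunk-size-optimization point above is about: larger chunks preserve context, smaller ones retrieve more precisely.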
Query Enhancement Techniques (🔍):
Query Transformations: Optimizing the user's original query through methods like rewriting, decomposition (sub-queries), or step-back prompting to better match relevant documents.
Hypothetical Questions (HyDE): Generating potential questions that a document chunk might answer, transforming retrieval into a "question-question" matching task to improve relevance.
Hypothetical Prompt Embeddings (HyPE): Pre-computing embeddings for hypothetical questions during indexing, allowing for faster query-time matching.
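The decomposition variant of query transformation can be sketched like this; a real system would ask an LLM to split the question, while here a naive split on " and " stands in:

```python
def decompose(query: str) -> list[str]:
    """Naive decomposition: split a compound query on ' and '."""
    parts = [p.strip() for p in query.split(" and ")]
    return [p if p.endswith("?") else p + "?" for p in parts]

def retrieve_each(sub_queries: list[str], search_fn, k: int = 2) -> list[str]:
    """Run retrieval per sub-query and deduplicate the merged results."""
    seen, merged = set(), []
    for sq in sub_queries:
        for doc in search_fn(sq)[:k]:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

Each sub-query gets its own retrieval pass, so documents relevant to only half of a compound question still surface.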
Context and Content Enrichment Techniques (📚):
Contextual Chunk Headers: Prepending summary information about the source document or section to each chunk to enrich its embedding context.
Relevant Segment Extraction: Dynamically merging adjacent relevant chunks after retrieval to provide more complete context.
Sentence Window Retrieval: Retrieving the most relevant single sentence and automatically including its preceding and succeeding sentences to expand local context.
Semantic Chunking: Dividing documents based on semantic coherence rather than fixed sizes.
Contextual Compression: Using an LLM to compress retrieved content, preserving the most query-relevant information.
Document Augmentation (via Question Generation): Generating various potential questions for documents and adding them to the index to increase the chances of retrieval.
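Sentence Window Retrieval, for instance, reduces to a few lines once you strip away the framework; the word-overlap scorer below is a stand-in for embedding similarity:

```python
def sentence_window(sentences: list[str], best: int, window: int = 1) -> str:
    """Return the best sentence plus its neighbors as expanded context."""
    lo = max(0, best - window)
    hi = min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

def retrieve_with_window(query: str, sentences: list[str]) -> str:
    """Find the single most relevant sentence, then widen to its window."""
    q = set(query.lower().split())
    best = max(range(len(sentences)),
               key=lambda i: len(q & set(sentences[i].lower().split())))
    return sentence_window(sentences, best)
```

Retrieval matches at sentence granularity for precision, but the generator still sees the surrounding sentences for context.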
Advanced Retrieval Methods (🚀):
Fusion Retrieval: Combining the strengths of multiple retrieval methods (e.g., keyword search + vector search).
Intelligent Reranking: Using more sophisticated models (like Cross-Encoders or LLMs) to re-order initial retrieval results, boosting the relevance of top results.
Multi-faceted Filtering: Applying various filters based on metadata (date, source), similarity thresholds, content keywords, etc.
Hierarchical Indices: Creating multi-level index structures (e.g., document summaries and detailed chunks) for efficiency.
Ensemble Retrieval: Combining results from multiple different retrieval models or algorithms.
Dartboard Retrieval: Optimizing retrieval for both relevance and diversity simultaneously.
Multi-modal Retrieval: Techniques for handling and retrieving data involving multiple types like text and images.
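Fusion retrieval is commonly implemented with Reciprocal Rank Fusion (RRF), which merges ranked lists from different retrievers without needing their scores to be comparable; a minimal version:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from several retrievers.
    Each document scores 1/(k + rank) per list it appears in; k=60 is the
    constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both keyword and vector search outscores one that only a single retriever liked, which is exactly the behavior fusion retrieval is after.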
Iterative and Adaptive Techniques (🔁):
Retrieval with Feedback Loops: Using user feedback to continuously improve retrieval and generation models.
Adaptive Retrieval: Dynamically adjusting retrieval strategies based on query type or user context.
Iterative Retrieval: Performing multiple rounds of retrieval, using results from previous rounds to refine subsequent queries.
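Iterative retrieval can be sketched as a loop that folds words from earlier hits back into the query; the keyword search factory here is a toy stand-in for a vector retriever:

```python
def iterative_retrieve(query: str, search_fn, rounds: int = 2) -> list[str]:
    """Run several retrieval rounds, folding the top hit's words into the
    query so later rounds can surface documents the first round missed."""
    results: list[str] = []
    for _ in range(rounds):
        hits = search_fn(query)
        results.extend(h for h in hits if h not in results)
        if hits:
            query = query + " " + hits[0]  # toy query refinement
    return results

def make_search(docs: list[str]):
    """Toy keyword search: return docs sharing any word with the query."""
    def search(query: str) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(d.lower().split())), d) for d in docs]
        return [d for s, d in sorted(scored, reverse=True) if s > 0]
    return search
```

Note how the second round reaches a document that shares no words with the original query, only with the first round's hit.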
Evaluation (📊):
DeepEval / GroUSE Evaluation: Providing methods and metrics using specific frameworks (like DeepEval, GroUSE) for comprehensive RAG system performance evaluation.
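Frameworks like DeepEval and GroUSE ship much richer, LLM-judged metrics, but the spirit of a retrieval metric such as context recall fits in a few lines; this toy substring check is not either framework's actual implementation:

```python
def context_recall(expected_facts: list[str], retrieved: list[str]) -> float:
    """Fraction of expected facts found in any retrieved chunk (toy metric)."""
    hits = sum(any(fact.lower() in chunk.lower() for chunk in retrieved)
               for fact in expected_facts)
    return hits / len(expected_facts) if expected_facts else 0.0
```

Scoring retrieval separately from generation like this makes it much easier to tell which stage of the pipeline is failing.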
Explainability (🔬):
Explainable Retrieval: Offering methods to explain why specific pieces of information were retrieved, increasing system transparency.
Advanced Architectures (🏗️):
Knowledge Graph Integration (Graph RAG): Incorporating structured information from knowledge graphs to enhance retrieval and generation.
GraphRAG (Microsoft): Microsoft's open-source advanced RAG system utilizing knowledge graphs.
RAPTOR: A method involving recursive processing and summarization of information, organized in a tree structure for retrieval.
Self RAG / Corrective RAG: More intelligent RAG frameworks that can autonomously decide whether to retrieve, assess retrieval quality, and even use web search to correct or supplement information.
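The core decision loop of Corrective RAG can be sketched as follows; the keyword grader and the web-search fallback are hypothetical stand-ins for the LLM judge and real search tool a production system would use:

```python
def corrective_rag(query: str, retrieve_fn, web_search_fn,
                   threshold: int = 1) -> list[str]:
    """Corrective-RAG sketch: grade the initial retrieval and fall back to
    web search when it looks too weak (toy keyword grader, not an LLM)."""
    docs = retrieve_fn(query)
    grade = sum(1 for d in docs
                if any(w in d.lower() for w in query.lower().split()))
    if grade < threshold:
        docs = web_search_fn(query)  # hypothetical fallback retriever
    return docs
```

The key idea is that retrieval is no longer unconditional: the system assesses its own results and switches strategy when they fail the check.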
Special Advanced Technique (🌟):
Sophisticated Controllable Agent: An advanced agent solution using a deterministic graph, designed to tackle complex questions that simple semantic similarity retrieval cannot solve.
New Products…
TheLibrarian.io - Your WhatsApp AI Assistant
A WhatsApp AI assistant designed to master your inbox, control your schedule, and find anything you need, so you can focus on what truly matters. It integrates seamlessly with your Google apps (Gmail, Drive, Calendar, Contacts), Slack, and Notion.