# Information Retrieval

This note focuses on text information retrieval. It is mainly based on the information retrieval courses from JHU, [ETH Zurich](https://www.systems.ethz.ch/courses/spring2018/informationretrieval) and [Stanford](http://web.stanford.edu/class/cs276/).

## Overview

Data -> Search -> User

More detail:

* Data: Acquiring and preprocessing documents
  * Crawler
  * Text preprocessing, Clustering, Information Extraction (Named Entity, Relation, Topic Models, etc.)
  * Forward Index, Inverted Index
* Search: Querying content (search engine) or filtering content (recommendation system). The two are not that different.
  * Querying: Boolean Retrieval, Vector Space Model, Probabilistic Model, Learning to Rank
  * Filtering: Content Filtering, Collaborative Filtering; querying methods can also be applied
  * Ranking: Scoring, Link Analysis
* User: Content presentation
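
As a rough illustration of the inverted index and Boolean retrieval items above, here is a minimal sketch. The function names `build_inverted_index` and `boolean_and`, the whitespace tokenization, and the toy documents are all assumptions made for this example; a real system would reuse the preprocessing steps and store compressed postings lists.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real preprocessing.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, terms):
    """Return the IDs of documents that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical toy collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_inverted_index(docs)
result = boolean_and(index, ["home", "july"])
```

A forward index would be the opposite mapping (document ID to its terms); the inverted form is what makes a conjunctive query a simple intersection of postings.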

## Retrieval In General

* How users access data: push mode (recommendation systems such as a news feed), which filters content, and pull mode (search engines), which queries content.
* Retrieval compared to a database: a database usually holds structured data with well-defined query semantics, whereas retrieval typically deals with unstructured text and loosely specified queries.
* The user's information need is almost always broader than what the given query expresses.
* Core search methods: selection (a binary decision per document) or ranking (continuous scoring, optionally followed by thresholding). If we assume that the utility of a document to a user is independent of every other document and that the user browses the results sequentially, we can rank documents in descending order of the probability that each document is relevant to the query (the probability ranking principle).
* Search results evaluation: Precision and Recall.
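
The ranking, thresholding, and evaluation ideas above can be sketched as follows. This is only an illustration under stated assumptions: the relevance scores are made-up numbers standing in for estimated relevance probabilities, and `rank`, `select`, and `precision_recall` are hypothetical helper names.

```python
def rank(scores):
    """Order document IDs by descending relevance score."""
    return sorted(scores, key=scores.get, reverse=True)


def select(scores, threshold):
    """Turn ranking into selection by keeping documents at or above a threshold."""
    return [doc for doc in rank(scores) if scores[doc] >= threshold]


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed scores for four documents; "d5" is relevant but was never scored.
scores = {"d1": 0.9, "d2": 0.2, "d3": 0.6, "d4": 0.75}
relevant = {"d1", "d3", "d5"}

ranking = rank(scores)
retrieved = select(scores, threshold=0.5)
precision, recall = precision_recall(retrieved, relevant)
```

Lowering the threshold usually trades precision for recall, which is why the two measures are reported together.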

## General Text Preprocessing

* Tokenization
* Normalization: Map term variants to the same form.
* Stemming: Reduce a word to its stem (root form).
* Stop words: Omit very common words.
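
The four steps above can be chained into a small pipeline. The sketch below is deliberately simplistic: the stop-word list is a tiny hand-picked set, and `stem` is a naive suffix stripper written for this example, not the Porter stemmer or any library implementation.

```python
import re

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to"}


def tokenize(text):
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(token):
    """Map case variants to the same form."""
    return token.lower()


def stem(token):
    """Naive suffix stripping; keeps at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, normalize, drop stop words, then stem."""
    tokens = [normalize(token) for token in tokenize(text)]
    return [stem(token) for token in tokens if token not in STOP_WORDS]


terms = preprocess("The crawlers are indexing the Documents")
```

Stop words are removed before stemming here so that the lookup sees the unmodified word; other orderings are possible as long as documents and queries go through the same pipeline.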

## Main Topics

* [Ad Hoc Retrieval](/topics/informationretrieval/ad_hoc_retrieval.md)
* [Classification and Clustering](/topics/informationretrieval/classification_and_clustering.md)

