What is Indexed?
| | | Pages | News | Files | Users | Apps & Links | Plugins (e.g. Forms & Surveys) |
|---|---|---|---|---|---|---|---|
| Title | Full word | | | | | | |
| | Prefix | | | | | | |
| Content | Full word | | | | User fields | | |
| | Prefix | | | | | | |
| Metadata | - | Description | Teaser text | - | - | Description | - |
| | Full word | | | | | | |
| | Prefix | | | | | | |
| Date | - | | | | | | |
Title: Refers to the name given to a page, a news post, or a file.
Content: Refers to the body of a page, a news post, or a file.
Metadata: Refers to information that is not part of the core content of a news post or a page, for example, the teaser text for news and the description field for pages.
Description: Refers to additional context about a page. The description is available for Pages only and is shown in the search results.
Teaser Text: Refers to the purpose and aim of the news post. The teaser text is available for News only.
Full word: Refers to exact word matching. Example: If users search for “Staffbase”, the search results show content that matches the word “Staffbase” exactly.
Prefix: Refers to prefix matching, where a word matches if it begins with the search term. Example: If users search for “Staff”, the search results show content that matches the prefix “Staff”, such as “Staffbase”.
User fields: Refers to the profile fields (system and custom) created by an admin in the Studio settings.
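The difference between full word and prefix matching can be illustrated with a minimal sketch. The function names below are hypothetical and the tokenization is deliberately simplified; this is not the actual Staffbase implementation.

```python
def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a simplification of real analysis)."""
    return text.lower().split()


def full_word_match(query: str, text: str) -> bool:
    """True if the query equals one of the words in the text."""
    return query.lower() in tokenize(text)


def prefix_match(query: str, text: str) -> bool:
    """True if any word in the text starts with the query."""
    q = query.lower()
    return any(word.startswith(q) for word in tokenize(text))


title = "Welcome to Staffbase"
print(full_word_match("Staffbase", title))  # True
print(full_word_match("Staff", title))      # False
print(prefix_match("Staff", title))         # True
```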
Data Processing During the Indexing
During indexing, data is processed to make it easier to retrieve information later.
When indexing, the search analyzes text in the following ways:
- Case insensitivity: All characters in the content are considered lowercase. For example, apple and Apple are considered the same.
- ASCII characters: The indexing converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the 128 ASCII characters, U+0000 to U+007F) to their ASCII equivalent, if one exists. For example, the indexing process changes à to a.
- Language analyzer: Language analyzers are applied to the content and teaser text of News and to the content and description of Pages. They analyze text according to its language and handle the following:
  - Stop words: Stop words are commonly used words in a language that are typically ignored in search queries or text analysis because they add little to the meaning of a sentence. These words are often short and occur frequently but carry little specific information about the content, such as “the”, “and”, “is”, “in”, “of”, and “to”.
  - Stemming: Stemming removes suffixes from words to obtain a common linguistic base. For example, “running” and “runs” are both reduced to “run”. This helps to group variations of a word, reduce the dimensionality of the data, and improve the efficiency of text processing and analysis.
- Special characters: Special characters, such as !"#$%&'()*+,-./:;<=>?@[]^_`{|}~§°, are replaced with a space.
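The processing steps above can be sketched as a small pipeline. This is an illustration only: the order of the steps, the stop-word list, and the toy suffix stripper are assumptions, and the diacritic removal only approximates full ASCII conversion (characters without a Unicode decomposition are left unchanged).

```python
import re
import unicodedata

# Illustrative values only; the real stop-word list and stemmer are not documented here.
STOP_WORDS = {"the", "and", "is", "in", "of", "to"}
SUFFIXES = ("ing", "ed", "es", "s")  # toy stemmer, not a real stemming algorithm


def fold_to_ascii(text: str) -> str:
    """Drop diacritics where a decomposition exists, e.g. 'à' -> 'a'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, fold accents, blank out special characters, drop stop words, stem."""
    text = fold_to_ascii(text.lower())
    text = re.sub(r"[^\w\s]", " ", text)  # special characters become spaces
    text = text.replace("_", " ")         # "_" counts as a word character in \w
    tokens = text.split()
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("The Café is opening: new menus, à la carte!"))
# ['cafe', 'open', 'new', 'menu', 'a', 'la', 'carte']
```

Because the same analysis would be applied to search queries, a search for “Opening” and a search for “open” end up comparing the same token in this sketch.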
How is Content Ranked?
Staffbase uses a full-text search algorithm that combines:
- Term Frequency (TF): The number of times a given word (term) appears in a document
- Inverse Document Frequency (IDF): The importance of a term, based on how many documents contain it; terms that appear in fewer documents carry more weight
- Document Length (DL): The length of a document compared to the average length of all documents
The algorithm calculates a relevance score per keyword in the search query. The final score is the sum of all the relevance scores.
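A widely used scoring function built from these three ingredients is Okapi BM25. The article does not name the exact formula, so the sketch below is an assumption about the general shape of such a score, not the Staffbase implementation. The constants `K1` and `B` are common defaults, not documented Staffbase values.

```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation (common default, assumed)
B = 0.75   # strength of length normalization (common default, assumed)


def bm25_score(query: list[str], doc: list[str], corpus: list[list[str]]) -> float:
    """Sum of per-keyword relevance scores for one document."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    # Document length relative to the average length (DL).
    length_norm = 1 - B + B * len(doc) / avg_len
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        freq = tf[term]  # term frequency (TF); 0 if the term is absent
        score += idf * freq * (K1 + 1) / (freq + K1 * length_norm)
    return score


corpus = [
    ["vacation", "policy", "for", "employees"],
    ["vacation", "request", "form"],
    ["cafeteria", "menu"],
]
for doc in corpus:
    print(doc, round(bm25_score(["vacation", "form"], doc, corpus), 3))
```

In this example the second document scores highest because it contains both keywords, and “form” is rarer across the corpus than “vacation”, so it contributes more.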
Additionally, Staffbase applies boosts based on where the keyword matches. These boosts multiply the relevance score when the keyword matches content in:
- Title: x 3
- Page Description: x 2
For News posts and Pages, Staffbase also applies a recency boost to prioritize the most recently published content. The boost is added to the relevance score to give the final relevance score:
- Published today or yesterday: +15
- Published this or last week: +10
- Published this or last month: +5
The results are then ranked, with the most relevant ones appearing at the top.
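The boosts described above can be combined as in the following sketch. Several details are assumptions made for illustration: that unboosted fields have a factor of 1, that boosted field scores are summed, that weeks start on Monday, and that “this or last week/month” refers to calendar weeks and months. The function names are hypothetical, and the sketch applies the recency boost to any dated item.

```python
from datetime import date, timedelta

# Multipliers from the list above; "content" = 1.0 is an assumed default.
FIELD_BOOSTS = {"title": 3.0, "description": 2.0, "content": 1.0}


def recency_boost(published: date, today: date) -> int:
    """Additive boost for recently published items, using calendar weeks and months."""
    if (today - published).days in (0, 1):
        return 15
    # Monday of the previous calendar week.
    start_of_last_week = today - timedelta(days=today.weekday() + 7)
    if start_of_last_week <= published <= today:
        return 10
    # First day of the previous calendar month.
    first_of_this_month = today.replace(day=1)
    first_of_last_month = (first_of_this_month - timedelta(days=1)).replace(day=1)
    if first_of_last_month <= published <= today:
        return 5
    return 0


def final_score(field_scores: dict[str, float], published: date, today: date) -> float:
    """Multiply each field's relevance score by its boost, sum, then add recency."""
    boosted = sum(FIELD_BOOSTS.get(f, 1.0) * s for f, s in field_scores.items())
    return boosted + recency_boost(published, today)


today = date(2024, 5, 15)
results = {
    "Old page, title match": final_score({"title": 1.2}, date(2023, 1, 10), today),
    "Fresh news, content match": final_score({"content": 1.2}, date(2024, 5, 14), today),
}
# Rank with the highest final score first.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 2))
```

Under these assumptions, a news post published yesterday outranks an older page even when the older page matches in the title, because the additive recency boost is large compared with the multiplied relevance score.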
You can also use the dropdown menus to sort your search results, for example, by date or alphabetically.