Intellyze: Arabic text frequency analyzer software guide

This article exists in other translations [ Id: ar00006  عربي ], also accessible through the articles page.

1. What is Intellyze 3.0

Intellyze 3.0 analyzes text and calculates the frequency of Arabic letters and words; everything else is ignored. The letters supported by Intellyze 3.0 are:

ا أ إ آ ء ب ت ة ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن هـ و ؤ ي ى ئ

Words may also include these two letters: پ and ڤ. The text input areas of Intellyze support typing with Intellark, Intellaren's new Arabic keyboard layout. Intellyze is built specifically for analyzing large texts that contain hundreds of thousands of words, which in turn contain about five times as many letters.
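As a rough illustration of the counting described above, the following Python sketch tallies letters and words while skipping everything else. It is not Intellyze's actual implementation: the names `analyze` and `strip_marks` are hypothetical, and the handling of diacritics (dropped) and of digits and symbols (treated as word separators) is an assumption.

```python
import re
import unicodedata
from collections import Counter

# Letters listed in this guide (the heh is written without its tatweel).
LETTERS = "اأإآءبتةثجحخدذرزسشصضطظعغفقكلمنهوؤيىئ"
# Two extra letters that the guide says may appear inside words.
WORD_ONLY = "پڤ"
TATWEEL = "\u0640"


def strip_marks(text):
    """Drop combining marks (diacritics) and the tatweel stretch character."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


def analyze(text):
    """Return (letter frequencies, word frequencies) as two Counters."""
    clean = strip_marks(text)
    # Assumption: a word is an unbroken run of supported letters.
    words = re.findall("[" + LETTERS + WORD_ONLY + "]+", clean)
    letters = Counter(ch for ch in clean if ch in LETTERS)
    return letters, Counter(words)
```

Calling `analyze(text)` on a string returns two `Counter` objects, so `most_common()` on either one gives a frequency-sorted listing.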

In addition to this Intellyze User Guide, there are several supporting videos that show Intellyze in action.

2. Intellyze output

During text analysis, Intellyze produces several tables that contain statistics such as the number of unique words, the total number of letters and words, letter frequency, word frequency, and letter histograms that can be sorted alphabetically or by letter frequency. Figure 1 shows a snapshot of Intellyze at work, where we can see the main command panel at top right, several tables at top left, and four tabs at the bottom.

   Figure 1
 

 

Translation of text shown in text-area of Figure 1

Hello and welcome -

Intellyze, or انتلايز , is designed to perform frequency analysis on Arabic letters and words. Intellyze seeks to be the analyst's oasis.

Intellyze is supported with Intellark keyboard layout ( انتلارك ). Type the letters a, s, d now to get the word اسد ; more in the Intellark tab below.

Thanks for reading...

 

3. Ideas behind Intellyze Icon Images

Intellyze comes with fresh ideas for some of its icon images; they are explained next.

  • The image    highlights letters (center top to bottom), which constitute Intellyze input, and numbers (top left to right), which constitute Intellyze output

  • The image    represents empty frequency tables which get filled upon hovering like this:

  • The images    and    represent the two sort types shown in the histogram

  • This image    of clouds represents filled up tables and text area; the image of this clearing sun,  , appears as you hover over

  • The image    represents a green player ready to replace the red one; hovering over shows  , indicating that a replacement operation has begun

  • The image    represents the Internet; hovering over it changes it to , to indicate that search is ready through the Intellaren Search page

  • The image    represents a currently open file or a used up text-area, hovering over it changes it to  , indicating the need to start anew

  • The image    represents closed books or files; hovering over shows this image of an open book:

  • The image    represents a USB drive, indicating a save operation

  • The image    represents two USB drives, indicating a save-as operation

 

4. Intellyze Functions

Intellyze provides more functions than meet the eye. In addition to some statistical operations unprecedented in any other software application for the Arabic language, Intellyze helps you to easily:

  • copy analysis output to other applications

  • search within text using advanced operations

  • search the Internet

  • analyze frequency of a word that may come in numerous forms (root and branches)
  • apply replace operations while picking from a list that's dynamically provided (see Tool 16 below)

Below is a listing of Intellyze functions to help text analysts. Functions are numbered for ease of future reference.

 

5. Upper Pane Functions

The upper pane shown in Figure 2 holds most of the commands for analyzing and modifying text in the text area.

 Figure 2


The pane contains different tools that carry out different types of operations; they are explained in the following table.

  Number   Type   Description of expected behavior
  1   single action   each press carries out same operation
  2   on/off   these tools are in one of two states: on or off
  3   text entry   areas where users input text
  4   data display   areas where data are displayed

 

The following table describes each tool in this order: tool number, name, type as shown in the above table, shortcut keys if any, and finally a description of the tool's function.

  Number   Tool   Type   Shortcut   Description
  1)   analyze frequency   1   Control Enter   Count the frequency of letters and words, and produce a graphical histogram for letters
  2)   sort   1   Control R   Sort the histogram letter bars; each press switches between alphabetical order and frequency order
  3)   clear   1   Control N   Clear the text area, tables and histogram
  4)   undo   1   Control Z   Undo the last operation in the text area
  5)   redo   1   Control Y   Redo the last undone operation in the text area
  6)   alif-lam   2   - - -   When searching for a word without the article "ال", forms of the word that carry the article are treated as matches; likewise, when searching for a word that begins with the article "ال", words without the article are also matched. For example, when searching for the word "الترتيب", the word "ترتيب" is also matched
  7)   all hamzas equal   2   - - -   During search, the letter alif and its modified forms with hamza (i.e., ا، أ، إ، آ، ء و ٰ ) are considered the same. For example, searching for "أكثر" and searching for "اكثر" return the same matches
  8)   diacritics   2   - - -   During search, diacritic symbols are considered part of what has to be matched. For example, searching for the word "زُر" does not match "زِر" or "زر"
  9)   word root   2   - - -   During search, the search word is treated as a root word. For example, searching for the word "رتب" matches all words that contain the letters "ر", "ت" and "ب" in the given order (as in the words ترتيب, مرتب and الترتيبية), so the word "ربت" does not match
  10)   advanced search   2   - - -   During search, any of the wildcard characters "*", "+", or "؟" may be inserted between letters to match zero or more, one or more, or zero or one letters, respectively. Here are three examples:

  • searching for "ر*ب" matches words that begin with the letter "ر", followed by zero or more in-between letters, followed by "ب"

  • searching for "ر+ب" matches words that begin with the letter "ر", followed by one or more in-between letters, followed by "ب"

  • searching for "ر؟ب" matches words that begin with the letter "ر", followed by zero or one letter in between, followed by "ب"
  11)   whole word   2   - - -   During search, match whole words only (i.e., those surrounded by spaces, punctuation marks, end of line...) and not parts of other words. For example, a search for the word "كل" does not match any of "كلمة", "كلمات" or "كليلة"
  12)   backward search   1   F4 or Shift F3   Search backward
  13)   search field   3   F3 or Control F   Bring focus to the search field and begin searching in the forward direction
  14)   number of matches   4   - - -   Number of matches for the search word; note that this number depends on the six search parameters described in Tools 6 to 11 above
  15)   forward search   1   F3   Search forward
  16)   replacement   1   Control B   This tool provides hyper search (i.e., search begins just as keys are being pressed) and replaces text from a list that you provide. Figure 3 shows the dialog box that handles this interaction; it contains:

  • a search field for the word to be replaced or sought for
  • two buttons for forward and backward searches
  • a text field for the replacement word
  • a green-tick button for executing the replacement operations

If a replace-all operation is desired, tick the "إستبدال شامل" checkbox at the bottom of the dialog box.

But what if you had more than one word in mind to replace the search word with? In this case, add a list of potential replace-with words using the "أو بـ٠٠٠؟" (or with) button, then replace the search word with any of the replacement words when stopping at matched words in the text. This is shown in Figures 4 and 5. In Figure 4, the word "ملك" is replaced with any of the words shown in the list. Figure 5 shows how diacritics may be placed on the last letter of words:

  1. insert a space character in the search field, which triggers a search for the end of words, then
  2. add a diacritic character in the Replace-With or Or-With fields, such as " ُ ", which is a damma diacritic followed by a space character, or " َ", which is a fat-ha diacritic followed by a space character...

During search, the search tool stops at every word that is followed by a space, and all you have to do is select which of the replace-with words should replace the search word.

Finally, note that you may dispose of any extra or-with field simply by pressing the remove button, shown as a red-x button
  17)   Internet search   1   Control I   Search for the word entered in the search field (see Tool 13 above) using the Intellaren search page
  18)   new   1   Control Shift N   Open a new file after properly closing the current one, if any
  19)   open   1   Control O   Open files. Note that you can also drag files directly into the text area
  20)   save   1   Control S   Save files. Intellyze adds the extension "txt" unless the user provides another extension
  21)   save as   1   Control Shift S   Save files under a different name
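One way to picture how the on/off search tools (6, 7, 9, 10 and 11) combine is as rules for building a regular expression. The Python sketch below is an illustration under stated assumptions, not Intellyze's own code: `build_pattern` and its parameter names are hypothetical, a "letter" is assumed to be any basic Arabic letter, and the text is assumed to be free of diacritics (that is, Tool 8 is not modeled).

```python
import re

# Assumption: a "letter" is any basic Arabic letter (hamza to yeh, skipping
# the tatweel and the diacritics); Intellyze's own definition may differ.
LETTER = "[\u0621-\u063A\u0641-\u064A]"
# Alif, alif with hamza above/below, alif madda, hamza, dagger alif.
HAMZA_FORMS = "\u0627\u0623\u0625\u0622\u0621\u0670"
ARTICLE = "\u0627\u0644"                         # the article alif-lam
WILDCARDS = {"*": "*", "+": "+", "\u061F": "?"}  # "*", "+", Arabic "?"


def build_pattern(query, alif_lam=False, hamzas_equal=False,
                  root=False, advanced=False, whole_word=False):
    """Translate a search word plus on/off options into a compiled regex."""
    if alif_lam and query.startswith(ARTICLE):
        query = query[len(ARTICLE):]             # article is re-added as optional
    parts = []
    for ch in query:
        if advanced and ch in WILDCARDS:
            parts.append(LETTER + WILDCARDS[ch])   # wildcard over letters
        elif hamzas_equal and ch in HAMZA_FORMS:
            parts.append("[" + HAMZA_FORMS + "]")  # any alif/hamza form
        else:
            parts.append(re.escape(ch))
    gap = LETTER + "*"
    body = gap.join(parts) if root else "".join(parts)
    if root:
        body = gap + body + gap                  # letters around the root
    if alif_lam:
        body = "(?:" + ARTICLE + ")?" + body
    if whole_word:
        body = "(?<!" + LETTER + ")" + body + "(?!" + LETTER + ")"
    return re.compile(body)
```

For instance, `build_pattern("رتب", root=True)` yields a pattern that finds "ترتيب" but not "ربت", mirroring the example given for Tool 9. Counting the results of `findall` on the text would then correspond to the number shown by Tool 14.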

 

 

 Figure 3

 

 

 

Figure 4

Figure 5
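The end-of-word diacritic procedure described for Tool 16 (search for a space, then pick a replacement such as a damma followed by a space) can be sketched in Python as follows. This is only an illustration: `replace_at_word_ends` is a hypothetical name, and the `choose` callback stands in for the user picking one of the replace-with entries at each stop.

```python
import re

DAMMA = "\u064F"   # damma diacritic
FATHA = "\u064E"   # fat-ha diacritic


def replace_at_word_ends(text, choose):
    """Stop at every word that is followed by a space and ask choose(word)
    what should replace that space; None leaves the space unchanged."""
    def repl(match):
        word = match.group(1)
        picked = choose(word)
        return word + (picked if picked is not None else " ")
    return re.sub(r"(\S+) ", repl, text)
```

A callback such as `lambda word: DAMMA + " "` would put a damma on the last letter of every word that is followed by a space, which corresponds to ticking the replace-all checkbox.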
 


6. Text Area Functions

The text area comes with the typical expected functions as shown in Figure 6. Following is a listing of such functions.

   Figure 6
 
Translation of text shown in text-area of Figure 6

In this text area, you may type using the Intellark keyboard layout, any layout your system supports, or in English.

In addition to opening files using the Open button or through the command Control O, you may also drag the desired file from anywhere outside of Intellyze to this area and it would promptly open.

You may drag parts of the text from one place to another using the left-mouse button after highlighting the desired text.

It is possible to use the right-mouse button to display a menu to cut, copy or paste text. And in the case that there is highlighted text, the mouse will provide you with two more functionalities:

 - either transmit highlighted text to the search field and begin searching for it, or

 - perform partial analysis: analyzing letters and words of all of the highlighted text.

 

 

  1)   The patented Intellark keyboard layout is supported when typing in the text area; see the Intellark tab below
  2)   You may type in English, or in whatever your system supports, after disabling the Intellark layout from the Options pane below or by pressing Control L, which enables or disables Intellark
  3)   You may use the mouse buttons to drag highlighted text around, or to cut, copy or paste text in the text area. When text is highlighted, you may also search for that text or frequency-analyze its letters and words


7. General Statistics Table Functions

Simply pressing the Calculate Frequency button (see Tool 1 above) fills the cells of the table shown in Figure 7.

 

 Figure 7

 

 

 

  1)   The general statistics table shows these five cells:

  • number of unique words

  • number of all the words

  • number of letters

  • number of lines

  • number of spaces
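The five cells listed above could be computed along these lines. This is a minimal Python sketch rather than Intellyze's method: `general_statistics` is a hypothetical name, the input is assumed to be already free of diacritics, and the way lines and spaces are counted (line breaks and plain space characters) is an assumption.

```python
import re

# Assumption: a word is a run of basic Arabic letters, plus peh and veh,
# which this guide says may appear inside words.
WORD = re.compile("[\u0621-\u063A\u0641-\u064A\u067E\u06A4]+")
WORD_ONLY = "\u067E\u06A4"  # counted in words but not as letters (assumption)


def general_statistics(text):
    """Return the five general statistics as a dictionary."""
    words = WORD.findall(text)
    return {
        "unique words": len(set(words)),
        "all words": len(words),
        "letters": sum(1 for w in words for ch in w if ch not in WORD_ONLY),
        "lines": len(text.splitlines()),
        "spaces": text.count(" "),
    }
```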


8. Letter Table Functions

The Letter table displays each letter with its frequency as found in the text and the percentage of that frequency. See Figure 8.

 

 Figure 8

 

 

 

  1)   Columns are sorted in ascending or descending order when pressing on the column header, as shown in Figure 8
  2)   When right-clicking a column header, the contents of the selected columns are highlighted and copied to the clipboard, ready to be pasted into other applications (with the paste command or Control V, for example); this is highlighted in the same figure
  3)   You may include the column header titles during the copying process by explicitly choosing so from the Options pane described later (see Function 2.3 in Section 11 below)
  4)   When hovering with the mouse over any letter, that letter is highlighted in the histogram pane for ease of visual comparison, as shown in Figure 9
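The contents of the Letter table (letter, frequency, percentage) and its column sorting can be illustrated with a short Python sketch. The function name `letter_table` is hypothetical, and the percentage is assumed to be each letter's share of the total letter count.

```python
def letter_table(counts, sort_by="frequency", descending=True):
    """Build (letter, frequency, percentage) rows from a {letter: count} map,
    sorted on the chosen column the way a header click would."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = [(letter, n, 100.0 * n / total) for letter, n in counts.items()]
    if sort_by == "frequency":
        key = lambda row: row[1]
    else:
        key = lambda row: row[0]   # alphabetic, by code point
    return sorted(rows, key=key, reverse=descending)
```

Calling the function again with `descending=False` corresponds to pressing the same column header a second time.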

 

 

 Figure 9

 

 


9. Word Table Functions

The word table displays each word, together with an identifying number and its frequency as encountered in the text. See Figure 10.

   Figure 10
 
 
Translation of text shown in text-area of Figure 10

Ibn Battuta: 30 years of travel (extracted from Wikipedia pages on the Internet)
-------------------------------------------------------------------------------

::: The first paragraph is extracted from http://ar.wikipedia.org/wiki/ابن_بطوطة :::

More? Click on "بطوطة" in the words table to have the word transferred to the search field above and highlighted in the text area, then use the Intellaren search tool (Control I) to search for the word on Intellaren's search page on the Internet.

 

 

  1)   Columns are sorted in ascending or descending order when pressing on the column header, as shown in Figure 8
  2)   When right-clicking a column header, the contents of the selected columns are highlighted and copied to the clipboard, ready to be pasted into other applications (with the paste command or Control V, for example); this is also highlighted in Figure 8
  3)   You may include the column header titles during the copying process by explicitly choosing so from the Options pane described later (see Function 2.3 in Section 11 below)
 

  4)   When clicking on a word in the words table, the following is performed:

  1. it is transferred to the search field (Tool 13)

  2. its frequency in the text is counted as a function of the six search parameters (Tools 6 to 11)

  3. the number of occurrences is displayed in the designated field (Tool 14)

  4. search for the word begins from the current caret position in the text area, where matches are also highlighted

 

  5)   Searching the Internet for a word in the word table is only two clicks away:

  1. click on the word in the table; this transfers the word to the search field (Tool 13)

  2. click on the Internet search button or press Control I (Tool 17); this submits the word to the Intellaren search page. Figures 10 and 11 highlight the simplicity of this operation

 

 

 Figure 11

 


10. Matches Table Functions

The Matches table displays a list of words that branch out of a root word supplied in the search field. In Figure 12, an example shows the Matches table displaying a list of words that result when an advanced search is carried out on the root word يوم (yawm, or day in English) in the entire text of the Quran.

   Figure 12
 

 

  1)   The fourth column from the right is entitled تجاهل (tajahal, or ignore in English). When a cell is checked, the frequency of the corresponding word is subtracted from the total (جمع) shown in the last row, and the count of words is decreased by one accordingly
  2)   The fifth column from the right is entitled إحذف (ihthif, or delete in English). When the button in a cell is clicked, the whole row is removed, and the cumulative statistics are updated accordingly
  3)   Searching for the same root word again regenerates the original list
  4)   Clicking on a word in the words column locates the word in the text area; further clicks locate the next occurrence in a rotary fashion
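The bookkeeping behind the ignore and delete columns can be sketched in Python as below. The class and function names are hypothetical and the sketch only models the totals, not the table widget itself.

```python
from dataclasses import dataclass


@dataclass
class MatchRow:
    word: str
    frequency: int
    ignored: bool = False   # state of the "ignore" checkbox


def totals(rows):
    """Return (word count, total frequency), leaving out ignored rows."""
    kept = [row for row in rows if not row.ignored]
    return len(kept), sum(row.frequency for row in kept)


def delete_row(rows, word):
    """Return the rows without the one for `word` (the "delete" button)."""
    return [row for row in rows if row.word != word]
```

Because `delete_row` returns a new list, keeping the original list around models how repeating the same search regenerates the full table.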

 


11. Lower Tabs Functions

The lower tabs provide functions such as displaying the letter frequency histogram, setting some preferences, describing the Intellark keyboard layout, and showing information about Intellyze. Any tab may be displayed by clicking its icon handle, shown on the right-hand side. Each tab is described below in this order: function number, tab name, and a description of its contents.

  Number   Tab   Description
  1)   Histogram   Letters are displayed with their names and frequencies, ordered alphabetically or by frequency
  2)   Options   The following facilities are provided:

  1. Enable or disable the Intellark keyboard layout

  2. Hide or show tooltips when hovering with the mouse over a tool

  3. Include or exclude column header titles when copying table column contents to the clipboard

  4. Resize the font in the text area

  5. A check-for-update utility to keep your Intellyze copy up to date

  6. Intellark typing response tolerance. The slider sets the time allowed between key presses, ranging from very fast (سريع جدًا), which is 200 milliseconds, to relaxed (مرتاح), which is 400 milliseconds; the value associated with each tick increases by 50 milliseconds going down. For example, if the slider is set at 200, then to type ش , ذ , ض or any of the characters that require two or more presses, the elapsed time between any two key presses must be 200 milliseconds or less.

  3)   Intellark   A short description of Intellark is provided, together with a map of the Intellark keyboard layout. Links to Intellark's main page on the Internet and to tutorials are also provided
  4)   Intellyze   This tab contains information about the Intellyze copy running on your machine, and links to offline and online documentation about Intellyze
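The typing response tolerance in the Options tab amounts to a time threshold between key presses. The Python sketch below shows one way such grouping could work; it is an assumption-laden illustration, not Intellark's implementation. The name `group_presses` and the sample keys are hypothetical, and the mapping from grouped presses to Arabic characters is not modeled.

```python
# Slider ticks described above: 200 ms (very fast) to 400 ms (relaxed).
TOLERANCES_MS = list(range(200, 401, 50))


def group_presses(presses, tolerance_ms=200):
    """Group (key, time_ms) presses given in time order; a press joins the
    current group when it arrives within tolerance_ms of the previous one."""
    groups = []
    last_time = None
    for key, time_ms in presses:
        if last_time is not None and time_ms - last_time <= tolerance_ms:
            groups[-1].append(key)
        else:
            groups.append([key])
        last_time = time_ms
    return groups
```

With the slider at 200, two presses 150 milliseconds apart fall into one group, while a press 450 milliseconds later starts a new one.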

 


12. Flexibility of Intellyze Frame


 

  1)   The Intellyze frame can be stretched to occupy a larger area by moving the boundaries identified with blue-colored ovals
  2)   Intellyze sits on several split panes that are stretchable in the horizontal or vertical direction by moving the boundaries identified by the cyan-colored ovals. It is also possible to hide a pane in favor of extending an adjacent one by clicking the arrows on the boundaries identified by the orange-colored ovals

 

 

  Figure 13

 


  

 

 

13. Try Intellyze Now

Copy the sura of Alfati-ha (Sura 1 in the Quran) to the text area of Intellyze and run frequency analysis on it now. Notice how Intellyze steps over numbers, symbols and diacritics, letting you focus on what needs to be analyzed without having to preprocess the text first. See http://www.intellaren.com/articles/en/qss to learn more about how Intellyze may be used to analyze the whole text of the Quran.


1|بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ
2|الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
3|الرَّحْمَٰنِ الرَّحِيمِ
4|مَالِكِ يَوْمِ الدِّينِ
5|إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ
6|اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
7|صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ الْمَغْضُوبِ عَلَيْهِمْ وَلَا الضَّالِّينَ

 

14. Contact us

For more information about Intellyze and its new updates, visit www.intellaren.com/intellyze. You may also contact us at www.intellaren.com/contact.