Arabic

Class: ArNormalise

Source Location: /sub/ArNormalise.class.php

Class Overview


This class provides functions to manipulate Arabic text and normalise it by applying filters: for example, stripping tatweel and tashkeel, normalising hamza and lamalephs, and unshaping joined Arabic text back into its normalised form.


Author(s):

Copyright:

  • 2006-2010 Khaled Al-Shamaa



Class Details

[line 96]

The functions in this class are helpful for searching, indexing, and similar tasks.




Tags:

author:  Djihed Afifi <djihed@gmail.com>
copyright:  2006-2010 Khaled Al-Shamaa
link:  http://www.ar-php.org
license:  LGPL


[ Top ]


Class Variables

$chars = array()

[line 101]



Tags:

access:  protected

Type:   mixed


[ Top ]

$normaliseHamzaInput =  'utf-8'

[line 143]

"normaliseHamza" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseHamzaOutput =  'utf-8'

[line 155]

"normaliseHamza" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseHamzaVars = array('text')

[line 149]

Name of the textual "normaliseHamza" method parameters



Tags:

access:  public

Type:   Array


[ Top ]

$normaliseInput =  'utf-8'

[line 179]

"normalise" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseLamalephInput =  'utf-8'

[line 161]

"normaliseLamaleph" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseLamalephOutput =  'utf-8'

[line 173]

"normaliseLamaleph" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseLamalephVars = array('text')

[line 167]

Name of the textual "normaliseLamaleph" method parameters



Tags:

access:  public

Type:   Array


[ Top ]

$normaliseOutput =  'utf-8'

[line 191]

"normalise" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$normaliseVars = array('text')

[line 185]

Name of the textual "normalise" method parameters



Tags:

access:  public

Type:   Array


[ Top ]

$stripTashkeelInput =  'utf-8'

[line 125]

"stripTashkeel" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$stripTashkeelOutput =  'utf-8'

[line 137]

"stripTashkeel" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$stripTashkeelVars = array('text')

[line 131]

Name of the textual "stripTashkeel" method parameters



Tags:

access:  public

Type:   Array


[ Top ]

$stripTatweelInput =  'utf-8'

[line 107]

"stripTatweel" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$stripTatweelOutput =  'utf-8'

[line 119]

"stripTatweel" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$stripTatweelVars = array('text')

[line 113]

Name of the textual "stripTatweel" method parameters



Tags:

access:  public

Type:   Array


[ Top ]

$unshape_keys = array()

[line 99]



Tags:

access:  protected

Type:   mixed


[ Top ]

$unshape_map = array()

[line 98]



Tags:

access:  protected

Type:   mixed


[ Top ]

$unshape_values = array()

[line 100]



Tags:

access:  protected

Type:   mixed


[ Top ]

$utf8StrrevInput =  'utf-8'

[line 197]

"utf8Strrev" method input charset



Tags:

access:  public

Type:   String


[ Top ]

$utf8StrrevOutput =  'utf-8'

[line 209]

"utf8Strrev" method output charset



Tags:

access:  public

Type:   String


[ Top ]

$utf8StrrevVars = array('str')

[line 203]

Name of the textual "utf8Strrev" method parameters



Tags:

access:  public

Type:   Array


[ Top ]



Class Methods


constructor __construct [line 215]

ArNormalise __construct( )

Load the Unicode constants that will be used in substitutions and normalisations.



Tags:

access:  public


[ Top ]

method normalise [line 333]

string normalise( string $text)

Takes a string and applies the various filters in this class, returning a Unicode-normalised string suitable for activities such as searching and indexing.



Tags:

return:  the resulting normalised string.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   the text to be normalised.
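
The filter chain described above can be pictured as a simple pipeline. The following is a minimal, self-contained sketch, not the library's code: the function name sketchNormalise, the choice of filters, and their order are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: apply a list of text filters in sequence.
// The filters and their order are assumptions, not ArNormalise internals.
function sketchNormalise(string $text): string
{
    $filters = [
        // Strip tashkeel (assumed range U+064B..U+0652).
        static function (string $t): string {
            return preg_replace('/[\x{064B}-\x{0652}]/u', '', $t) ?? $t;
        },
        // Strip tatweel (U+0640).
        static function (string $t): string {
            return str_replace("\u{0640}", '', $t);
        },
        // Fold hamza-bearing alephs onto bare aleph (U+0627).
        static function (string $t): string {
            return strtr($t, [
                "\u{0622}" => "\u{0627}",
                "\u{0623}" => "\u{0627}",
                "\u{0625}" => "\u{0627}",
            ]);
        },
    ];

    // Feed the output of each filter into the next one.
    return array_reduce(
        $filters,
        static function (string $carry, callable $filter): string {
            return $filter($carry);
        },
        $text
    );
}
```

Keeping each filter as a separate callable means a caller could drop or reorder steps when a search index needs a looser or stricter normal form.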

[ Top ]

method normaliseHamza [line 270]

string normaliseHamza( string $text)

Normalise all Hamza characters to their corresponding aleph character in an Arabic text.



Tags:

return:  the normalised text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be normalised.
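
As an illustration of this kind of folding, the sketch below maps the three hamza-bearing aleph letters onto bare aleph. It is a hypothetical stand-alone helper (sketchNormaliseHamza is not part of the class), and the set of characters the real method handles may well be larger.

```php
<?php
// Hypothetical sketch, not the library's implementation.
// Assumed mapping: U+0622, U+0623, U+0625 all fold to U+0627 (ALEF).
function sketchNormaliseHamza(string $text): string
{
    return strtr($text, [
        "\u{0622}" => "\u{0627}", // ALEF WITH MADDA ABOVE
        "\u{0623}" => "\u{0627}", // ALEF WITH HAMZA ABOVE
        "\u{0625}" => "\u{0627}", // ALEF WITH HAMZA BELOW
    ]);
}
```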

[ Top ]

method normaliseLamaleph [line 298]

string normaliseLamaleph( string $text)

Unicode defines some special characters in which the lamaleph and any hamza above it are combined into one code point. Some input systems use them. This function expands these characters.



Tags:

return:  the normalised text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be normalised.
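
To make the expansion concrete, here is a minimal sketch that replaces the lam-alef ligatures of the Arabic Presentation Forms-B block (U+FEF5 to U+FEFC) with their two-letter sequences. The helper name sketchNormaliseLamaleph is hypothetical, and the exact set of code points the real method covers is an assumption.

```php
<?php
// Hypothetical sketch, not the library's implementation: expand each
// lam-alef ligature into LAM (U+0644) followed by the matching alef.
function sketchNormaliseLamaleph(string $text): string
{
    $lam = "\u{0644}";

    return strtr($text, [
        "\u{FEF5}" => $lam . "\u{0622}", // lam + alef madda, isolated
        "\u{FEF6}" => $lam . "\u{0622}", // lam + alef madda, final
        "\u{FEF7}" => $lam . "\u{0623}", // lam + alef hamza above, isolated
        "\u{FEF8}" => $lam . "\u{0623}", // lam + alef hamza above, final
        "\u{FEF9}" => $lam . "\u{0625}", // lam + alef hamza below, isolated
        "\u{FEFA}" => $lam . "\u{0625}", // lam + alef hamza below, final
        "\u{FEFB}" => $lam . "\u{0627}", // lam + alef, isolated
        "\u{FEFC}" => $lam . "\u{0627}", // lam + alef, final
    ]);
}
```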

[ Top ]

method stripTashkeel [line 246]

string stripTashkeel( string $text)

Strip all tashkeel characters from an Arabic text.



Tags:

return:  the stripped text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be stripped.
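
One way to picture this filter is a single Unicode-aware regular expression. The sketch below is a hypothetical helper (sketchStripTashkeel), and it assumes tashkeel means the marks from FATHATAN (U+064B) to SUKUN (U+0652); the real method may use a different set.

```php
<?php
// Hypothetical sketch, not the library's implementation.
// Assumption: tashkeel = the combining marks U+064B..U+0652.
function sketchStripTashkeel(string $text): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8.
    // preg_replace returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('/[\x{064B}-\x{0652}]/u', '', $text) ?? $text;
}
```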

[ Top ]

method stripTatweel [line 233]

string stripTatweel( string $text)

Strip all tatweel characters from an Arabic text.



Tags:

return:  the stripped text.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   The text to be stripped.
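
Tatweel (kashida) is the single code point U+0640, so stripping it amounts to one replacement. The helper below, sketchStripTatweel, is a hypothetical stand-alone sketch rather than the class's own code.

```php
<?php
// Hypothetical sketch, not the library's implementation:
// remove every ARABIC TATWEEL (U+0640) from a UTF-8 string.
function sketchStripTatweel(string $text): string
{
    // A byte-level replace is safe here because UTF-8 is self-synchronising:
    // the two-byte sequence for U+0640 cannot occur inside another character.
    return str_replace("\u{0640}", '', $text);
}
```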

[ Top ]

method unichr [line 318]

string unichr( char $u)

Returns the Unicode character for a given code point.



Tags:

return:  the result character.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

char   $u   code point
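
For readers unfamiliar with the conversion, the sketch below encodes a code point as UTF-8 by hand. It is a hypothetical helper (sketchUnichr) that assumes an integer code point; how the real method performs the conversion is not stated in this page.

```php
<?php
// Hypothetical sketch, not the library's implementation:
// encode an integer code point as a UTF-8 byte string.
function sketchUnichr(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException('Code point out of range');
    }
    if ($cp < 0x80) {            // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}
```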

[ Top ]

method unshape [line 363]

string unshape( string $text)

Takes Arabic text in its joined form, untangles the characters, and unshapes them.

This can be used to process text that was produced by OCR or extracted from a PDF document.

Note that the resulting text may need further processing. In most cases, you will want to use the utf8Strrev function from this class to reverse the string.

Most of the character setup for this function is done by the constants in ArUnicode.constants.php, which the constructor loads.




Tags:

return:  the resulting normalised string.
author:  Djihed Afifi <djihed@gmail.com>
access:  public


Parameters:

string   $text   the text to be unshaped.
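
Unshaping can be thought of as a lookup from positional presentation forms back to base letters. The sketch below covers only three letters (alef, beh, lam) from the Arabic Presentation Forms-B block to show the idea; the helper name sketchUnshape and its tiny table are assumptions and say nothing about the real map's contents.

```php
<?php
// Hypothetical sketch, not the library's implementation: map a few
// isolated/final/initial/medial glyph forms back to their base letters.
function sketchUnshape(string $text): string
{
    $map = [
        // ALEF (U+0627): isolated, final
        "\u{FE8D}" => "\u{0627}", "\u{FE8E}" => "\u{0627}",
        // BEH (U+0628): isolated, final, initial, medial
        "\u{FE8F}" => "\u{0628}", "\u{FE90}" => "\u{0628}",
        "\u{FE91}" => "\u{0628}", "\u{FE92}" => "\u{0628}",
        // LAM (U+0644): isolated, final, initial, medial
        "\u{FEDD}" => "\u{0644}", "\u{FEDE}" => "\u{0644}",
        "\u{FEDF}" => "\u{0644}", "\u{FEE0}" => "\u{0644}",
    ];

    // strtr replaces all keys in one pass, so outputs are never re-mapped.
    return strtr($text, $map);
}
```

Text extracted in visual order would still need reversing after this step, as the note above says.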

[ Top ]

method utf8Strrev [line 376]

string utf8Strrev( string $str, [boolean $reverse_numbers = false])

Takes a UTF-8 string and reverses it.



Tags:

return:  The reversed string.
access:  public


Parameters:

string   $str   the string to be reversed.
boolean   $reverse_numbers   whether to reverse numbers.
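
A byte-wise strrev would corrupt multi-byte characters, so the reversal has to work per character. The sketch below shows one way to do that. The helper sketchUtf8Strrev is hypothetical, and its reading of the flag (when false, runs of ASCII digits keep their original order) is an assumption about the intended behaviour.

```php
<?php
// Hypothetical sketch, not the library's implementation.
// Assumption: with $reverseNumbers = false, digit runs stay in reading order.
function sketchUtf8Strrev(string $str, bool $reverseNumbers = false): string
{
    // Split into UTF-8 characters; the /u modifier makes the split per code point.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str; // invalid UTF-8: return the input unchanged
    }
    if ($reverseNumbers) {
        return implode('', array_reverse($chars));
    }

    $isDigits = static function (string $s): bool {
        return $s !== '' && strspn($s, '0123456789') === strlen($s);
    };

    $out = [];
    foreach ($chars as $ch) {
        if ($isDigits($ch) && $out !== [] && $isDigits($out[0])) {
            $out[0] .= $ch;           // extend the current digit run in order
        } else {
            array_unshift($out, $ch); // prepend, which reverses the sequence
        }
    }
    return implode('', $out);
}
```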

[ Top ]


Documentation generated on Sat, 14 Aug 2010 13:23:56 -0700 by phpDocumentor 1.4.0