The Application of KNN to Handwriting Recognition in Java


This article introduces a classic machine-learning classification method, the k-nearest-neighbor (KNN) algorithm, and uses it to build a simple handwritten digit recognition system.

KNN overview

What is KNN

KNN (k-nearest neighbors), also called the k-nearest-neighbor algorithm, is based on the idea that each sample can be represented by the k samples closest to it.

Working principle

We start with a set of sample data, the training set, in which every sample carries a label, so we know the correspondence between each sample and its class. When new, unlabeled data arrives, we compare each of its features with the corresponding features of the samples in the training set and extract the class labels of the most similar samples. In practice we take only the k most similar samples (which is where the k in the name comes from), and the class that appears most frequently among those k samples becomes the class of the new data.

Features of the algorithm

• Advantages: high accuracy, insensitive to outliers, no assumptions about the input data

• Disadvantages: high computational complexity and high space complexity.

• Applicable scope: numeric and nominal data types

The algorithm flow

For each point in the dataset whose class is unknown, perform the following operations:

• Compute the distance between the current point and every point in the dataset with known classes

• Sort the distances in increasing order

• Select the k points with the smallest distance to the current point

• Determine the frequency of each class among those k points

• Return the class that appears most frequently among the k points as the predicted class of the current point

For calculating the distance, we use the Euclidean distance formula: d(a, b) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2).
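To make the flow above concrete, here is a minimal, self-contained sketch of these steps on a toy dataset. The class name, method names, and example data are my own illustration, not the article's implementation (which follows below and reads its data from files):

```java
import java.util.Arrays;

public class KnnSketch {
    // Step 1: Euclidean distance between two feature vectors
    static double distance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            int d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classify `query` by majority vote among the k nearest training samples;
    // labels are assumed to be digits in 0..9, as in the article.
    static int classify(int[][] train, int[] labels, int[] query, int k) {
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Step 2: sort training indices by distance to the query
        Arrays.sort(order, (x, y) ->
                Double.compare(distance(train[x], query), distance(train[y], query)));
        // Steps 3-4: count class labels among the k closest samples
        int[] votes = new int[10];
        for (int i = 0; i < k; i++) votes[labels[order[i]]]++;
        // Step 5: return the most frequent class
        int best = 0;
        for (int c = 1; c < votes.length; c++)
            if (votes[c] > votes[best]) best = c;
        return best;
    }

    public static void main(String[] args) {
        int[][] train = { {0, 0}, {0, 1}, {5, 5}, {6, 5} };
        int[] labels  = { 0, 0, 1, 1 };
        // query (1, 0) is closest to the class-0 cluster
        System.out.println(classify(train, labels, new int[]{1, 0}, 3)); // prints 0
    }
}
```

The sorting step costs O(n log n) per query, which reflects the high computational complexity noted above: every classification must touch the entire training set.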

Applying KNN to handwriting recognition (Java)

Overview

What we do here is recognize handwritten digits stored in a simple form: plain text files, each a 32×32 grid of 0 and 1 characters, with the file name encoding the true label (for example, 0_13.txt is a sample of the digit 0).

We have a set of training samples, and we use separate test data to evaluate the algorithm.

For the algorithm's source code and data, see: https://github.com/luoyhang003/machine-learning-in-java/tree/master/k-Nearest-Neighbour

Implementation

The code is rough: it is a direct implementation of the KNN algorithm with no optimization, so please bear with me!

• First, we need to convert each of these text files into a vector that can be stored in an array.
```java
public static int[] data2Vec(String fileName) {
    int[] arr = new int[32 * 32];

    try {
        // each file is a 32x32 grid of '0'/'1' characters
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        for (int index = 0; index < 32; index++) {
            String str = reader.readLine();
            int length = str.length();
            for (int i = 0; i < length; i++) {
                String c = str.substring(i, i + 1);
                arr[32 * index + i] = Integer.parseInt(c);
            }
        }
        reader.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    return arr;
}
```
• Next, we need to define a method to calculate the distance between two vectors.
```java
public static double calDistance(int[] a, int[] b) {
    double result = 0;
    int temp = 0;
    for (int i = 0; i < a.length; i++) {
        // accumulate the squared differences
        temp += (a[i] - b[i]) * (a[i] - b[i]);
    }
    result = Math.sqrt(temp);

    return result;
}
```
• Then we can begin to classify:
```java
public static int[] classify(String fileName) {
    // result[0]: true label parsed from the file name; result[1]: predicted label
    int[] result = new int[2];

    int[] arr = data2Vec("Samples/testDigits/" + fileName);

    result[0] = Integer.parseInt(fileName.split("_")[0]);

    // K is the number of neighbors, a class-level constant (e.g. static final int K = 3)
    double[] dis = new double[K];
    int[] num = new int[K];

    for (int index = 0; index < K; index++) {
        dis[index] = 32;   // 32 = sqrt(32 * 32) is the largest possible distance
        num[index] = -1;
    }

    for (int i = 0; i <= 9; i++) {
        for (int j = 0; j < 100; j++) {
            int[] temp_arr = data2Vec("Samples/trainingDigits/" + i + "_" + j + ".txt");
            double temp_dis = calDistance(arr, temp_arr);

            // replace the first stored neighbor that is farther than this sample
            // (a rough approximation of keeping the K nearest neighbors)
            for (int k = 0; k < K; k++) {
                if (temp_dis < dis[k]) {
                    dis[k] = temp_dis;
                    num[k] = i;
                    break;
                }
            }
        }
    }

    // vote: count how often each digit appears among the stored neighbors
    int count[] = new int[10];

    for (int i = 0; i < 10; i++)
        count[i] = 0;

    for (int i = 0; i < K; i++) {
        if (num[i] != -1)
            count[num[i]]++;
    }

    int max = 0;
    for (int i = 0; i < 10; i++) {
        if (count[i] > max) {
            max = count[i];
            result[1] = i;
        }
    }

    return result;
}
```
• Finally, we test the algorithm
```java
public static void main(String args[]) {

    double right = 0;
    double sum = 0;

    // classify 50 test files per digit (0..9)
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 50; j++) {
            int[] result = classify("" + i + "_" + j + ".txt");

            System.out.println("The classifier came back with: " + result[1] + ", the real answer is: " + result[0]);
            sum++;
            if (result[0] == result[1])
                right++;
        }
    }
    System.out.println("Right: " + right);
    System.out.println("Sum: " + sum);

    double rate = right / sum;
    System.out.println("The total right rate is: " + rate);

}
```

The result is:

```
The classifier came back with: 0, the real answer is: 0
......

Right: 486
Sum: 500
The total right rate is: 0.972
```

For the complete code and test data, see: https://github.com/luoyhang003/machine-learning-in-java/tree/master/k-Nearest-Neighbour

Disclaimer: This is an original article by the blogger and may not be reproduced without permission.

Source: http://blog.luoyuanhang.com
