Disclaimer: This is the blogger's original article and may not be reproduced without the blogger's permission. (source: http://blog.luoyuanhang.com)

This article introduces a machine-learning classification method, k-nearest neighbors (KNN), and uses it to build a simple handwritten digit recognition system.

# KNN overview

## What is KNN

KNN (k-nearest neighbor), also called the k-nearest-neighbor algorithm, rests on the idea that each sample can be represented by its k closest neighbors.

## Working principle

We have a sample data set, also known as the training set, in which every sample carries a label; that is, we know which class each sample in the set belongs to. When new, unlabeled data comes in, each of its features is compared with the corresponding features of the samples in the set, and the algorithm extracts the class labels of the most similar (nearest) samples. In general we only look at the k most similar samples in the set, and finally assign the new data the class that occurs most often among those k samples.

## Algorithm features

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data.

Disadvantages: high computational complexity and high space complexity.

Applicable data types: numeric and nominal values.

## Algorithm flow

For each point in the data set whose category is unknown, perform the following operations in order:

- Calculate the distance between the current point and every point in the data set of known categories.
- Sort the points by increasing distance.
- Select the k points closest to the current point.
- Determine how often each category occurs among these k points.
- Return the category that occurs most frequently among the k points as the predicted classification of the current point.
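The steps above can be sketched in a few lines of Java. This is a minimal illustration rather than the implementation used later in this article: the class name `KnnSketch`, its method names, and the four toy 2D points are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; class, method names and data are hypothetical.
class KnnSketch {

    // Euclidean distance between two equal-length feature vectors.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the majority label among the k samples nearest to the query.
    static int classify(double[][] samples, int[] labels, double[] query, int k) {
        // Step 1: distance from the query to every labeled sample.
        Integer[] order = new Integer[samples.length];
        double[] dist = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            order[i] = i;
            dist[i] = distance(samples[i], query);
        }
        // Step 2: sort sample indices by increasing distance.
        Arrays.sort(order, (x, y) -> Double.compare(dist[x], dist[y]));
        // Steps 3 and 4: count label frequencies among the k nearest samples.
        Map<Integer, Integer> votes = new HashMap<>();
        int limit = Math.min(k, order.length);
        for (int i = 0; i < limit; i++) {
            votes.merge(labels[order[i]], 1, Integer::sum);
        }
        // Step 5: return the most frequent label.
        int best = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: two points labeled 1 near (1, 1), two labeled 0 near (0, 0).
        double[][] samples = { {1.0, 1.1}, {1.0, 1.0}, {0.0, 0.0}, {0.0, 0.1} };
        int[] labels = { 1, 1, 0, 0 };
        System.out.println(classify(samples, labels, new double[] {0.1, 0.2}, 3)); // prints 0
    }
}
```

Sorting all distances costs more than strictly necessary when k is small, but it keeps the sketch close to the step list.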

To calculate the distance, we use the Euclidean distance formula. For two n-dimensional vectors a and b:

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
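As a small worked example, the Euclidean distance is the square root of the summed squared differences between corresponding coordinates. The two vectors below are made up purely for illustration; take a = (1, 0, 1, 1) and b = (0, 0, 1, 0):

$$d(a, b) = \sqrt{(1-0)^2 + (0-0)^2 + (1-1)^2 + (1-0)^2} = \sqrt{2} \approx 1.414$$

For vectors that contain only 0s and 1s, each squared difference is either 0 or 1, so the distance is simply the square root of the number of positions in which the two vectors differ.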

# Applying KNN to handwriting recognition (Java)

## Overview

Our handwriting recognition task is to recognize handwritten digits. The data format is simple: each digit is stored as a text file of 32 lines with 32 digit characters per line.

We have a set of labeled sample data for training, and a separate set of test data for evaluating the algorithm.

For the algorithm source code and data, see: https://github.com/luoyhang003/machine-learning-in-java/tree/master/k-Nearest-Neighbour

## Implementation

The code is rough: it simply implements the KNN algorithm without any optimization, so please bear with me!

- First, we need to convert each text file into a vector that can be stored in an array.

```
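// Imports needed at the top of the class file:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// Reads a 32x32 text image into a flat, row-major vector of 1024 ints.
public static int[] data2Vec(String fileName) {
    int[] arr = new int[32 * 32];
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader buffer = new BufferedReader(new FileReader(fileName))) {
        for (int index = 0; index < 32; index++) {
            String str = buffer.readLine();
            int length = str.length();
            for (int i = 0; i < length; i++) {
                String c = str.substring(i, i + 1);
                arr[32 * index + i] = Integer.parseInt(c);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return arr;
}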
```
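The index arithmetic in the conversion above (`32 * row + column`) is easier to see on a smaller grid. The following sketch applies the same row-major flattening to a made-up 4x4 grid held in memory; the class name `GridToVectorSketch`, the method `toVector`, and the sample rows are assumptions for illustration and are not part of the article's code or data.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names and data are hypothetical (4x4 instead of 32x32).
class GridToVectorSketch {

    // Flattens a square grid of digit characters into a row-major int vector.
    static int[] toVector(List<String> rows, int size) {
        int[] vec = new int[size * size];
        for (int row = 0; row < size; row++) {
            String line = rows.get(row);
            for (int col = 0; col < size; col++) {
                // Subtracting '0' converts a digit character to its numeric value.
                vec[size * row + col] = line.charAt(col) - '0';
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("0110", "1001", "1001", "0110");
        System.out.println(Arrays.toString(toVector(rows, 4)));
    }
}
```

Element `size * row + col` of the vector always holds the character at that row and column, so two images can be compared position by position.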

- Next, we need a function that calculates the distance between two vectors.

```
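// Euclidean distance between two vectors of equal length.
public static double calDistance(int[] a, int[] b) {
    double result = 0;
    int temp = 0;
    for (int i = 0; i < a.length; i++) {
        // accumulate the squared difference of each coordinate
        temp += (a[i] - b[i]) * (a[i] - b[i]);
    }
    result = Math.sqrt(temp);
    return result;
}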
```

- Then we can start classifying.

```
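// K (the number of neighbours) is a constant defined elsewhere in the class.
// Returns {real digit, predicted digit} for one test file.
public static int[] classify(String fileName) {
    int[] result = new int[2];
    int[] arr = data2Vec("Samples/testDigits/" + fileName);
    // the real digit is the part of the file name before the underscore
    result[0] = Integer.parseInt(fileName.split("_")[0]);
    double[] dis = new double[K];
    int[] num = new int[K];
    for (int index = 0; index < K; index++) {
        dis[index] = 32; // sqrt(32 * 32): the largest distance between two 0/1 vectors
        num[index] = -1; // -1 marks an empty slot
    }
    for (int i = 0; i <= 9; i++) {
        for (int j = 0; j < 100; j++) {
            int[] temp_arr = data2Vec("Samples/trainingDigits/" + i + "_" + j + ".txt");
            double temp_dis = calDistance(arr, temp_arr);
            // Note: this overwrites the first slot holding a larger distance instead of
            // inserting in sorted order, so the K slots are not guaranteed to hold
            // exactly the K nearest samples.
            for (int k = 0; k < K; k++) {
                if (temp_dis < dis[k]) {
                    dis[k] = temp_dis;
                    num[k] = i;
                    break;
                }
            }
        }
    }
    // count how often each digit occurs among the stored neighbours
    int[] count = new int[10];
    for (int i = 0; i < 10; i++)
        count[i] = 0;
    for (int i = 0; i < K; i++) {
        if (num[i] != -1)
            count[num[i]]++;
    }
    // the most frequent digit is the prediction
    int max = 0;
    for (int i = 0; i < 10; i++) {
        if (count[i] > max) {
            max = count[i];
            result[1] = i;
        }
    }
    return result;
}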
```
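One thing to be aware of in the classification code above: when a new distance is smaller than the value in some slot of `dis`, that slot is overwritten, so the neighbour previously stored there is lost even if it was still among the k nearest. A possible alternative, sketched below, keeps the arrays sorted by distance and shifts larger entries to the right on insertion. This is not the author's code; the class `TopKSketch`, the method `insert`, and the sample distances are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; the class and method names are hypothetical.
class TopKSketch {

    // Inserts (d, label) into arrays kept sorted by increasing distance,
    // dropping the current worst entry when the new distance is smaller.
    static void insert(double[] dis, int[] num, double d, int label) {
        int k = dis.length;
        if (d >= dis[k - 1]) {
            return; // not closer than the current k-th neighbour
        }
        int pos = k - 1;
        // Shift larger entries one slot to the right to make room.
        while (pos > 0 && dis[pos - 1] > d) {
            dis[pos] = dis[pos - 1];
            num[pos] = num[pos - 1];
            pos--;
        }
        dis[pos] = d;
        num[pos] = label;
    }

    public static void main(String[] args) {
        double[] dis = { Double.MAX_VALUE, Double.MAX_VALUE, Double.MAX_VALUE };
        int[] num = { -1, -1, -1 };
        insert(dis, num, 5.0, 7);
        insert(dis, num, 2.0, 3);
        insert(dis, num, 9.0, 1);
        insert(dis, num, 1.0, 4);
        System.out.println(Arrays.toString(num)); // prints [4, 3, 7]
    }
}
```

With the same four distances, the overwrite approach would end up keeping 1.0 and 9.0 but not 2.0 or 5.0, because each new minimum replaces the previous one in slot 0. Switching strategies may change the measured accuracy, so the figures reported in this article apply only to the original code.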

- Finally, we test the algorithm.

```
public static void main(String args[]) {
    int right = 0; // number of correctly classified test files
    int sum = 0;   // total number of test files
    // 10 digits, 50 test files per digit
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 50; j++) {
            int[] result = classify("" + i + "_" + j + ".txt");
            System.out.println("The classifier came back with:" + result[1] + ",the real answer is:" + result[0]);
            sum++;
            if (result[0] == result[1])
                right++;
        }
    }
    System.out.println("Right:" + right);
    System.out.println("Sum:" + sum);
    // cast to double to avoid integer division
    double rate = (double) right / sum;
    System.out.println("The total right rate is:" + rate);
}
```

The result is:

```
The classifier came back with:0,the real answer is:0
The classifier came back with:0,the real answer is:0
The classifier came back with:0,the real answer is:0
The classifier came back with:0,the real answer is:0
......
The classifier came back with:9,the real answer is:9
The classifier came back with:9,the real answer is:9
The classifier came back with:9,the real answer is:9
The classifier came back with:9,the real answer is:9
The classifier came back with:9,the real answer is:9
Right:486
Sum:500
The total right rate is:0.972
```

For the complete code and test data, see: https://github.com/luoyhang003/machine-learning-in-java/tree/master/k-Nearest-Neighbour
