Commit 2cafc2b

Merge pull request #12 from IntelLabs/nhasabni/readme_datasets
Added multiple versions of training dataset
2 parents fd62819 + 6bc1734 commit 2cafc2b

1 file changed: +12 -8 lines changed
README.md

@@ -1,4 +1,4 @@
-**A friendly request: Thanks for visiting control-flag GitHub repository! If you find control-flag useful, we would appreciate a note from you (to niranjan.hasabnis@intel.com or justin.gottschlich@intel.com). And, of course, we love testimonials!
+**A friendly request: Thanks for visiting control-flag GitHub repository! If you find control-flag useful, we would appreciate a note from you (to niranjan.hasabnis@intel.com or justin.gottschlich@intel.com). And, of course, we love testimonials!**
 
 -- The ControlFlag Team
 
@@ -78,20 +78,24 @@ Verilog support is WIP.
 
 #### Using patterns obtained from 6000 GitHub repos to scan repository of your choice
 
-Download the training data for C language first
-([link](https://drive.google.com/file/d/1-jzs3zrKU541hwChaciXSk8zrnMN1mYc/view?usp=sharing)).
+Download the training data for C language depending on the memory constraints of your device. Note, however, that using smaller datasets may lead to lower accuracy in results.
+
+Dataset name | Size on disk | Memory requirements | Direct link | gdown ID | MD5 checksum
+-------------|--------------|---------------------|-------------|----------|-------------
+Small | ~100MB | ~400MB | [link](https://drive.google.com/file/d/1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb/view?usp=sharing) | 1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb | 2825f209aba0430993f7a21e74d99889
+Medium | ~450MB | ~1.3GB | [link](https://drive.google.com/file/d/1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3/view?usp=sharing) | 1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3 | aab2427edebe9ed4acab75c3c6227f24
+Large | ~9GB | ~13GB | [link](https://drive.google.com/file/d/1-jzs3zrKU541hwChaciXSk8zrnMN1mYc/view?usp=sharing) | 1-jzs3zrKU541hwChaciXSk8zrnMN1mYc | 1ba954d9716765d44917445d3abf8e85
 
 ```
-$ python -m pip install gdown && gdown https://drive.google.com/uc?id=1-jzs3zrKU541hwChaciXSk8zrnMN1mYc
-$ (optional) md5sum c_lang_if_stmts_6000_gitrepos.ts.tgz
-1ba954d9716765d44917445d3abf8e85
-$ tar -zxf c_lang_if_stmts_6000_gitrepos.ts.tgz
+$ python -m pip install gdown && gdown https://drive.google.com/uc?id=<id_from_table>
+$ (optional) md5sum <tgz_file>
+$ tar -zxf <tgz_file>
 ```
 
 To scan C code of your choice, use below command:
 
 ```
-$ scripts/scan_for_anomalies.sh -d <directory_to_be_scanned_for_anomalies> -t c_lang_if_stmts_6000_gitrepos.ts -o <output_directory_to_store_log_files>
+$ scripts/scan_for_anomalies.sh -d <directory_to_be_scanned_for_anomalies> -t <training_data>.ts -o <output_directory_to_store_log_files>
 ```
 
 Once the run is complete (which could take some time depending on your system and the
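
Reviewer note: the README change asks readers to pick a dataset by memory budget and then, optionally, check the MD5 by hand. A minimal Python sketch of both steps follows. It is not part of the commit or of ControlFlag; the helper names (`pick_dataset`, `md5_of_file`, `verify_download`) are hypothetical. The memory figures are the approximate values from the table, converted to MB on the assumption that 1 GB is 1000 MB.

```python
import hashlib

# Values copied from the dataset table in the README change.
# Memory figures are the table's approximate requirements, in MB
# (assumption: "~1.3GB" is treated as 1300 MB, "~13GB" as 13000 MB).
DATASETS = [
    # (name, approx. memory needed in MB, gdown ID, MD5 checksum)
    ("Small", 400, "1gvUyRXq1SeZD9g3i__RaamYAMo_QaQIb",
     "2825f209aba0430993f7a21e74d99889"),
    ("Medium", 1300, "1zsCFJAKlZlSAWKPfBcVGcQNlFB5Gtwo3",
     "aab2427edebe9ed4acab75c3c6227f24"),
    ("Large", 13000, "1-jzs3zrKU541hwChaciXSk8zrnMN1mYc",
     "1ba954d9716765d44917445d3abf8e85"),
]


def pick_dataset(available_mb):
    """Return the largest dataset entry that fits in available_mb, or None."""
    fitting = [d for d in DATASETS if d[1] <= available_mb]
    return max(fitting, key=lambda d: d[1]) if fitting else None


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the checksum from the table."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `pick_dataset(2048)` selects the Medium entry, whose gdown ID can be substituted for `<id_from_table>` in the download command, and whose checksum can be passed to `verify_download` once the archive is on disk.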
