| license: mit | |
| # Python clone detection | |
| This is a CodeBERT model for detecting Python code clones, fine-tuned on the dataset shared by [PoolC](https://github.com/PoolC) on [Hugging Face Hub](https://huggingface.co/datasets/PoolC/1-fold-clone-detection-600k-5fold). The original inference code for the model is available at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py. | |
| # How to use | |
| To use the model efficiently, refer to this repository: https://github.com/RepoAnalysis/PythonCloneDetection, which contains a class that integrates data preprocessing, input tokenization, and model inference. | |
| You can also follow the original inference source code at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py. | |
| More conveniently, a pipeline for this model has been implemented, and you can initialize it with only two lines of code: | |
| ```python | |
| from transformers import pipeline | |
| pipe = pipeline(model="Lazyhope/python-clone-detection", trust_remote_code=True) | |
| ``` | |
| To use it, pass the two code snippets to compare as a single tuple: | |
| ```python | |
| code1 = """def token_to_inputs(feature): | |
|     inputs = {} | |
|     for k, v in feature.items(): | |
|         inputs[k] = torch.tensor(v).unsqueeze(0) | |
|     return inputs""" | |
| code2 = """def f(feature): | |
|     return {k: torch.tensor(v).unsqueeze(0) for k, v in feature.items()}""" | |
| is_clone = pipe((code1, code2)) | |
| is_clone | |
| # {False: 1.3705984201806132e-05, True: 0.9999862909317017} | |
| ``` | |
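The result shown above is a dictionary that maps `False` and `True` to probabilities. The following is a minimal sketch of how such a result could be turned into a yes/no decision and applied to every pair in a list of snippets. The helper names (`clone_probability`, `is_clone`, `find_clones`) and the 0.5 threshold are illustrative assumptions, not part of the model or its pipeline. `score_pair` stands for any callable that accepts a `(code1, code2)` tuple and returns a dictionary of that shape, such as `pipe` from the example above.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Shape of one result as shown above: {False: p_not_clone, True: p_clone}
Scores = Dict[bool, float]


def clone_probability(scores: Scores) -> float:
    """Return the probability stored under the True key (0.0 if it is missing)."""
    return float(scores.get(True, 0.0))


def is_clone(scores: Scores, threshold: float = 0.5) -> bool:
    """Decide clone / not clone. The 0.5 cutoff is an assumption; tune it for your data."""
    return clone_probability(scores) >= threshold


def find_clones(
    snippets: Sequence[str],
    score_pair: Callable[[Tuple[str, str]], Scores],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Score every unordered pair of snippets.

    Returns (index_a, index_b, probability) for pairs at or above the
    threshold, sorted from most to least likely clone.
    """
    hits = []
    for (i, a), (j, b) in combinations(enumerate(snippets), 2):
        p = clone_probability(score_pair((a, b)))
        if p >= threshold:
            hits.append((i, j, p))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

With the pipeline loaded, `find_clones([code1, code2], pipe)` would score the single pair from the example above. Note that the number of pairs grows quadratically with the number of snippets, so large collections may need pre-filtering before scoring.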
| # Credits | |
| We would like to thank the original team and authors of the model and the fine-tuning dataset: | |
| - [PoolC](https://github.com/PoolC) | |
| - [sangHa0411](https://github.com/sangHa0411) | |
| - [snoop2head](https://github.com/snoop2head) | |
| # License | |
| This model is released under the MIT license. | |