NightOwl CodeEmbedding🦉
NightOwl-CodeEmbedding is a 768-dimensional dense embedding model specialized
for code retrieval, code-edit retrieval, and technical question answering. It
is fine-tuned from Shuu12121/NightOwl
and uses CLS pooling with cosine similarity.
The model does not require query or document prefixes.
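As a rough illustration of what "CLS pooling with cosine similarity" means, the sketch below uses plain Python lists in place of real model outputs. The function names `cls_pool` and `cosine_similarity` and the toy vectors are hypothetical; it assumes the CLS token sits at position 0 of the per-token outputs, and the library's own pooling code may differ in detail.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (assumed to be [CLS]) as the sequence embedding."""
    return token_embeddings[0]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy per-token outputs (3 tokens, 4 dimensions each); not real model activations.
tokens_a = [[1.0, 0.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0], [0.0, 1.0, 0.0, 1.0]]
tokens_b = [[2.0, 0.0, 4.0, 0.0], [0.0, 0.0, 1.0, 0.0], [5.0, 5.0, 5.0, 5.0]]

embedding_a = cls_pool(tokens_a)
embedding_b = cls_pool(tokens_b)

# The two CLS vectors point in the same direction, so the result is close to 1.0
# regardless of the other token vectors.
similarity = cosine_similarity(embedding_a, embedding_b)
print(similarity)
```

Because only the first position is used, the remaining token vectors do not affect the pooled embedding directly.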
Usage
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Shuu12121/NightOwl-CodeEmbedding")
queries = ["Python function that sorts a list in descending order"]
documents = [
"def sort_desc(values): return sorted(values, reverse=True)",
"def average(values): return sum(values) / len(values)",
]
query_embeddings = model.encode(queries, normalize_embeddings=True)
document_embeddings = model.encode(documents, normalize_embeddings=True)
scores = query_embeddings @ document_embeddings.T
print(scores)
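Because the embeddings above are normalized, each entry of `scores` is a cosine similarity, and retrieval amounts to sorting documents by their score for a given query. A minimal sketch of that ranking step follows; `rank_documents` is a hypothetical helper and `example_scores` holds placeholder values, not actual model output.

```python
def rank_documents(score_row, documents, top_k=None):
    """Return (document, score) pairs sorted from highest to lowest score."""
    order = sorted(range(len(documents)), key=lambda i: score_row[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [(documents[i], score_row[i]) for i in order]

documents = [
    "def sort_desc(values): return sorted(values, reverse=True)",
    "def average(values): return sum(values) / len(values)",
]

# Placeholder scores for one query; in practice pass scores[0].tolist()
# from the usage example above.
example_scores = [0.2, 0.9]

ranked = rank_documents(example_scores, documents, top_k=1)
for document, score in ranked:
    print(f"{score:.2f}  {document}")
```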
Model Details
| Property | Value |
|---|---|
| Base model | Shuu12121/NightOwl |
| Architecture | ModernBERT |
| Parameters | 150,779,136 |
| Embedding dimension | 768 |
| Pooling | CLS |
| Maximum sequence length | 1,024 tokens |
| Similarity | Cosine |
| Query/document prefixes | None |
| Weight dtype | FP32 |
| Weight memory | 575 MiB |
| License | Apache-2.0 |
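The weight memory figure follows from the parameter count and dtype in the table. The arithmetic can be checked with a short calculation, assuming 4 bytes per FP32 parameter and 1 MiB = 1024² bytes; it counts weights only and ignores activations and framework overhead.

```python
PARAMETERS = 150_779_136      # from the table above
BYTES_PER_FP32 = 4            # one 32-bit float per parameter
BYTES_PER_MIB = 1024 ** 2

weight_bytes = PARAMETERS * BYTES_PER_FP32
weight_mib = weight_bytes / BYTES_PER_MIB

print(f"{weight_bytes:,} bytes")   # 603,116,544 bytes
print(f"{weight_mib:.1f} MiB")     # 575.2 MiB
```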
MTEB Results
The model was evaluated using:
- MTEB version: 2.14.5
- Metric: NDCG@10
- Hardware: NVIDIA GeForce RTX 5090
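For readers unfamiliar with the metric, NDCG@10 can be sketched as below. This is a minimal version that assumes linear gain and a log2 rank discount; `dcg_at_k` and `ndcg_at_k` are hypothetical helper names, and MTEB's own implementation may differ in details.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the returned ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance labels of the retrieved documents, in ranked order.
    all_relevances: relevance labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal

# One relevant document exists; the system returned it at rank 2.
score = ndcg_at_k([0, 1, 0], all_relevances=[1])
print(round(score, 4))  # 0.6309
```

A perfect ranking scores 1.0, and relevant documents that appear lower in the list contribute less.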
Multi-subset task scores are macro averages. CodeEditSearchRetrieval uses its
official train evaluation split; the other tasks use test.
| Task | Split | NDCG@10 |
|---|---|---|
| AppsRetrieval | test | 0.36361 |
| COIRCodeSearchNetRetrieval | test | 0.84063 |
| CodeEditSearchRetrieval | train | 0.74720 |
| CodeFeedbackMT | test | 0.76277 |
| CodeFeedbackST | test | 0.85137 |
| CodeSearchNetCCRetrieval | test | 0.91646 |
| CodeSearchNetRetrieval | test | 0.89187 |
| CodeTransOceanContest | test | 0.74091 |
| CodeTransOceanDL | test | 0.35802 |
| CosQA | test | 0.41207 |
| StackOverflowQA | test | 0.86031 |
| SyntheticText2SQL | test | 0.68354 |
| Macro average | | 0.70240 |
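The macro average is the unweighted mean of the twelve task scores, which can be reproduced from the rounded values in the table:

```python
# NDCG@10 per task, copied from the table above.
task_scores = {
    "AppsRetrieval": 0.36361,
    "COIRCodeSearchNetRetrieval": 0.84063,
    "CodeEditSearchRetrieval": 0.74720,
    "CodeFeedbackMT": 0.76277,
    "CodeFeedbackST": 0.85137,
    "CodeSearchNetCCRetrieval": 0.91646,
    "CodeSearchNetRetrieval": 0.89187,
    "CodeTransOceanContest": 0.74091,
    "CodeTransOceanDL": 0.35802,
    "CosQA": 0.41207,
    "StackOverflowQA": 0.86031,
    "SyntheticText2SQL": 0.68354,
}

# Every task counts equally, regardless of dataset size.
macro_average = sum(task_scores.values()) / len(task_scores)
print(f"{macro_average:.5f}")  # 0.70240
```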
Training
The model was trained with CachedMultipleNegativesRankingLoss using
bidirectional query-to-document and document-to-query objectives. The generated
training metadata reports 2,534,400 training samples with one positive and
fifteen negatives per anchor.
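The bidirectional objective can be illustrated with a small pure-Python sketch of a symmetric in-batch ranking loss. This is not the training code: `cross_entropy` and `symmetric_ranking_loss` are hypothetical names, the scale of 20 and the equal averaging of the two directions are assumptions, and both the gradient caching and the fifteen explicit negatives per anchor are omitted for brevity.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target_index]

def symmetric_ranking_loss(similarity, scale=20.0):
    """similarity[i][j] is the cosine similarity of query i and document j.

    Matching pairs sit on the diagonal; every other entry acts as an in-batch negative.
    """
    n = len(similarity)
    # Query-to-document: each row should pick out its own document.
    q2d = sum(
        cross_entropy([scale * s for s in similarity[i]], i) for i in range(n)
    ) / n
    # Document-to-query: each column should pick out its own query.
    d2q = sum(
        cross_entropy([scale * similarity[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (q2d + d2q) / 2

well_separated = [[0.9, 0.1], [0.2, 0.8]]
uninformative = [[0.5, 0.5], [0.5, 0.5]]

print(symmetric_ranking_loss(well_separated))
print(symmetric_ranking_loss(uninformative))  # equals ln(2) for a batch of two
```

The loss falls as the diagonal similarities pull away from the off-diagonal ones in both directions.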
The training data covers the following MTEB task families:
- AppsRetrieval
- COIRCodeSearchNetRetrieval
- CodeFeedbackMT
- CodeFeedbackST
- CodeSearchNetCCRetrieval
- CodeSearchNetRetrieval
- CodeTransOceanContest
- CodeTransOceanDL
- CosQA
- StackOverflowQA
- SyntheticText2SQL
Limitations
- The model is specialized for code-related retrieval and may underperform general-purpose text embedding models on unrelated domains.
- Inputs longer than 1,024 tokens are truncated.
- Benchmark scores may include in-domain tasks related to the training data and should not be interpreted as strictly zero-shot results.
Citation
If you use this model, cite Sentence Transformers and the base model where appropriate.