From 568cd7a59bf61bb9f8477e25bf5cf3309be68836 Mon Sep 17 00:00:00 2001 From: Dylan Ayrey Date: Fri, 30 Dec 2016 23:08:12 -0600 Subject: [PATCH 001/108] Initial commit --- LICENSE | 339 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 2 + 2 files changed, 341 insertions(+) create mode 100644 LICENSE create mode 100644 README.md diff --git a/LICENSE b/LICENSE new file mode 100644 index 000000000000..23cb790338e1 --- /dev/null +++ b/LICENSE @@ -0,0 +1,339 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. 
+These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. 
+ +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + {description} + Copyright (C) {year} {fullname} + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License along + with this program; if not, write to the Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + {signature of Ty Coon}, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. 
diff --git a/README.md b/README.md new file mode 100644 index 000000000000..ac62b403cf0b --- /dev/null +++ b/README.md @@ -0,0 +1,2 @@ +# truffleHog +Searches through git repositories for high entropy strings, digging deep into commit history From 2ead44894dd97c874ae8113809a7102c7d05fd5d Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:10:07 -0600 Subject: [PATCH 002/108] adding initial files --- temp/nothing | 1 + truffleHog.py | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) create mode 100644 temp/nothing create mode 100644 truffleHog.py diff --git a/temp/nothing b/temp/nothing new file mode 100644 index 000000000000..9c558e357c41 --- /dev/null +++ b/temp/nothing @@ -0,0 +1 @@ +. diff --git a/truffleHog.py b/truffleHog.py new file mode 100644 index 000000000000..20454b6239d8 --- /dev/null +++ b/truffleHog.py @@ -0,0 +1,101 @@ +import os, math, string +import argparse +from uuid import uuid4 +from git import Repo + +BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" +HEX_CHARS = "1234567890abcdefABCDEF" + +def shannon_entropy(data, iterator): + if not data: + return 0 + entropy = 0 + for x in (ord(c) for c in iterator): + p_x = float(data.count(chr(x)))/len(data) + if p_x > 0: + entropy += - p_x*math.log(p_x, 2) + return entropy + + +def get_strings_of_set(word, char_set, threshold=20): + count = 0 + letters = "" + strings = [] + for char in word: + if char in char_set: + letters += char + count += 1 + else: + if count > 20: + strings.append(letters) + letters = "" + count = 0 + if count > threshold: + strings.append(letters) + return strings + +class bcolors: + HEADER = '\033[95m' + OKBLUE = '\033[94m' + OKGREEN = '\033[92m' + WARNING = '\033[93m' + FAIL = '\033[91m' + ENDC = '\033[0m' + BOLD = '\033[1m' + UNDERLINE = '\033[4m' + +def find_strings(git_url): + new_project = str(uuid4()) + project_path = os.path.join("temp", new_project) + + Repo.clone_from(git_url, 
project_path) + + repo = Repo(project_path) + + + for i in repo.remotes.origin.fetch(): + branch_name = str(i).split('/')[1] + try: + repo.git.checkout(i, b=branch_name) + except: + pass + + prev_commit = None + for curr_commit in repo.iter_commits(): + if not prev_commit: + pass + else: + diff = prev_commit.diff(curr_commit, create_patch=True) + for i in diff: + #print i.a_blob.data_stream.read() + printableDiff = i.diff + foundSomething = False + lines = i.diff.split("\n") + for line in lines: + for word in line.split(): + base64_strings = get_strings_of_set(word, BASE64_CHARS) + hex_strings = get_strings_of_set(word, HEX_CHARS) + for string in base64_strings: + b64Entropy = shannon_entropy(string, BASE64_CHARS) + if b64Entropy > 4.5: + foundSomething = True + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + for string in hex_strings: + hexEntropy = shannon_entropy(string, HEX_CHARS) + if hexEntropy > 3: + foundSomething = True + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + if foundSomething: + print printableDiff + + + prev_commit = curr_commit + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') + parser.add_argument('git_url', type=str, help='URL for secret searching') + + + args = parser.parse_args() + find_strings(args.git_url) + From 739c615472f8ba258136db07c73a9546e474d1f3 Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:11:12 -0600 Subject: [PATCH 003/108] adding dependancies --- requirements.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 requirements.txt diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 000000000000..0c6b635dc061 --- /dev/null +++ b/requirements.txt @@ -0,0 +1 @@ +GitPython==2.1.1 From 202564cf776b402800a4aab8bb14fa4624888475 Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:16:57 -0600 Subject: [PATCH 004/108] updating 
the readme --- README.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ac62b403cf0b..e2f565778a64 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,13 @@ -# truffleHog -Searches through git repositories for high entropy strings, digging deep into commit history +# Truffle Hog +Searches through git repositories for high entropy strings, digging deep into commit history and branches. This is effective at finding secrets accidentally committed that contain high entropy. + +![Example](https://i.imgur.com/aGSIEd9.png) + +## Setup +The only requirement is GitPython, which can be installed with the following +``` +pip install -r requirements.txt +``` + +## How it works +This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text in each diff. If at any point a high entropy string is detected, it will print to the screen. 
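Note on the "How it works" paragraph above: a minimal sketch of the entropy check it describes, assuming Python 3 (the patched `truffleHog.py` itself targets Python 2). The character sets are copied from the patch; the helper name `runs_of_charset` and the exact structure are illustrative assumptions, not the author's code. Per the patch, a run is flagged when its entropy exceeds 4.5 for the base64 set or 3 for the hex set.

```python
from __future__ import division  # harmless on Python 3, keeps division float on Python 2

import math

# Character sets copied from truffleHog.py in PATCH 002.
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "1234567890abcdefABCDEF"


def shannon_entropy(data, charset):
    """Shannon entropy of `data` in bits per character.

    Only characters that appear in `charset` contribute to the sum.
    """
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log(p, 2)
    return entropy


def runs_of_charset(word, charset, threshold=20):
    """Return maximal runs of `charset` characters longer than `threshold`.

    Hypothetical helper name; it mirrors the role of get_strings_of_set
    but applies `threshold` consistently in both places.
    """
    runs = []
    current = ""
    for ch in word:
        if ch in charset:
            current += ch
        else:
            if len(current) > threshold:
                runs.append(current)
            current = ""
    if len(current) > threshold:
        runs.append(current)
    return runs


def flagged_strings(word):
    """Return the runs in `word` whose entropy exceeds the patch's thresholds."""
    hits = []
    for run in runs_of_charset(word, BASE64_CHARS):
        if shannon_entropy(run, BASE64_CHARS) > 4.5:
            hits.append(run)
    for run in runs_of_charset(word, HEX_CHARS):
        if shannon_entropy(run, HEX_CHARS) > 3:
            hits.append(run)
    return hits
```

Calling `flagged_strings(word)` on each whitespace-separated token of a diff line approximates the inner loop of `find_strings`. A string made of one repeated character scores 0 bits, while four distinct characters used equally often score 2 bits, which is why long random-looking tokens stand out.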
From d15627104d07846ac2914a976e8e347a663bbd9b Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:19:39 -0600 Subject: [PATCH 005/108] Oh no a secret file --- secretFile.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 secretFile.txt diff --git a/secretFile.txt b/secretFile.txt new file mode 100644 index 000000000000..7a9c22b4cc0c --- /dev/null +++ b/secretFile.txt @@ -0,0 +1 @@ +iZNOIPugKnpjh6D6tyNmeGmU7lyPfhIIqGeKYOoyFN9WUB9ZaDPQmv3sPi3g5wFY3UxmhowWYD1GYVq7E54xJYMvrLH6yJt8UlN4 From 709f22821820a7815106f82a03e8d90f50b2b653 Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:19:51 -0600 Subject: [PATCH 006/108] removing secret --- secretFile.txt | 1 - 1 file changed, 1 deletion(-) delete mode 100644 secretFile.txt diff --git a/secretFile.txt b/secretFile.txt deleted file mode 100644 index 7a9c22b4cc0c..000000000000 --- a/secretFile.txt +++ /dev/null @@ -1 +0,0 @@ -iZNOIPugKnpjh6D6tyNmeGmU7lyPfhIIqGeKYOoyFN9WUB9ZaDPQmv3sPi3g5wFY3UxmhowWYD1GYVq7E54xJYMvrLH6yJt8UlN4 From 9ed54617547cfca783e0f81f8dc5c927e3d1e345 Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:20:50 -0600 Subject: [PATCH 007/108] OH no a secret --- README.md | 4 ++++ temp/nothing | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e2f565778a64..3f9521233797 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,10 @@ # Truffle Hog Searches through git repositories for high entropy strings, digging deep into commit history and branches. This is effective at finding secrets accidentally committed that contain high entropy. +``` +python truffleHog.py https://github.com/dxa4481/truffleHog.git +``` + ![Example](https://i.imgur.com/aGSIEd9.png) ## Setup diff --git a/temp/nothing b/temp/nothing index 9c558e357c41..8f60b4dec620 100644 --- a/temp/nothing +++ b/temp/nothing @@ -1 +1 @@ -. +iZNOIPugKnpjh6D6tyNmeGmU7lyPfhIIqGeKYOoyFN9WUB9ZaDPQmv3sPi3g5wFY3UxmhowWYD1GYVq7E54xJYMvrLH6yJt8UlN4. 
From 94ba3861069f8533989bc6d570021c3d92caf821 Mon Sep 17 00:00:00 2001 From: flower Date: Fri, 30 Dec 2016 23:21:09 -0600 Subject: [PATCH 008/108] Nothing to see here --- temp/nothing | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/temp/nothing b/temp/nothing index 8f60b4dec620..9dafe9be2099 100644 --- a/temp/nothing +++ b/temp/nothing @@ -1 +1 @@ -iZNOIPugKnpjh6D6tyNmeGmU7lyPfhIIqGeKYOoyFN9WUB9ZaDPQmv3sPi3g5wFY3UxmhowWYD1GYVq7E54xJYMvrLH6yJt8UlN4. +nothing From 2775689dea9b996acbfeef33552950657d301312 Mon Sep 17 00:00:00 2001 From: flower Date: Sat, 31 Dec 2016 08:47:06 -0600 Subject: [PATCH 009/108] added commit info --- truffleHog.py | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 20454b6239d8..b6a64953cd74 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,5 +1,4 @@ -import os, math, string -import argparse +import os, math, string, datetime, argparse from uuid import uuid4 from git import Repo @@ -53,10 +52,10 @@ def find_strings(git_url): repo = Repo(project_path) - for i in repo.remotes.origin.fetch(): - branch_name = str(i).split('/')[1] + for remote_branch in repo.remotes.origin.fetch(): + branch_name = str(remote_branch).split('/')[1] try: - repo.git.checkout(i, b=branch_name) + repo.git.checkout(remote_branch, b=branch_name) except: pass @@ -66,11 +65,11 @@ def find_strings(git_url): pass else: diff = prev_commit.diff(curr_commit, create_patch=True) - for i in diff: + for blob in diff: #print i.a_blob.data_stream.read() - printableDiff = i.diff + printableDiff = blob.diff foundSomething = False - lines = i.diff.split("\n") + lines = blob.diff.split("\n") for line in lines: for word in line.split(): base64_strings = get_strings_of_set(word, BASE64_CHARS) @@ -86,6 +85,9 @@ def find_strings(git_url): foundSomething = True printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) if foundSomething: + commit_time = 
datetime.datetime.fromtimestamp(curr_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + print bcolors.OKGREEN + "Commit: " + curr_commit.message + "Date: " + commit_time + bcolors.ENDC + print bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC print printableDiff From 6ebcd5a82e917c59527de7d221e7b2fab9972101 Mon Sep 17 00:00:00 2001 From: flower Date: Sat, 31 Dec 2016 09:02:25 -0600 Subject: [PATCH 010/108] updating funny color issue --- truffleHog.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index b6a64953cd74..cf190339b537 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -85,9 +85,10 @@ def find_strings(git_url): foundSomething = True printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) if foundSomething: - commit_time = datetime.datetime.fromtimestamp(curr_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - print bcolors.OKGREEN + "Commit: " + curr_commit.message + "Date: " + commit_time + bcolors.ENDC + commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + print bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC print bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC + print bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC print printableDiff From e424001fc3dc9ee872ebbe0367b49a6968abf361 Mon Sep 17 00:00:00 2001 From: flower Date: Sat, 31 Dec 2016 09:15:08 -0600 Subject: [PATCH 011/108] cleaning up temp when finished --- truffleHog.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/truffleHog.py b/truffleHog.py index cf190339b537..f3eb6bb58935 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,4 +1,4 @@ -import os, math, string, datetime, argparse +import shutil, os, math, string, datetime, argparse from uuid import uuid4 from git import Repo @@ -93,6 +93,7 @@ def find_strings(git_url): prev_commit = curr_commit + shutil.rmtree(project_path) if __name__ 
== "__main__": parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') From a8f989897ddd111e9f313fe8b998824f88cb230e Mon Sep 17 00:00:00 2001 From: Nat Welch Date: Sat, 31 Dec 2016 09:59:55 -0800 Subject: [PATCH 012/108] Ignore all files in temp/ --- temp/.gitignore | 1 + 1 file changed, 1 insertion(+) create mode 100644 temp/.gitignore diff --git a/temp/.gitignore b/temp/.gitignore new file mode 100644 index 000000000000..72e8ffc0db8a --- /dev/null +++ b/temp/.gitignore @@ -0,0 +1 @@ +* From c3644ccaf1bb0799ef919a5897b7737ebff428db Mon Sep 17 00:00:00 2001 From: Nat Welch Date: Sat, 31 Dec 2016 10:01:03 -0800 Subject: [PATCH 013/108] Delete nothing --- temp/nothing | 1 - 1 file changed, 1 deletion(-) delete mode 100644 temp/nothing diff --git a/temp/nothing b/temp/nothing deleted file mode 100644 index 9dafe9be2099..000000000000 --- a/temp/nothing +++ /dev/null @@ -1 +0,0 @@ -nothing From 7147cc7525c27d459152548e3284e03a73688907 Mon Sep 17 00:00:00 2001 From: flower Date: Sun, 1 Jan 2017 19:53:49 -0600 Subject: [PATCH 014/108] fixing unicode commit message problem --- truffleHog.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index f3eb6bb58935..3f05313a1d38 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,7 +1,10 @@ -import shutil, os, math, string, datetime, argparse +import shutil, sys, os, math, string, datetime, argparse from uuid import uuid4 from git import Repo +reload(sys) +sys.setdefaultencoding('utf8') + BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" HEX_CHARS = "1234567890abcdefABCDEF" @@ -90,7 +93,6 @@ def find_strings(git_url): print bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC print bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC print printableDiff - prev_commit = curr_commit shutil.rmtree(project_path) From 61f759a603e98a28da5b67b2e47bd19913b99c78 Mon Sep 17 00:00:00 2001 From: Dylan 
Ayrey Date: Sun, 1 Jan 2017 23:00:26 -0600 Subject: [PATCH 015/108] Update README.md --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 3f9521233797..11a04f276701 100644 --- a/README.md +++ b/README.md @@ -14,4 +14,9 @@ pip install -r requirements.txt ``` ## How it works -This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text in each diff. If at any point a high entropy string is detected, it will print to the screen. +This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. 
+ +## Wishlist + +- A way to detect and not scan binary diffs +- Don't rescan diffs if already looked at in another branch From 6589d5a379fce9431094343bb7c20bd8d9ff335c Mon Sep 17 00:00:00 2001 From: flower Date: Mon, 2 Jan 2017 11:16:58 -0600 Subject: [PATCH 016/108] changing temp directory handling --- temp/.gitignore | 1 - truffleHog.py | 4 ++-- 2 files changed, 2 insertions(+), 3 deletions(-) delete mode 100644 temp/.gitignore diff --git a/temp/.gitignore b/temp/.gitignore deleted file mode 100644 index 72e8ffc0db8a..000000000000 --- a/temp/.gitignore +++ /dev/null @@ -1 +0,0 @@ -* diff --git a/truffleHog.py b/truffleHog.py index 3f05313a1d38..537ddea53173 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,4 +1,4 @@ -import shutil, sys, os, math, string, datetime, argparse +import shutil, sys, os, math, string, datetime, argparse, tempfile from uuid import uuid4 from git import Repo @@ -48,7 +48,7 @@ class bcolors: def find_strings(git_url): new_project = str(uuid4()) - project_path = os.path.join("temp", new_project) + project_path = os.path.join(tempfile.gettempdir(), new_project) Repo.clone_from(git_url, project_path) From a4f6d63872ef2dae596a505d887dadb9401382ec Mon Sep 17 00:00:00 2001 From: flower Date: Mon, 2 Jan 2017 11:23:06 -0600 Subject: [PATCH 017/108] updating the temp directory generation again --- truffleHog.py | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 537ddea53173..6740210754dc 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,5 +1,4 @@ -import shutil, sys, os, math, string, datetime, argparse, tempfile -from uuid import uuid4 +import shutil, sys, math, string, datetime, argparse, tempfile from git import Repo reload(sys) @@ -47,8 +46,7 @@ class bcolors: UNDERLINE = '\033[4m' def find_strings(git_url): - new_project = str(uuid4()) - project_path = os.path.join(tempfile.gettempdir(), new_project) + project_path = tempfile.mkdtemp() Repo.clone_from(git_url, project_path) From 
17547392e36b58f391918319b5267161329f0814 Mon Sep 17 00:00:00 2001 From: Dylan Ayrey Date: Mon, 2 Jan 2017 11:35:55 -0600 Subject: [PATCH 018/108] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 11a04f276701..7bc04f0b726d 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ Searches through git repositories for high entropy strings, digging deep into co python truffleHog.py https://github.com/dxa4481/truffleHog.git ``` -![Example](https://i.imgur.com/aGSIEd9.png) +![Example](https://i.imgur.com/U52GPGm.png) ## Setup The only requirement is GitPython, which can be installed with the following From 1ace62c3d1d702c8ab1ee1b66980bfa5cbb72fb7 Mon Sep 17 00:00:00 2001 From: Dylan Ayrey Date: Mon, 2 Jan 2017 11:37:05 -0600 Subject: [PATCH 019/108] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7bc04f0b726d..190c8d49b2fc 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ Searches through git repositories for high entropy strings, digging deep into co python truffleHog.py https://github.com/dxa4481/truffleHog.git ``` -![Example](https://i.imgur.com/U52GPGm.png) +![Example](https://i.imgur.com/YAXndLD.png) ## Setup The only requirement is GitPython, which can be installed with the following From a3e22c9423c95c961824bb5b02616b97fa81a90f Mon Sep 17 00:00:00 2001 From: flower Date: Mon, 2 Jan 2017 11:44:30 -0600 Subject: [PATCH 020/108] adding attribution --- truffleHog.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/truffleHog.py b/truffleHog.py index 6740210754dc..19580a972023 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -8,6 +8,9 @@ HEX_CHARS = "1234567890abcdefABCDEF" def shannon_entropy(data, iterator): + """ + Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html + """ if not data: return 0 entropy = 0 From bc3bc5bc2f3099ced0a400b254728a78d6f6fa5c Mon Sep 17 00:00:00 2001 
From: flower Date: Mon, 2 Jan 2017 12:21:17 -0600 Subject: [PATCH 021/108] adding python3 compatability --- truffleHog.py | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 19580a972023..1291f70b0962 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,8 +1,9 @@ import shutil, sys, math, string, datetime, argparse, tempfile from git import Repo -reload(sys) -sys.setdefaultencoding('utf8') +if sys.version_info[0] == 2: + reload(sys) + sys.setdefaultencoding('utf8') BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" HEX_CHARS = "1234567890abcdefABCDEF" @@ -71,9 +72,9 @@ def find_strings(git_url): diff = prev_commit.diff(curr_commit, create_patch=True) for blob in diff: #print i.a_blob.data_stream.read() - printableDiff = blob.diff + printableDiff = blob.diff.decode() foundSomething = False - lines = blob.diff.split("\n") + lines = blob.diff.decode().split("\n") for line in lines: for word in line.split(): base64_strings = get_strings_of_set(word, BASE64_CHARS) @@ -90,10 +91,10 @@ def find_strings(git_url): printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) if foundSomething: commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - print bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC - print bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC - print bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC - print printableDiff + print(bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC) + print(bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC) + print(bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC) + print(printableDiff) prev_commit = curr_commit shutil.rmtree(project_path) From a9789682238e9cc8e1d6a4776ee5307edae4763d Mon Sep 17 00:00:00 2001 From: Chris Tarazi Date: Mon, 2 Jan 2017 13:21:30 -0800 Subject: [PATCH 022/108] Detect 
and ignore binary diffs --- truffleHog.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/truffleHog.py b/truffleHog.py index 1291f70b0962..b8497fa2be3c 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -72,7 +72,11 @@ def find_strings(git_url): diff = prev_commit.diff(curr_commit, create_patch=True) for blob in diff: #print i.a_blob.data_stream.read() - printableDiff = blob.diff.decode() + printableDiff = blob.diff.decode() + if printableDiff.startswith("Binary files"): + # print("[DEBUG] DETECTED BINARY FILE DIFF") + continue + # print("[DEBUG] Normal diff") foundSomething = False lines = blob.diff.decode().split("\n") for line in lines: From d0efc0d65a7bfaf309df1fb198581a7416d1d5e0 Mon Sep 17 00:00:00 2001 From: Jacob Stuart Date: Tue, 3 Jan 2017 06:30:04 +0000 Subject: [PATCH 023/108] ignores duplicate diffs caused by branches --- README.md | 2 +- truffleHog.py | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 190c8d49b2fc..36d829a5d395 100644 --- a/README.md +++ b/README.md @@ -19,4 +19,4 @@ This module will go through the entire commit history of each branch, and check ## Wishlist - A way to detect and not scan binary diffs -- Don't rescan diffs if already looked at in another branch +- ~~Don't rescan diffs if already looked at in another branch~~ diff --git a/truffleHog.py b/truffleHog.py index 1291f70b0962..641d6a16962f 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -57,6 +57,7 @@ def find_strings(git_url): repo = Repo(project_path) + already_searched = set() for remote_branch in repo.remotes.origin.fetch(): branch_name = str(remote_branch).split('/')[1] try: @@ -69,6 +70,13 @@ def find_strings(git_url): if not prev_commit: pass else: + #avoid searching the same diffs + hashes = str(prev_commit) + str(curr_commit) + if hashes in already_searched: + prev_commit = curr_commit + continue + already_searched.add(hashes) + diff = prev_commit.diff(curr_commit, create_patch=True) for blob 
in diff: #print i.a_blob.data_stream.read() From efe621fce85be0e1cbd547daeeb7f2019c713584 Mon Sep 17 00:00:00 2001 From: christarazi Date: Tue, 3 Jan 2017 00:25:20 -0800 Subject: [PATCH 024/108] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 190c8d49b2fc..4a6cf4ba5392 100644 --- a/README.md +++ b/README.md @@ -18,5 +18,5 @@ This module will go through the entire commit history of each branch, and check ## Wishlist -- A way to detect and not scan binary diffs +- ~~A way to detect and not scan binary diffs~~ - Don't rescan diffs if already looked at in another branch From a981614dee43f967cf2efaad3b1c669eb6a22390 Mon Sep 17 00:00:00 2001 From: Garrett O'Reilly Date: Tue, 3 Jan 2017 08:50:50 -0600 Subject: [PATCH 025/108] Add shebang line --- truffleHog.py | 1 + 1 file changed, 1 insertion(+) diff --git a/truffleHog.py b/truffleHog.py index 1291f70b0962..1d82e0a53620 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,3 +1,4 @@ +#!/usr/bin/env python import shutil, sys, math, string, datetime, argparse, tempfile from git import Repo From 0afbcb182e8f0d6f75e99159d4b478dc28122e70 Mon Sep 17 00:00:00 2001 From: flower Date: Tue, 3 Jan 2017 09:52:45 -0600 Subject: [PATCH 026/108] removing print statements --- truffleHog.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 8327133fe275..4f02e58f4ee1 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -83,9 +83,7 @@ def find_strings(git_url): #print i.a_blob.data_stream.read() printableDiff = blob.diff.decode() if printableDiff.startswith("Binary files"): - # print("[DEBUG] DETECTED BINARY FILE DIFF") continue - # print("[DEBUG] Normal diff") foundSomething = False lines = blob.diff.decode().split("\n") for line in lines: From 3c4b64b3f4dc386234817955a8b7f77b0c13d4b5 Mon Sep 17 00:00:00 2001 From: jguasch Date: Tue, 3 Jan 2017 18:21:04 +0000 Subject: [PATCH 027/108] added support for JSON output --- truffleHog.py 
| 42 +++++++++++++++++++++++++++++------------- 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 4f02e58f4ee1..836acf7ce0ab 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,9 +1,10 @@ #!/usr/bin/env python import shutil, sys, math, string, datetime, argparse, tempfile from git import Repo +import json if sys.version_info[0] == 2: - reload(sys) + reload(sys) sys.setdefaultencoding('utf8') BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" @@ -50,13 +51,14 @@ class bcolors: BOLD = '\033[1m' UNDERLINE = '\033[4m' -def find_strings(git_url): +def find_strings(git_url, output): project_path = tempfile.mkdtemp() Repo.clone_from(git_url, project_path) repo = Repo(project_path) + jsonOutput = output already_searched = set() for remote_branch in repo.remotes.origin.fetch(): @@ -65,7 +67,7 @@ def find_strings(git_url): repo.git.checkout(remote_branch, b=branch_name) except: pass - + prev_commit = None for curr_commit in repo.iter_commits(): if not prev_commit: @@ -94,27 +96,41 @@ def find_strings(git_url): b64Entropy = shannon_entropy(string, BASE64_CHARS) if b64Entropy > 4.5: foundSomething = True - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + if jsonOutput: + stringDiff = string + else: + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) for string in hex_strings: hexEntropy = shannon_entropy(string, HEX_CHARS) if hexEntropy > 3: foundSomething = True - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + if jsonOutput: + stringDiff = string + else: + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) if foundSomething: commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - print(bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC) - print(bcolors.OKGREEN + "Branch: " + branch_name + 
bcolors.ENDC) - print(bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC) - print(printableDiff) - + if jsonOutput: + output = {} + output['date'] = commit_time + output['branch'] = branch_name + output['commit'] = prev_commit.message + output['diff'] = printableDiff + output['string'] = stringDiff + print json.dumps(output) + else: + print(bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC) + print(bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC) + print(bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC) + print(printableDiff) + prev_commit = curr_commit shutil.rmtree(project_path) if __name__ == "__main__": parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') + parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") parser.add_argument('git_url', type=str, help='URL for secret searching') - args = parser.parse_args() - find_strings(args.git_url) - + find_strings(args.git_url, args.output_json) From 96a4866465607b8ab01c4a131322a6e96facd956 Mon Sep 17 00:00:00 2001 From: bandrel Date: Tue, 3 Jan 2017 16:22:30 -0500 Subject: [PATCH 028/108] fix for windows access denied issue --- truffleHog.py | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 1291f70b0962..105a88d728fe 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,5 +1,7 @@ import shutil, sys, math, string, datetime, argparse, tempfile from git import Repo +import os +import stat if sys.version_info[0] == 2: reload(sys) @@ -8,6 +10,10 @@ BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" HEX_CHARS = "1234567890abcdefABCDEF" +def del_rw(action, name, exc): + os.chmod(name, stat.S_IWRITE) + os.remove(name) + def shannon_entropy(data, iterator): """ Borrowed from http://blog.dkbza.org/2007/05/scanning-data-for-entropy-anomalies.html @@ -97,7 +103,7 @@ def find_strings(git_url): print(printableDiff) 
prev_commit = curr_commit - shutil.rmtree(project_path) + return project_path if __name__ == "__main__": parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') @@ -105,5 +111,6 @@ def find_strings(git_url): args = parser.parse_args() - find_strings(args.git_url) + project_path = find_strings(args.git_url) + shutil.rmtree(project_path, onerror=del_rw) From 8d2b2c73c97347be0675e7e548115930eeaabac6 Mon Sep 17 00:00:00 2001 From: Ryan O'Horo Date: Tue, 3 Jan 2017 14:13:46 -0800 Subject: [PATCH 029/108] Added language describing how to check git repos on the local filesystem. --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index eb8b1926dd4f..8ea6057b2573 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,12 @@ Searches through git repositories for high entropy strings, digging deep into co python truffleHog.py https://github.com/dxa4481/truffleHog.git ``` +or + +``` +python truffleHog.py file:///user/dxa4481/codeprojects/truffleHog/ +``` + ![Example](https://i.imgur.com/YAXndLD.png) ## Setup From 90448690a2855e82cde5d901eb997d14bfa23d03 Mon Sep 17 00:00:00 2001 From: bandrel Date: Wed, 4 Jan 2017 13:16:24 -0500 Subject: [PATCH 030/108] updates to the import statements to conform to PEP8 --- truffleHog.py | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/truffleHog.py b/truffleHog.py index 4f02e58f4ee1..0ef2e5b3e545 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,5 +1,10 @@ #!/usr/bin/env python -import shutil, sys, math, string, datetime, argparse, tempfile +import shutil +import sys +import math +import datetime +import argparse +import tempfile from git import Repo if sys.version_info[0] == 2: From 6ebf25e05e968159ef36e627fda559ae79591622 Mon Sep 17 00:00:00 2001 From: Dylan Ayrey Date: Wed, 4 Jan 2017 10:55:28 -0800 Subject: [PATCH 031/108] Update truffleHog.py --- truffleHog.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog.py 
b/truffleHog.py index 0ef2e5b3e545..151d899cecb4 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -37,7 +37,7 @@ def get_strings_of_set(word, char_set, threshold=20): letters += char count += 1 else: - if count > 20: + if count > threshold: strings.append(letters) letters = "" count = 0 From 7d8009ff0b278229a119ea6cb69025dfeed9ab49 Mon Sep 17 00:00:00 2001 From: flower Date: Wed, 4 Jan 2017 11:14:04 -0800 Subject: [PATCH 032/108] shuffling imports --- truffleHog.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog.py b/truffleHog.py index 3ed18eb7e545..a6f3ec12fcb6 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -5,9 +5,9 @@ import datetime import argparse import tempfile -from git import Repo import os import stat +from git import Repo if sys.version_info[0] == 2: reload(sys) From bf3536fbdeac1f8aeb133f76261e9cc0726262d4 Mon Sep 17 00:00:00 2001 From: Antony Semonella Date: Fri, 6 Jan 2017 23:01:35 +0000 Subject: [PATCH 033/108] Remove unnecessary ord and chr conversions --- truffleHog.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index a6f3ec12fcb6..f5fa63cc8485 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -27,8 +27,8 @@ def shannon_entropy(data, iterator): if not data: return 0 entropy = 0 - for x in (ord(c) for c in iterator): - p_x = float(data.count(chr(x)))/len(data) + for x in iterator: + p_x = float(data.count(x))/len(data) if p_x > 0: entropy += - p_x*math.log(p_x, 2) return entropy From 25621e941c2d209618d5097387881b96f4160014 Mon Sep 17 00:00:00 2001 From: kepoorhampond Date: Mon, 9 Jan 2017 20:09:55 -0800 Subject: [PATCH 034/108] Made a pip uploadable package out of truffleHog --- .gitignore | 3 +++ README.md | 7 ++++++- requirements.txt | 1 - setup.cfg | 2 ++ setup.py | 18 +++++++++++++++++ truffleHog/__init__.py | 0 truffleHog.py => truffleHog/truffleHog.py | 24 ++++++++++++----------- 7 files changed, 42 insertions(+), 13 deletions(-) create mode 100644 
.gitignore delete mode 100644 requirements.txt create mode 100644 setup.cfg create mode 100644 setup.py create mode 100644 truffleHog/__init__.py rename truffleHog.py => truffleHog/truffleHog.py (99%) diff --git a/.gitignore b/.gitignore new file mode 100644 index 000000000000..b8d9def409d4 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +/build/ +/dist/ +/truffleHog.egg-info/ diff --git a/README.md b/README.md index eb8b1926dd4f..02ce431abbaf 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,11 @@ python truffleHog.py https://github.com/dxa4481/truffleHog.git ![Example](https://i.imgur.com/YAXndLD.png) +## Install +``` +pip install truffleHog +``` + ## Setup The only requirement is GitPython, which can be installed with the following ``` @@ -14,7 +19,7 @@ pip install -r requirements.txt ``` ## How it works -This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. +This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. 
## Wishlist diff --git a/requirements.txt b/requirements.txt deleted file mode 100644 index 0c6b635dc061..000000000000 --- a/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -GitPython==2.1.1 diff --git a/setup.cfg b/setup.cfg new file mode 100644 index 000000000000..3c6e79cf31da --- /dev/null +++ b/setup.cfg @@ -0,0 +1,2 @@ +[bdist_wheel] +universal=1 diff --git a/setup.py b/setup.py new file mode 100644 index 000000000000..9a60304f2384 --- /dev/null +++ b/setup.py @@ -0,0 +1,18 @@ +from setuptools import setup + +setup( + name='truffleHog', + version='1.0.0', + description='Searches through git repositories for high entropy strings, digging deep into commit history.', + url='https://github.com/dxa4481/truffleHog', + author='Dylan Ayrey', + author_email='dxa4481@rit.edu', + license='GNU', + packages =['truffleHog'], + install_requires=[ + 'GitPython == 2.1.1' + ], + entry_points = { + 'console_scripts': ['trufflehog = truffleHog.truffleHog:main'], + }, +) diff --git a/truffleHog/__init__.py b/truffleHog/__init__.py new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/truffleHog.py b/truffleHog/truffleHog.py similarity index 99% rename from truffleHog.py rename to truffleHog/truffleHog.py index a6f3ec12fcb6..5dbcd03e3905 100644 --- a/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -9,8 +9,17 @@ import stat from git import Repo +def main(): + parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') + parser.add_argument('git_url', type=str, help='URL for secret searching') + + + args = parser.parse_args() + project_path = find_strings(args.git_url) + shutil.rmtree(project_path, onerror=del_rw) + if sys.version_info[0] == 2: - reload(sys) + reload(sys) sys.setdefaultencoding('utf8') BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" @@ -76,7 +85,7 @@ def find_strings(git_url): repo.git.checkout(remote_branch, b=branch_name) except: pass - + prev_commit = None for curr_commit in 
repo.iter_commits(): if not prev_commit: @@ -117,16 +126,9 @@ def find_strings(git_url): print(bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC) print(bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC) print(printableDiff) - + prev_commit = curr_commit return project_path if __name__ == "__main__": - parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') - parser.add_argument('git_url', type=str, help='URL for secret searching') - - - args = parser.parse_args() - project_path = find_strings(args.git_url) - shutil.rmtree(project_path, onerror=del_rw) - + main() From 5ae4f86a9328a41e7395f4460d7abb2c337c024b Mon Sep 17 00:00:00 2001 From: kepoorhampond Date: Mon, 9 Jan 2017 20:17:16 -0800 Subject: [PATCH 035/108] You'll have to create an account at PyPI and then run these commands: `python setup.py sdist`, `python setup.py bdist_wheel`, `python setup.py sdist bdist_wheel upload` --- .gitignore | 1 + README.md | 6 ------ 2 files changed, 1 insertion(+), 6 deletions(-) diff --git a/.gitignore b/.gitignore index b8d9def409d4..ede6aa396781 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ /build/ /dist/ /truffleHog.egg-info/ +*/__pycache__/ diff --git a/README.md b/README.md index 02ce431abbaf..6aa664303114 100644 --- a/README.md +++ b/README.md @@ -12,12 +12,6 @@ python truffleHog.py https://github.com/dxa4481/truffleHog.git pip install truffleHog ``` -## Setup -The only requirement is GitPython, which can be installed with the following -``` -pip install -r requirements.txt -``` - ## How it works This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. 
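The "How it works" paragraph in the README hunks above describes the core check: split each diff line into words, pull out runs of more than 20 characters drawn from the base64 or hex character set, and flag a run when its Shannon entropy exceeds a threshold. A minimal standalone sketch of that check follows. It is modeled on `shannon_entropy` and `get_strings_of_set` as they appear in these patches, with the 4.5 and 3 thresholds taken from `find_strings`. The `scan_line` helper and the flush of a trailing run at the end of a word are assumptions added for illustration, not code from the series.

```python
import math

BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "1234567890abcdefABCDEF"


def shannon_entropy(data, char_set):
    """Shannon entropy (bits per character) of data, counted over char_set."""
    if not data:
        return 0
    entropy = 0
    for x in char_set:
        p_x = float(data.count(x)) / len(data)
        if p_x > 0:
            entropy += -p_x * math.log(p_x, 2)
    return entropy


def get_strings_of_set(word, char_set, threshold=20):
    """Return runs of more than `threshold` consecutive characters from char_set."""
    count = 0
    letters = ""
    strings = []
    for char in word:
        if char in char_set:
            letters += char
            count += 1
        else:
            if count > threshold:
                strings.append(letters)
            letters = ""
            count = 0
    # Assumed: a run that reaches the end of the word is also kept.
    if count > threshold:
        strings.append(letters)
    return strings


def scan_line(line):
    """Hypothetical helper: return the high-entropy strings found in one diff line."""
    found = []
    for word in line.split():
        for candidate in get_strings_of_set(word, BASE64_CHARS):
            if shannon_entropy(candidate, BASE64_CHARS) > 4.5:
                found.append(candidate)
        for candidate in get_strings_of_set(word, HEX_CHARS):
            if shannon_entropy(candidate, HEX_CHARS) > 3:
                found.append(candidate)
    return found


if __name__ == "__main__":
    print(scan_line("+token = 0123456789abcdef0123456789abcdef"))
```

A long run of a single repeated character passes the length filter but has zero entropy, so it is not reported. That is why the length filter and the entropy threshold are applied together.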
From 664a8dde9f58569cf6e9cff6bedfb027cc16e34e Mon Sep 17 00:00:00 2001 From: CJ Date: Mon, 6 Feb 2017 15:43:35 -0800 Subject: [PATCH 036/108] Leverage replacement character when decode of unrecognized character encoding encontered and stream-line Python 2+ header encoding --- truffleHog.py | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index a6f3ec12fcb6..f6772c508fc7 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -1,4 +1,6 @@ #!/usr/bin/env python +# -*- coding: utf-8 -*- + import shutil import sys import math @@ -9,10 +11,6 @@ import stat from git import Repo -if sys.version_info[0] == 2: - reload(sys) - sys.setdefaultencoding('utf8') - BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" HEX_CHARS = "1234567890abcdefABCDEF" @@ -68,8 +66,8 @@ def find_strings(git_url): repo = Repo(project_path) - already_searched = set() + for remote_branch in repo.remotes.origin.fetch(): branch_name = str(remote_branch).split('/')[1] try: @@ -92,7 +90,7 @@ def find_strings(git_url): diff = prev_commit.diff(curr_commit, create_patch=True) for blob in diff: #print i.a_blob.data_stream.read() - printableDiff = blob.diff.decode() + printableDiff = blob.diff.decode('utf-8', errors='replace') if printableDiff.startswith("Binary files"): continue foundSomething = False From a2a5008207cbd92f9edc43d8643edc647246185d Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 25 Feb 2017 16:39:14 -0800 Subject: [PATCH 037/108] fixing up json output --- truffleHog.py | 52 ++++++++++++++++++++++----------------------------- 1 file changed, 22 insertions(+), 30 deletions(-) diff --git a/truffleHog.py b/truffleHog.py index 27b6066b466b..c4db51b24814 100644 --- a/truffleHog.py +++ b/truffleHog.py @@ -62,15 +62,11 @@ class bcolors: BOLD = '\033[1m' UNDERLINE = '\033[4m' -def find_strings(git_url, output): +def find_strings(git_url, printJson=False): project_path = tempfile.mkdtemp() - Repo.clone_from(git_url, 
project_path) - + output = {"entropicDiffs": []} repo = Repo(project_path) - - jsonOutput = output - already_searched = set() for remote_branch in repo.remotes.origin.fetch(): branch_name = str(remote_branch).split('/')[1] @@ -97,7 +93,7 @@ def find_strings(git_url, output): printableDiff = blob.diff.decode() if printableDiff.startswith("Binary files"): continue - foundSomething = False + stringsFound = [] lines = blob.diff.decode().split("\n") for line in lines: for word in line.split(): @@ -106,29 +102,24 @@ def find_strings(git_url, output): for string in base64_strings: b64Entropy = shannon_entropy(string, BASE64_CHARS) if b64Entropy > 4.5: - foundSomething = True - if jsonOutput: - stringDiff = string - else: - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + stringsFound.append(string) + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) for string in hex_strings: hexEntropy = shannon_entropy(string, HEX_CHARS) if hexEntropy > 3: - foundSomething = True - if jsonOutput: - stringDiff = string - else: - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) - if foundSomething: + stringsFound.append(string) + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + if len(stringsFound) > 0: commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - if jsonOutput: - output = {} - output['date'] = commit_time - output['branch'] = branch_name - output['commit'] = prev_commit.message - output['diff'] = printableDiff - output['string'] = stringDiff - print json.dumps(output) + entropicDiff = {} + entropicDiff['date'] = commit_time + entropicDiff['branch'] = branch_name + entropicDiff['commit'] = prev_commit.message + entropicDiff['diff'] = blob.diff.decode() + entropicDiff['stringsFound'] = stringsFound + output["entropicDiffs"].append(entropicDiff) + if printJson: + 
print(json.dumps(output, sort_keys=True, indent=4)) else: print(bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC) print(bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC) @@ -136,13 +127,14 @@ def find_strings(git_url, output): print(printableDiff) prev_commit = curr_commit - return project_path + output["project_path"] = project_path + return output if __name__ == "__main__": parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") parser.add_argument('git_url', type=str, help='URL for secret searching') - args = parser.parse_args() - project_path = find_strings(args.git_url, args.output_json) - shutil.rmtree(project_path, onerror=del_rw) \ No newline at end of file + output = find_strings(args.git_url, args.output_json) + project_path = output["project_path"] + shutil.rmtree(project_path, onerror=del_rw) From 512d8f425f435be061130a3378356993c2426527 Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 27 Feb 2017 18:09:45 -0800 Subject: [PATCH 038/108] making the package a callable libarry, and adding an org search script --- scripts/searchOrg.py | 16 ++++++++++++++++ setup.py | 2 +- truffleHog/__init__.py | 1 + 3 files changed, 18 insertions(+), 1 deletion(-) create mode 100644 scripts/searchOrg.py diff --git a/scripts/searchOrg.py b/scripts/searchOrg.py new file mode 100644 index 000000000000..67007af4ec64 --- /dev/null +++ b/scripts/searchOrg.py @@ -0,0 +1,16 @@ +""" +Credit for this code goes to https://github.com/ryanbaxendale +via https://github.com/dxa4481/truffleHog/pull/9 +""" +import requests +import truffleHog + +def get_org_repos(orgname): + response = requests.get(url='https://api.github.com/users/' + orgname + '/repos') + json = response.json() + for item in json: + if item['private'] == False: + print('searching ' + item["html_url"]) + truffleHog.find_strings(item["html_url"]) + +get_org_repos("Netflix") diff 
--git a/setup.py b/setup.py index 9a60304f2384..eff68f54069a 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='1.0.0', + version='1.0.1', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', diff --git a/truffleHog/__init__.py b/truffleHog/__init__.py index e69de29bb2d1..9a60b3aeec55 100644 --- a/truffleHog/__init__.py +++ b/truffleHog/__init__.py @@ -0,0 +1 @@ +from truffleHog import find_strings From 5accb99ced42b1d0153b8a5c3fee0ca4a9a9f6f2 Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 27 Feb 2017 18:10:18 -0800 Subject: [PATCH 039/108] updating readme --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e4f4977c918f..09f4a863e24e 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ truffleHog https://github.com/dxa4481/truffleHog.git or ``` -python truffleHog.py file:///user/dxa4481/codeprojects/truffleHog/ +truffleHog file:///user/dxa4481/codeprojects/truffleHog/ ``` ![Example](https://i.imgur.com/YAXndLD.png) From 133decb3ec580018c5fb55636743496c3d21ed18 Mon Sep 17 00:00:00 2001 From: Jingpeng Wu Date: Mon, 19 Jun 2017 19:47:35 -0400 Subject: [PATCH 040/108] Update __init__.py --- truffleHog/__init__.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog/__init__.py b/truffleHog/__init__.py index 9a60b3aeec55..8b137891791f 100644 --- a/truffleHog/__init__.py +++ b/truffleHog/__init__.py @@ -1 +1 @@ -from truffleHog import find_strings + From e986c5ed6a0f0b357953c854bdc56f16914eb2ba Mon Sep 17 00:00:00 2001 From: Dylan Date: Thu, 28 Sep 2017 20:59:47 -0700 Subject: [PATCH 041/108] adding some tests --- tests.py | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 tests.py diff --git a/tests.py b/tests.py new file mode 100644 index 000000000000..3fc2aad2694f --- /dev/null +++ b/tests.py @@ 
-0,0 +1,26 @@ +import unittest +import os +from truffleHog import truffleHog + + +class TestStringMethods(unittest.TestCase): + + def test_shannon(self): + random_stringB64 = "ZWVTjPQSdhwRgl204Hc51YCsritMIzn8B=/p9UyeX7xu6KkAGqfm3FJ+oObLDNEva" + random_stringHex = "b3A0a1FDfe86dcCE945B72" + self.assertGreater(truffleHog.shannon_entropy(random_stringB64, truffleHog.BASE64_CHARS), 4.5) + self.assertGreater(truffleHog.shannon_entropy(random_stringHex, truffleHog.HEX_CHARS), 3) + + def test_cloning(self): + project_path = truffleHog.clone_git_repo("https://github.com/dxa4481/truffleHog.git") + license_file = os.path.join(project_path, "LICENSE") + self.assertTrue(os.path.isfile(license_file)) + + def test_unicode_expection(self): + try: + truffleHog.find_strings("https://github.com/dxa4481/tst.git") + except UnicodeEncodeError: + self.fail("Unicode print error") + +if __name__ == '__main__': + unittest.main() From fd81192e0c7b879e3c53b641aca4fc49826035a9 Mon Sep 17 00:00:00 2001 From: Dylan Date: Thu, 28 Sep 2017 22:06:58 -0700 Subject: [PATCH 042/108] fixes unicode errors --- truffleHog/truffleHog.py | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 7c01379eabca..c92147b8daa0 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -70,15 +70,40 @@ class bcolors: BOLD = '\033[1m' UNDERLINE = '\033[4m' -def find_strings(git_url, printJson=False): +def clone_git_repo(git_url): project_path = tempfile.mkdtemp() Repo.clone_from(git_url, project_path) + return project_path + +def print_results(printJson, commit_time, branch_name, prev_commit, printableDiff): + if printJson: + print(json.dumps(output, sort_keys=True, indent=4)) + else: + if sys.version_info >= (3, 0): + dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) + print(dateStr) + branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name, bcolors.ENDC) + 
print(branchStr) + commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.message, bcolors.ENDC) + print(commitStr) + print(printableDiff) + else: + dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) + print(dateStr) + branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name.encode('utf-8'), bcolors.ENDC) + print(branchStr) + commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.message.encode('utf-8'), bcolors.ENDC) + print(commitStr) + print(printableDiff.encode('utf-8')) + +def find_strings(git_url, printJson=False): output = {"entropicDiffs": []} + project_path = clone_git_repo(git_url) repo = Repo(project_path) already_searched = set() for remote_branch in repo.remotes.origin.fetch(): - branch_name = str(remote_branch).split('/')[1] + branch_name = remote_branch.name.split('/')[1] try: repo.git.checkout(remote_branch, b=branch_name) except: @@ -127,14 +152,8 @@ def find_strings(git_url, printJson=False): entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace') entropicDiff['stringsFound'] = stringsFound output["entropicDiffs"].append(entropicDiff) - if printJson: - print(json.dumps(output, sort_keys=True, indent=4)) - else: - print(bcolors.OKGREEN + "Date: " + commit_time + bcolors.ENDC) - print(bcolors.OKGREEN + "Branch: " + branch_name + bcolors.ENDC) - print(bcolors.OKGREEN + "Commit: " + prev_commit.message + bcolors.ENDC) - print(printableDiff) + print_results(printJson, commit_time, branch_name, prev_commit, printableDiff) prev_commit = curr_commit output["project_path"] = project_path return output From 8ab1422ed7e109b9fc1dd012bfd844e8e0293fbb Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 8 Oct 2017 01:25:35 -0700 Subject: [PATCH 043/108] adding regex support --- truffleHog/regexChecks.py | 9 +++ truffleHog/truffleHog.py | 129 +++++++++++++++++++++++++++----------- 2 files changed, 101 insertions(+), 37 deletions(-) create mode 100644 truffleHog/regexChecks.py diff --git 
a/truffleHog/regexChecks.py b/truffleHog/regexChecks.py new file mode 100644 index 000000000000..47efaf223e00 --- /dev/null +++ b/truffleHog/regexChecks.py @@ -0,0 +1,9 @@ +import re + +regexes = { + "Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), + "Slack Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), + "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), + "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----') +} + diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index c92147b8daa0..aaf8880bd398 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -10,17 +10,31 @@ import os import json import stat +from regexChecks import regexes from git import Repo def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") + parser.add_argument("--regex", dest="do_regex", action="store_true") + parser.add_argument("--entropy", dest="do_entropy") parser.add_argument('git_url', type=str, help='URL for secret searching') + parser.set_defaults(regex=False) + parser.set_defaults(entropy=True) args = parser.parse_args() - output = find_strings(args.git_url, args.output_json) + do_entropy = str2bool(args.do_entropy) + output = find_strings(args.git_url, args.output_json, args.do_regex, do_entropy) project_path = output["project_path"] shutil.rmtree(project_path, onerror=del_rw) +def str2bool(v): + if v.lower() in ('yes', 'true', 't', 'y', '1'): + return True + elif v.lower() in ('no', 'false', 'f', 'n', '0'): + return False + else: + raise argparse.ArgumentTypeError('Boolean value expected.') + BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" HEX_CHARS = "1234567890abcdefABCDEF" @@ -75,28 +89,85 @@ def clone_git_repo(git_url): Repo.clone_from(git_url, project_path) return project_path -def 
print_results(printJson, commit_time, branch_name, prev_commit, printableDiff): +def print_results(printJson, issue): + commit_time = issue['date'] + branch_name = issue['branch'] + prev_commit = issue['commit'] + printableDiff = issue['printDiff'] + reason = issue['reason'] + if printJson: print(json.dumps(output, sort_keys=True, indent=4)) else: + reason = "{}Reason: {}{}".format(bcolors.OKGREEN, reason, bcolors.ENDC) + print(reason) + dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) + print(dateStr) + if sys.version_info >= (3, 0): - dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) - print(dateStr) branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name, bcolors.ENDC) print(branchStr) - commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.message, bcolors.ENDC) + commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit, bcolors.ENDC) print(commitStr) print(printableDiff) else: - dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) - print(dateStr) branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name.encode('utf-8'), bcolors.ENDC) print(branchStr) - commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.message.encode('utf-8'), bcolors.ENDC) + commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.encode('utf-8'), bcolors.ENDC) print(commitStr) print(printableDiff.encode('utf-8')) -def find_strings(git_url, printJson=False): +def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob): + stringsFound = [] + lines = printableDiff.split("\n") + for line in lines: + for word in line.split(): + base64_strings = get_strings_of_set(word, BASE64_CHARS) + hex_strings = get_strings_of_set(word, HEX_CHARS) + for string in base64_strings: + b64Entropy = shannon_entropy(string, BASE64_CHARS) + if b64Entropy > 4.5: + stringsFound.append(string) + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + 
bcolors.ENDC) + for string in hex_strings: + hexEntropy = shannon_entropy(string, HEX_CHARS) + if hexEntropy > 3: + stringsFound.append(string) + printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) + entropicDiff = None + if len(stringsFound) > 0: + entropicDiff = {} + entropicDiff['date'] = commit_time + entropicDiff['branch'] = branch_name + entropicDiff['commit'] = prev_commit.message + entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace') + entropicDiff['stringsFound'] = stringsFound + entropicDiff['printDiff'] = printableDiff + entropicDiff['reason'] = "High Entropy" + return entropicDiff + +def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob): + regex_matches = [] + for key in regexes: + found_strings = regexes[key].findall(printableDiff) + for found_string in found_strings: + printableDiff = printableDiff.replace(printableDiff, bcolors.WARNING + found_string + bcolors.ENDC) + if found_strings: + foundRegex = {} + foundRegex['date'] = commit_time + foundRegex['branch'] = branch_name + foundRegex['commit'] = prev_commit.message + foundRegex['diff'] = blob.diff.decode('utf-8', errors='replace') + foundRegex['stringsFound'] = found_strings + foundRegex['printDiff'] = printableDiff + foundRegex['reason'] = key + regex_matches.append(foundRegex) + return regex_matches + + + + +def find_strings(git_url, printJson=False, do_regex=False, do_entropy=True): output = {"entropicDiffs": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) @@ -123,37 +194,21 @@ def find_strings(git_url, printJson=False): diff = prev_commit.diff(curr_commit, create_patch=True) for blob in diff: - #print i.a_blob.data_stream.read() printableDiff = blob.diff.decode('utf-8', errors='replace') if printableDiff.startswith("Binary files"): continue - stringsFound = [] - lines = blob.diff.decode('utf-8', errors='replace').split("\n") - for line in lines: - for word in line.split(): - base64_strings = 
get_strings_of_set(word, BASE64_CHARS) - hex_strings = get_strings_of_set(word, HEX_CHARS) - for string in base64_strings: - b64Entropy = shannon_entropy(string, BASE64_CHARS) - if b64Entropy > 4.5: - stringsFound.append(string) - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) - for string in hex_strings: - hexEntropy = shannon_entropy(string, HEX_CHARS) - if hexEntropy > 3: - stringsFound.append(string) - printableDiff = printableDiff.replace(string, bcolors.WARNING + string + bcolors.ENDC) - if len(stringsFound) > 0: - commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - entropicDiff = {} - entropicDiff['date'] = commit_time - entropicDiff['branch'] = branch_name - entropicDiff['commit'] = prev_commit.message - entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace') - entropicDiff['stringsFound'] = stringsFound - output["entropicDiffs"].append(entropicDiff) - - print_results(printJson, commit_time, branch_name, prev_commit, printableDiff) + commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + foundIssues = [] + if do_entropy: + entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob) + if entropicDiff: + foundIssues.append(entropicDiff) + if do_regex: + found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob) + foundIssues += found_regexes + for foundIssue in foundIssues: + print_results(printJson, foundIssue) + prev_commit = curr_commit output["project_path"] = project_path return output From e1d329e58ae387b1b9a2599a9b418ed0aa71a73e Mon Sep 17 00:00:00 2001 From: Dylan Date: Tue, 10 Oct 2017 21:15:41 -0700 Subject: [PATCH 044/108] adding some more signals --- truffleHog/regexChecks.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/truffleHog/regexChecks.py b/truffleHog/regexChecks.py index 47efaf223e00..76bfd9044db1 100644 
--- a/truffleHog/regexChecks.py +++ b/truffleHog/regexChecks.py @@ -3,7 +3,10 @@ regexes = { "Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), "Slack Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), + "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----'), + "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[0-9a-f]{32}'), + "Twitter Oauth": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[0-9a-zA-Z]{35,44}'), "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), - "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----') + "AWS API Key": re.compile('[a|A][w|W][s|S].*AKIA[0-9A-Z]{16}') } From 4e0bac53f34afa22e7ca46b349f444b2045896fb Mon Sep 17 00:00:00 2001 From: Dylan Date: Tue, 10 Oct 2017 21:37:23 -0700 Subject: [PATCH 045/108] adding commit hash --- truffleHog/truffleHog.py | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index aaf8880bd398..b1f6f27a4107 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -94,15 +94,19 @@ def print_results(printJson, issue): branch_name = issue['branch'] prev_commit = issue['commit'] printableDiff = issue['printDiff'] + commitHash = issue['commitHash'] reason = issue['reason'] if printJson: print(json.dumps(output, sort_keys=True, indent=4)) else: + print("~~~~~~~~~~~~~~~~~~~~~") reason = "{}Reason: {}{}".format(bcolors.OKGREEN, reason, bcolors.ENDC) print(reason) dateStr = "{}Date: {}{}".format(bcolors.OKGREEN, commit_time, bcolors.ENDC) print(dateStr) + hashStr = "{}Hash: {}{}".format(bcolors.OKGREEN, commitHash, bcolors.ENDC) + print(hashStr) if sys.version_info >= (3, 0): branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name, bcolors.ENDC) @@ -116,8 +120,9 @@ def print_results(printJson, issue): commitStr = "{}Commit: {}{}".format(bcolors.OKGREEN, prev_commit.encode('utf-8'), bcolors.ENDC) 
print(commitStr) print(printableDiff.encode('utf-8')) + print("~~~~~~~~~~~~~~~~~~~~~") -def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob): +def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash): stringsFound = [] lines = printableDiff.split("\n") for line in lines: @@ -143,10 +148,11 @@ def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob): entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace') entropicDiff['stringsFound'] = stringsFound entropicDiff['printDiff'] = printableDiff + entropicDiff['commitHash'] = commitHash entropicDiff['reason'] = "High Entropy" return entropicDiff -def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob): +def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash): regex_matches = [] for key in regexes: found_strings = regexes[key].findall(printableDiff) @@ -161,6 +167,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob): foundRegex['stringsFound'] = found_strings foundRegex['printDiff'] = printableDiff foundRegex['reason'] = key + foundRegex['commitHash'] = commitHash regex_matches.append(foundRegex) return regex_matches @@ -182,6 +189,7 @@ def find_strings(git_url, printJson=False, do_regex=False, do_entropy=True): prev_commit = None for curr_commit in repo.iter_commits(): + commitHash = curr_commit.hexsha if not prev_commit: pass else: @@ -200,11 +208,11 @@ def find_strings(git_url, printJson=False, do_regex=False, do_entropy=True): commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') foundIssues = [] if do_entropy: - entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob) + entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) if entropicDiff: foundIssues.append(entropicDiff) if do_regex: - found_regexes = regex_check(printableDiff, 
commit_time, branch_name, prev_commit, blob) + found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) foundIssues += found_regexes for foundIssue in foundIssues: print_results(printJson, foundIssue) From 0653620019e670ac1fe2565ca87eb2297f3cc01a Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 30 Oct 2017 19:06:22 -0700 Subject: [PATCH 046/108] tweaking regex --- truffleHog/regexChecks.py | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/truffleHog/regexChecks.py b/truffleHog/regexChecks.py index 76bfd9044db1..1810c9ef7a3b 100644 --- a/truffleHog/regexChecks.py +++ b/truffleHog/regexChecks.py @@ -4,9 +4,11 @@ "Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), "Slack Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----'), - "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[0-9a-f]{32}'), - "Twitter Oauth": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[0-9a-zA-Z]{35,44}'), + "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[\'|"][0-9a-f]{32}[\'|"]'), + "Twitter Oauth": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[\'|"][0-9a-zA-Z]{35,44}[\'|"]'), "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), - "AWS API Key": re.compile('[a|A][w|W][s|S].*AKIA[0-9A-Z]{16}') + "AWS API Key": re.compile('AKIA[0-9A-Z]{16}'),#[a|A][w|W][s|S].*AKIA[0-9A-Z]{16}'), + "Heroku API Key": re.compile('[h|H][e|E][r|R][o|O][k|K][u|U].*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'), + "Generic Secret": re.compile('[s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]') } From 7a027ff0d75dee18b26086bd685fdc0f66a333d3 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 11 Nov 2017 20:25:41 -0800 Subject: [PATCH 047/108] fixing parse error --- setup.py | 2 +- truffleHog/truffleHog.py | 4 +++- 2 files changed, 4 insertions(+), 
2 deletions(-) diff --git a/setup.py b/setup.py index e1ec2daa0b67..72e9df1270b1 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='1.0.2', + version='1.0.4', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index b1f6f27a4107..ec50e62d5d62 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -28,6 +28,8 @@ def main(): shutil.rmtree(project_path, onerror=del_rw) def str2bool(v): + if v == None: + return True if v.lower() in ('yes', 'true', 't', 'y', '1'): return True elif v.lower() in ('no', 'false', 'f', 'n', '0'): @@ -98,7 +100,7 @@ def print_results(printJson, issue): reason = issue['reason'] if printJson: - print(json.dumps(output, sort_keys=True, indent=4)) + print(json.dumps(issue, sort_keys=True, indent=4)) else: print("~~~~~~~~~~~~~~~~~~~~~") reason = "{}Reason: {}{}".format(bcolors.OKGREEN, reason, bcolors.ENDC) From 9d60549cea17c830df3f99398993e8f6fd154468 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 11 Nov 2017 20:32:45 -0800 Subject: [PATCH 048/108] updating the readme --- README.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 09f4a863e24e..0f4aaf0ccf79 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,14 @@ # Truffle Hog -Searches through git repositories for high entropy strings, digging deep into commit history and branches. This is effective at finding secrets accidentally committed that contain high entropy. +Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed. + +## NEW +Trufflehog previously functioned by running entropy checks on git diffs. 
This functionality still exists, but high signal regex checks have been added, and the ability to suppress entropy checking has also been added. + +These features help cut down on noise, and make the tool easier to shove into a devops pipeline. + + ``` -truffleHog https://github.com/dxa4481/truffleHog.git +truffleHog --regex --entropy=False https://github.com/dxa4481/truffleHog.git ``` or @@ -18,10 +24,22 @@ truffleHog file:///user/dxa4481/codeprojects/truffleHog/ pip install truffleHog ``` +## Customizing + +Custom regexes can be added to the following file: +``` +truffleHog/truffleHog/regexChecks.py +``` +Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added. + +Feel free to also contribute high signal regexes upstream that you think will benefit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed. + ## How it works -This module will go through the entire commit history of each branch, and check each diff from each commit, and evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. +This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, trufflehog will evaluate the Shannon entropy for both the base64 char set and hexadecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. 
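The entropy check described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the body of `shannon_entropy` is not shown in these patches, so the version here is an assumed implementation that only matches the call shape used in `find_entropy` and `tests.py`. `candidate_strings` is a hypothetical stand-in for `get_strings_of_set`. The character sets are copied from `truffleHog.py`.

```python
import math

# Character sets as defined in truffleHog.py
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "1234567890abcdefABCDEF"


def shannon_entropy(data, charset):
    """Assumed implementation: entropy in bits per character of `data`,
    counting only the characters that belong to `charset`."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / float(len(data))
        if p > 0:
            entropy -= p * math.log(p, 2)
    return entropy


def candidate_strings(word, charset, min_length=20):
    """Hypothetical helper: yield runs of `charset` characters in `word`
    that are longer than `min_length`."""
    run = ""
    for ch in word:
        if ch in charset:
            run += ch
        else:
            if len(run) > min_length:
                yield run
            run = ""
    if len(run) > min_length:
        yield run
```

In `find_entropy`, a base64 candidate is reported when its entropy exceeds 4.5 and a hex candidate when it exceeds 3, so a long run of one repeated character scores 0 and is ignored.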
## Wishlist - ~~A way to detect and not scan binary diffs~~ - ~~Don't rescan diffs if already looked at in another branch~~ +- A since commit X feature +- Print the file affected From df393b4125c2aa217211b2429b8963d0cefcee27 Mon Sep 17 00:00:00 2001 From: Stephen Date: Wed, 6 Dec 2017 14:44:41 -0800 Subject: [PATCH 049/108] Add travis testing --- .gitignore | 3 ++- .travis.yml | 13 +++++++++++++ requirements.txt | 0 tests.py => test_all.py | 0 4 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 .travis.yml create mode 100644 requirements.txt rename tests.py => test_all.py (100%) diff --git a/.gitignore b/.gitignore index ede6aa396781..bb85dcc36137 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ /build/ /dist/ /truffleHog.egg-info/ -*/__pycache__/ +**/__pycache__/ +**/*.pyc diff --git a/.travis.yml b/.travis.yml new file mode 100644 index 000000000000..33b6f107bc7a --- /dev/null +++ b/.travis.yml @@ -0,0 +1,13 @@ +language: python +python: + - "2.6" + - "2.7" + - "3.2" + - "3.3" + - "3.4" + - "3.5" + - "3.5-dev" # 3.5 development branch + - "3.6" + - "3.6-dev" # 3.6 development branch + - "3.7-dev" # 3.7 development branch + - "nightly" diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 000000000000..e69de29bb2d1 diff --git a/tests.py b/test_all.py similarity index 100% rename from tests.py rename to test_all.py From 59e117f26fef80a54a634e0c148794cec0df4f40 Mon Sep 17 00:00:00 2001 From: Stephen Date: Wed, 6 Dec 2017 14:49:26 -0800 Subject: [PATCH 050/108] Add travis script command --- .travis.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.travis.yml b/.travis.yml index 33b6f107bc7a..5cadf87c8c53 100644 --- a/.travis.yml +++ b/.travis.yml @@ -11,3 +11,4 @@ python: - "3.6-dev" # 3.6 development branch - "3.7-dev" # 3.7 development branch - "nightly" +script: pytest From eebf01c4ec13724788b7360368f78d0262e29279 Mon Sep 17 00:00:00 2001 From: Stephen Date: Wed, 6 Dec 2017 14:52:59 -0800 Subject: [PATCH 051/108] 
Add GitPython --- requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/requirements.txt b/requirements.txt index e69de29bb2d1..64b1adaeeb48 100644 --- a/requirements.txt +++ b/requirements.txt @@ -0,0 +1 @@ +GitPython From bce09b0d6186565c83f448a0f9d32887b0c42aab Mon Sep 17 00:00:00 2001 From: Stephen Date: Wed, 6 Dec 2017 14:54:53 -0800 Subject: [PATCH 052/108] Add unittest2 --- requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/requirements.txt b/requirements.txt index 64b1adaeeb48..c4cb6d5be2d3 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +1,2 @@ GitPython +unittest2 From f0d9c2814ea664f336107b02d9fd3b4761b50e7b Mon Sep 17 00:00:00 2001 From: Stephen Date: Wed, 6 Dec 2017 14:57:56 -0800 Subject: [PATCH 053/108] Limit to 2.7 --- .travis.yml | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/.travis.yml b/.travis.yml index 5cadf87c8c53..a20c1efd6b75 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,14 +1,4 @@ language: python python: - - "2.6" - "2.7" - - "3.2" - - "3.3" - - "3.4" - - "3.5" - - "3.5-dev" # 3.5 development branch - - "3.6" - - "3.6-dev" # 3.6 development branch - - "3.7-dev" # 3.7 development branch - - "nightly" script: pytest From b6073aafed119255301571cf8ff1ea43ede87c95 Mon Sep 17 00:00:00 2001 From: Chris Gates Date: Fri, 8 Dec 2017 11:38:45 -0500 Subject: [PATCH 054/108] Update regexChecks.py --- truffleHog/regexChecks.py | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/truffleHog/regexChecks.py b/truffleHog/regexChecks.py index 1810c9ef7a3b..877e8878e07a 100644 --- a/truffleHog/regexChecks.py +++ b/truffleHog/regexChecks.py @@ -2,13 +2,24 @@ regexes = { "Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), + "Generic AppSecret": re.compile('[a|A][p|P][p|P][s|S][e|E][c|C][r|R][e|E][t|T].*.[0-9a-zA-Z]{32,45}'), + "Generic AppSecret 2": re.compile('[a|A][p|P][p|P][s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]'), "Slack 
Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----'), + "SSH (OPENSSH) private key": re.compile('-----BEGIN OPENSSH PRIVATE KEY-----'), + "SSH (DSA) private key": re.compile('-----BEGIN DSA PRIVATE KEY-----'), + "SSH (EC) private key": re.compile('-----BEGIN EC PRIVATE KEY-----'), + "PGP private key block": re.compile('-----BEGIN PGP PRIVATE KEY BLOCK-----'), "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[\'|"][0-9a-f]{32}[\'|"]'), + "Facebook Oauth 2": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*.[0-9a-f]{32}'), "Twitter Oauth": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[\'|"][0-9a-zA-Z]{35,44}[\'|"]'), + "Twitter Oauth 2": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*.[0-9a-zA-Z]{35,44}'), + "GitHub": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[0-9a-zA-Z]{35,40}'), + "GitHub 2": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[c|C][l|L][i|I][e|E][n|N][T|T][s|S][e|E][c|C][r|R][e|E][t|T].*[0-9a-zA-Z]{35,40}'), "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), + "Google Oauth 2": re.compile('[c|C][l|L][i|I][e|E][n|N][T|T][_][s|S][e|E][c|C][r|R][e|E][t|T].*[:].*[a-zA-Z0-9-_]{24}'), "AWS API Key": re.compile('AKIA[0-9A-Z]{16}'),#[a|A][w|W][s|S].*AKIA[0-9A-Z]{16}'), "Heroku API Key": re.compile('[h|H][e|E][r|R][o|O][k|K][u|U].*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'), - "Generic Secret": re.compile('[s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]') + "Generic Secret": re.compile('[s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]'), } From bbb1392f94ddf21ae18bce19f63f5abb309657c7 Mon Sep 17 00:00:00 2001 From: Stephen Date: Sun, 10 Dec 2017 11:25:01 -0800 Subject: [PATCH 055/108] Add coverage --- .travis.yml | 2 +- requirements.txt | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index a20c1efd6b75..bd8dfc5cadd0 100644 --- 
a/.travis.yml +++ b/.travis.yml @@ -1,4 +1,4 @@ language: python python: - "2.7" -script: pytest +script: pytest --cov=./ && codecov diff --git a/requirements.txt b/requirements.txt index c4cb6d5be2d3..5f54e451f062 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,2 +1,3 @@ GitPython unittest2 +pytest-cov From cd8b8140c03c588d237110eb6707665bbfca8a36 Mon Sep 17 00:00:00 2001 From: Stephen Date: Sun, 10 Dec 2017 11:27:07 -0800 Subject: [PATCH 056/108] Add codecov --- requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/requirements.txt b/requirements.txt index 5f54e451f062..6c579c6d31e5 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,4 @@ GitPython unittest2 pytest-cov +codecov From 9c5491d3c517708c8036a61e86b778fa80a053c8 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 10 Dec 2017 12:39:18 -0800 Subject: [PATCH 057/108] fixing printable diff being overwritten --- setup.py | 2 +- truffleHog/truffleHog.py | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/setup.py b/setup.py index 72e9df1270b1..de0cd58d2d30 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='1.0.4', + version='2.0.0', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index ec50e62d5d62..1b3ca1008429 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -159,7 +159,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm for key in regexes: found_strings = regexes[key].findall(printableDiff) for found_string in found_strings: - printableDiff = printableDiff.replace(printableDiff, bcolors.WARNING + found_string + bcolors.ENDC) + found_diff = printableDiff.replace(printableDiff, bcolors.WARNING + found_string + bcolors.ENDC) if found_strings: foundRegex = {} 
foundRegex['date'] = commit_time @@ -167,7 +167,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm foundRegex['commit'] = prev_commit.message foundRegex['diff'] = blob.diff.decode('utf-8', errors='replace') foundRegex['stringsFound'] = found_strings - foundRegex['printDiff'] = printableDiff + foundRegex['printDiff'] = found_diff foundRegex['reason'] = key foundRegex['commitHash'] = commitHash regex_matches.append(foundRegex) From 94c6d085bc2db0ab0ed7d8c13a4401001ed00777 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 10 Dec 2017 12:56:38 -0800 Subject: [PATCH 058/108] updating regex checks --- truffleHog/regexChecks.py | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/truffleHog/regexChecks.py b/truffleHog/regexChecks.py index 877e8878e07a..27d26f9b6115 100644 --- a/truffleHog/regexChecks.py +++ b/truffleHog/regexChecks.py @@ -1,9 +1,7 @@ import re regexes = { - "Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), - "Generic AppSecret": re.compile('[a|A][p|P][p|P][s|S][e|E][c|C][r|R][e|E][t|T].*.[0-9a-zA-Z]{32,45}'), - "Generic AppSecret 2": re.compile('[a|A][p|P][p|P][s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]'), + #"Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), "Slack Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----'), "SSH (OPENSSH) private key": re.compile('-----BEGIN OPENSSH PRIVATE KEY-----'), @@ -11,14 +9,10 @@ "SSH (EC) private key": re.compile('-----BEGIN EC PRIVATE KEY-----'), "PGP private key block": re.compile('-----BEGIN PGP PRIVATE KEY BLOCK-----'), "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[\'|"][0-9a-f]{32}[\'|"]'), - "Facebook Oauth 2": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*.[0-9a-f]{32}'), "Twitter Oauth": 
re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[\'|"][0-9a-zA-Z]{35,44}[\'|"]'), - "Twitter Oauth 2": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*.[0-9a-zA-Z]{35,44}'), - "GitHub": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[0-9a-zA-Z]{35,40}'), - "GitHub 2": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[c|C][l|L][i|I][e|E][n|N][T|T][s|S][e|E][c|C][r|R][e|E][t|T].*[0-9a-zA-Z]{35,40}'), + "GitHub": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[[\'|"]0-9a-zA-Z]{35,40}[\'|"]'), "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), - "Google Oauth 2": re.compile('[c|C][l|L][i|I][e|E][n|N][T|T][_][s|S][e|E][c|C][r|R][e|E][t|T].*[:].*[a-zA-Z0-9-_]{24}'), - "AWS API Key": re.compile('AKIA[0-9A-Z]{16}'),#[a|A][w|W][s|S].*AKIA[0-9A-Z]{16}'), + "AWS API Key": re.compile('AKIA[0-9A-Z]{16}'), "Heroku API Key": re.compile('[h|H][e|E][r|R][o|O][k|K][u|U].*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'), "Generic Secret": re.compile('[s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]'), } From 8e3a8c499babc75ba816da90ffdb47618ab01d24 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 10 Dec 2017 12:57:57 -0800 Subject: [PATCH 059/108] updating setup.py --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index de0cd58d2d30..e66b86a425a8 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.0', + version='2.0.1', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 33db36449837e44c231d3a439ce0f2326363e537 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 10 Dec 2017 14:01:08 -0800 Subject: [PATCH 060/108] adding commit depth, since commit hash and printing file path --- README.md | 24 ++++++++++++++++++++++-- truffleHog/truffleHog.py | 25 ++++++++++++++++++++----- 2 files changed, 42 insertions(+), 7 deletions(-) diff --git 
a/README.md b/README.md index 0f4aaf0ccf79..e335ff949888 100644 --- a/README.md +++ b/README.md @@ -37,9 +37,29 @@ Feel free to also contribute high signal regexes upstream that you think will be ## How it works This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, trufflehog will evaluate the Shannon entropy for both the base64 char set and hexadecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. +## Help + +``` +Find secrets hidden in the depths of git. + +positional arguments: + git_url URL for secret searching + +optional arguments: + -h, --help show this help message and exit + --json Output in JSON + --regex Enable high signal regex checks + --entropy DO_ENTROPY Enable entropy checks + --since_commit SINCE_COMMIT + Only scan from a given commit hash + --max_depth MAX_DEPTH + The max commit depth to go back when searching for + secrets +``` + ## Wishlist - ~~A way to detect and not scan binary diffs~~ - ~~Don't rescan diffs if already looked at in another branch~~ -- A since commit X feature -- Print the file affected +- ~~A since commit X feature~~ +- ~~Print the file affected~~ diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 1b3ca1008429..de5c45a9673a 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -16,14 +16,18 @@ def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") - parser.add_argument("--regex", dest="do_regex", action="store_true") - parser.add_argument("--entropy", dest="do_entropy") + parser.add_argument("--regex", dest="do_regex", action="store_true", help="Enable high signal 
regex checks") + parser.add_argument("--entropy", dest="do_entropy", help="Enable entropy checks") + parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash") + parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets") parser.add_argument('git_url', type=str, help='URL for secret searching') parser.set_defaults(regex=False) + parser.set_defaults(max_depth=None) + parser.set_defaults(since_commit=None) parser.set_defaults(entropy=True) args = parser.parse_args() do_entropy = str2bool(args.do_entropy) - output = find_strings(args.git_url, args.output_json, args.do_regex, do_entropy) + output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy) project_path = output["project_path"] shutil.rmtree(project_path, onerror=del_rw) @@ -98,6 +102,7 @@ def print_results(printJson, issue): printableDiff = issue['printDiff'] commitHash = issue['commitHash'] reason = issue['reason'] + path = issue['path'] if printJson: print(json.dumps(issue, sort_keys=True, indent=4)) @@ -109,6 +114,8 @@ def print_results(printJson, issue): print(dateStr) hashStr = "{}Hash: {}{}".format(bcolors.OKGREEN, commitHash, bcolors.ENDC) print(hashStr) + filePath = "{}Filepath: {}{}".format(bcolors.OKGREEN, path, bcolors.ENDC) + print(filePath) if sys.version_info >= (3, 0): branchStr = "{}Branch: {}{}".format(bcolors.OKGREEN, branch_name, bcolors.ENDC) @@ -145,6 +152,7 @@ def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, com if len(stringsFound) > 0: entropicDiff = {} entropicDiff['date'] = commit_time + entropicDiff['path'] = blob.b_path if blob.b_path else blob.a_path entropicDiff['branch'] = branch_name entropicDiff['commit'] = prev_commit.message entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace') @@ -163,6 +171,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm if 
found_strings: foundRegex = {} foundRegex['date'] = commit_time + foundRegex['path'] = blob.b_path if blob.b_path else blob.a_path foundRegex['branch'] = branch_name foundRegex['commit'] = prev_commit.message foundRegex['diff'] = blob.diff.decode('utf-8', errors='replace') @@ -176,13 +185,14 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm -def find_strings(git_url, printJson=False, do_regex=False, do_entropy=True): +def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True): output = {"entropicDiffs": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) already_searched = set() for remote_branch in repo.remotes.origin.fetch(): + since_commit_reached = False branch_name = remote_branch.name.split('/')[1] try: repo.git.checkout(remote_branch, b=branch_name) @@ -190,8 +200,13 @@ def find_strings(git_url, printJson=False, do_regex=False, do_entropy=True): pass prev_commit = None - for curr_commit in repo.iter_commits(): + for curr_commit in repo.iter_commits(max_count=max_depth): commitHash = curr_commit.hexsha + if commitHash == since_commit: + since_commit_reached = True + if since_commit and since_commit_reached: + prev_commit = curr_commit + continue if not prev_commit: pass else: From 7a6fdf69af5a30d5943f258ecedbc754b420494a Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 10 Dec 2017 14:02:22 -0800 Subject: [PATCH 061/108] updating version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index e66b86a425a8..a354d56ba103 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.1', + version='2.0.2', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 147a0b549aed7bf101bb701956a84622f13b6252 Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 11 Dec 2017 
19:39:00 -0800 Subject: [PATCH 062/108] fixing max_depth issue --- truffleHog/truffleHog.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index de5c45a9673a..87b2ba8163e0 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -22,7 +22,7 @@ def main(): parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets") parser.add_argument('git_url', type=str, help='URL for secret searching') parser.set_defaults(regex=False) - parser.set_defaults(max_depth=None) + parser.set_defaults(max_depth=1000000) parser.set_defaults(since_commit=None) parser.set_defaults(entropy=True) args = parser.parse_args() From 20e8892a455510bab9487a97464065c56b6b075d Mon Sep 17 00:00:00 2001 From: Milo Minderbinder Date: Tue, 28 Nov 2017 18:06:17 -0500 Subject: [PATCH 063/108] add path filter support with include/exclude regex files --- truffleHog/truffleHog.py | 57 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 87b2ba8163e0..110f13bad8e6 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -10,9 +10,15 @@ import os import json import stat +import re from regexChecks import regexes from git import Repo + +INCLUDE_PATTERNS = [] +EXCLUDE_PATTERNS = [] + + def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") @@ -20,6 +26,16 @@ def main(): parser.add_argument("--entropy", dest="do_entropy", help="Enable entropy checks") parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash") parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets") + parser.add_argument('-i', '--include', 
type=argparse.FileType('r'), metavar='INCLUDE_FILE', + help='File with regular expressions (one per line), at least one of which must match a Git ' + 'object path in order for it to be scanned; lines starting with "#" are treated as ' + 'comments and are ignored. If empty or not provided (default), all Git object paths are ' + 'included unless otherwise excluded via the --exclude option.') + parser.add_argument('-x', '--exclude', type=argparse.FileType('r'), metavar='EXCLUDE_FILE', + help='File with regular expressions (one per line), none of which may match a Git object path ' + 'in order for it to be scanned; lines starting with "#" are treated as comments and are ' + 'ignored. If empty or not provided (default), no Git object paths are excluded unless ' + 'effectively excluded via the --include option.') parser.add_argument('git_url', type=str, help='URL for secret searching') parser.set_defaults(regex=False) parser.set_defaults(max_depth=1000000) @@ -27,6 +43,17 @@ def main(): parser.set_defaults(entropy=True) args = parser.parse_args() do_entropy = str2bool(args.do_entropy) + + # read & compile path inclusion/exclusion patterns + if args.include: + for pattern in set(l[:-1].lstrip() for l in args.include): + if pattern and not pattern.startswith('#'): + INCLUDE_PATTERNS.append(re.compile(pattern)) + if args.exclude: + for pattern in set(l[:-1].lstrip() for l in args.exclude): + if pattern and not pattern.startswith('#'): + EXCLUDE_PATTERNS.append(re.compile(pattern)) + output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy) project_path = output["project_path"] shutil.rmtree(project_path, onerror=del_rw) @@ -183,6 +210,32 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm return regex_matches +def is_excluded(blob, include_patterns=None, exclude_patterns=None): + """Check if the diff blob object should excluded from analysis. 
+ + If defined and non-empty, `include_patterns` has precedence over `exclude_patterns`, such that a blob that is not + matched by any of the defined `include_patterns` will be excluded, even when it is not matched by any of the defined + `exclude_patterns`. If either `include_patterns` or `exclude_patterns` are undefined or empty, they will have no + effect, respectively. No blob is excluded by this function when called with default arguments. + + :param blob: a Git diff blob object + :param include_patterns: iterable of compiled regular expression objects; when non-empty, at least one pattern must + match the blob object for it _not_ to be excluded; if empty or None, all blobs are included, unless excluded via + `exclude_patterns` + :param exclude_patterns: iterable of compiled regular expression objects; when non-empty, _none_ of the patterns may + match the blob object for it _not_ to be excluded; if empty or None, no blobs are excluded if not otherwise + excluded via `include_patterns` + :return: True if the blob is not matched by `include_patterns` (when provided) or if it is matched by + `exclude_patterns` (when provided), otherwise returns False + """ + for path in (blob.a_path, blob.b_path): + if not path: # object path did not exist previously, or is deleted + continue + if include_patterns and not any(p.match(path) for p in include_patterns): + return True + if exclude_patterns and any(p.match(path) for p in exclude_patterns): + return True + return False def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True): @@ -222,7 +275,9 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do printableDiff = blob.diff.decode('utf-8', errors='replace') if printableDiff.startswith("Binary files"): continue - commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + if is_excluded(blob, INCLUDE_PATTERNS, EXCLUDE_PATTERNS): + continue + 
commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') foundIssues = [] if do_entropy: entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) From 672c99bbe9b739916949f849cd211d3a6a64e602 Mon Sep 17 00:00:00 2001 From: Milo Minderbinder Date: Wed, 29 Nov 2017 00:08:50 -0500 Subject: [PATCH 064/108] document --include & --exclude options in README --- README.md | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e335ff949888..97fad4757bf5 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,6 @@ Searches through git repositories for secrets, digging deep into commit history ## NEW Trufflehog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to surpress entropy checking has also been added. -These features help cut down on noise, and makes the tool easier to shove into a devops pipeline. - ``` truffleHog --regex --entropy=False https://github.com/dxa4481/truffleHog.git @@ -17,6 +15,34 @@ or truffleHog file:///user/dxa4481/codeprojects/truffleHog/ ``` +With the `--include` and `--exclude` options, it is also possible to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths. 
To illustrate, see the example include and exclude files below: + +_include-patterns.txt:_ +```ini +src/ +# lines beginning with "#" are treated as comments and are ignored +gradle/ +# regexes must match the entire path, but can use python's regex syntax for +# case-insensitive matching and other advanced options +(?i).*\.(properties|conf|ini|txt|y(a)?ml)$ +(.*/)?id_[rd]sa$ +``` + +_exclude-patterns.txt:_ +```ini +(.*/)?\.classpath$ +.*\.jmx$ +(.*/)?test/(.*/)?resources/ +``` + +These filter files could then be applied by: +```bash +trufflehog --include include-patterns.txt --exclude exclude-patterns.txt file://path/to/my/repo.git +``` +With these filters, issues found in files in the root-level `src` directory would be reported, unless they had the `.classpath` or `.jmx` extension, or if they were found in the `src/test/dev/resources/` directory, for example. Additional usage information is provided when calling `trufflehog` with the `-h` or `--help` options. + +These features help cut down on noise, and makes the tool easier to shove into a devops pipeline. 
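As a rough illustration of how the include and exclude files interact, the sketch below re-implements the path filter in isolation. It is a minimal sketch, not the project's code: `Blob`, `load_patterns` and `passes_path_filters` are hypothetical names, and the logic is assumed to follow the `path_included` function that appears later in this series (patterns applied with `re.match`, which anchors at the start of the path only, so a pattern such as `src/` covers everything under the root-level `src` directory).

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a Git diff entry; only the two path
# attributes that the filter reads are modelled here.
Blob = namedtuple("Blob", ("a_path", "b_path"))


def load_patterns(lines):
    """Compile one regex per line, skipping blank lines and "#" comments."""
    patterns = []
    for line in lines:
        pattern = line.strip()
        if pattern and not pattern.startswith("#"):
            patterns.append(re.compile(pattern))
    return patterns


def passes_path_filters(blob, include_patterns=None, exclude_patterns=None):
    """Return True if the blob's path should be scanned.

    Assumed behaviour: when include patterns are given, at least one must
    match; when exclude patterns are given, none may match.
    """
    # Prefer the new path; fall back to the old path for deleted files.
    path = blob.b_path if blob.b_path else blob.a_path
    if include_patterns and not any(p.match(path) for p in include_patterns):
        return False
    if exclude_patterns and any(p.match(path) for p in exclude_patterns):
        return False
    return True


# Patterns taken from the example filter files above.
include = load_patterns(["src/", "# comments are ignored", "gradle/"])
exclude = load_patterns([
    r"(.*/)?\.classpath$",
    r".*\.jmx$",
    r"(.*/)?test/(.*/)?resources/",
])

for path in ("src/main/App.java", "src/.classpath",
             "src/test/dev/resources/key.txt", "docs/README.md"):
    print(path, passes_path_filters(Blob(path, path), include, exclude))
```

With these patterns, only `src/main/App.java` passes: the `.classpath` file and the `resources/` path are caught by the exclude list, and `docs/README.md` never matches an include pattern.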
+ ![Example](https://i.imgur.com/YAXndLD.png) ## Install From cb150c1d2c116738c1085db621558359eda26bca Mon Sep 17 00:00:00 2001 From: Milo Minderbinder Date: Wed, 13 Dec 2017 13:39:47 -0500 Subject: [PATCH 065/108] rename path filter opts, add tests, update README * rename `--include` and `--exclude` options to `--include_paths` and `--exclude_paths`, respectively, in order to improve clarity of their function, especially in the case that string/result filtering options are added in a future release * add tests for path filtering with inclusion and exclusion rules * update README.md with new option names and help output --- README.md | 18 +++++++++-- test_all.py | 65 ++++++++++++++++++++++++++++++++++++++++ truffleHog/truffleHog.py | 62 +++++++++++++++++++------------------- 3 files changed, 111 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index 97fad4757bf5..73c8996eb4cf 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ or truffleHog file:///user/dxa4481/codeprojects/truffleHog/ ``` -With the `--include` and `--exclude` options, it is also possible to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths. To illustrate, see the example include and exclude files below: +With the `--include_paths` and `--exclude_paths` options, it is also possible to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths. 
To illustrate, see the example include and exclude files below: _include-patterns.txt:_ ```ini @@ -37,7 +37,7 @@ _exclude-patterns.txt:_ These filter files could then be applied by: ```bash -trufflehog --include include-patterns.txt --exclude exclude-patterns.txt file://path/to/my/repo.git +trufflehog --include_paths include-patterns.txt --exclude_paths exclude-patterns.txt file://path/to/my/repo.git ``` With these filters, issues found in files in the root-level `src` directory would be reported, unless they had the `.classpath` or `.jmx` extension, or if they were found in the `src/test/dev/resources/` directory, for example. Additional usage information is provided when calling `trufflehog` with the `-h` or `--help` options. @@ -81,6 +81,20 @@ optional arguments: --max_depth MAX_DEPTH The max commit depth to go back when searching for secrets + -i INCLUDE_PATHS_FILE, --include_paths INCLUDE_PATHS_FILE + File with regular expressions (one per line), at least + one of which must match a Git object path in order for + it to be scanned; lines starting with "#" are treated + as comments and are ignored. If empty or not provided + (default), all Git object paths are included unless + otherwise excluded via the --exclude_paths option. + -x EXCLUDE_PATHS_FILE, --exclude_paths EXCLUDE_PATHS_FILE + File with regular expressions (one per line), none of + which may match a Git object path in order for it to + be scanned; lines starting with "#" are treated as + comments and are ignored. If empty or not provided + (default), no Git object paths are excluded unless + effectively excluded via the --include_paths option. 
``` ## Wishlist diff --git a/test_all.py b/test_all.py index 3fc2aad2694f..0cf5cc40868e 100644 --- a/test_all.py +++ b/test_all.py @@ -1,5 +1,7 @@ import unittest import os +import re +from collections import namedtuple from truffleHog import truffleHog @@ -22,5 +24,68 @@ def test_unicode_expection(self): except UnicodeEncodeError: self.fail("Unicode print error") + def test_path_included(self): + Blob = namedtuple('Blob', ('a_path', 'b_path')) + blobs = { + 'file-root-dir': Blob('file', 'file'), + 'file-sub-dir': Blob('sub-dir/file', 'sub-dir/file'), + 'new-file-root-dir': Blob(None, 'new-file'), + 'new-file-sub-dir': Blob(None, 'sub-dir/new-file'), + 'deleted-file-root-dir': Blob('deleted-file', None), + 'deleted-file-sub-dir': Blob('sub-dir/deleted-file', None), + 'renamed-file-root-dir': Blob('file', 'renamed-file'), + 'renamed-file-sub-dir': Blob('sub-dir/file', 'sub-dir/renamed-file'), + 'moved-file-root-dir-to-sub-dir': Blob('moved-file', 'sub-dir/moved-file'), + 'moved-file-sub-dir-to-root-dir': Blob('sub-dir/moved-file', 'moved-file'), + 'moved-file-sub-dir-to-sub-dir': Blob('sub-dir/moved-file', 'moved/moved-file'), + } + src_paths = set(blob.a_path for blob in blobs.values() if blob.a_path is not None) + dest_paths = set(blob.b_path for blob in blobs.values() if blob.b_path is not None) + all_paths = src_paths.union(dest_paths) + all_paths_patterns = [re.compile(re.escape(p)) for p in all_paths] + overlap_patterns = [re.compile(r'sub-dir/.*'), re.compile(r'moved/'), re.compile(r'[^/]*file$')] + sub_dirs_patterns = [re.compile(r'.+/.+')] + deleted_paths_patterns = [re.compile(r'(.*/)?deleted-file$')] + for name, blob in blobs.items(): + self.assertTrue(truffleHog.path_included(blob), + '{} should be included by default'.format(blob)) + self.assertTrue(truffleHog.path_included(blob, include_patterns=all_paths_patterns), + '{} should be included with include_patterns: {}'.format(blob, all_paths_patterns)) + self.assertFalse(truffleHog.path_included(blob, 
exclude_patterns=all_paths_patterns), + '{} should be excluded with exclude_patterns: {}'.format(blob, all_paths_patterns)) + self.assertFalse(truffleHog.path_included(blob, + include_patterns=all_paths_patterns, + exclude_patterns=all_paths_patterns), + '{} should be excluded with overlapping patterns: \n\tinclude: {}\n\texclude: {}'.format( + blob, all_paths_patterns, all_paths_patterns)) + self.assertFalse(truffleHog.path_included(blob, + include_patterns=overlap_patterns, + exclude_patterns=all_paths_patterns), + '{} should be excluded with overlapping patterns: \n\tinclude: {}\n\texclude: {}'.format( + blob, overlap_patterns, all_paths_patterns)) + self.assertFalse(truffleHog.path_included(blob, + include_patterns=all_paths_patterns, + exclude_patterns=overlap_patterns), + '{} should be excluded with overlapping patterns: \n\tinclude: {}\n\texclude: {}'.format( + blob, all_paths_patterns, overlap_patterns)) + path = blob.b_path if blob.b_path else blob.a_path + if '/' in path: + self.assertTrue(truffleHog.path_included(blob, include_patterns=sub_dirs_patterns), + '{}: inclusion should include sub directory paths: {}'.format(blob, sub_dirs_patterns)) + self.assertFalse(truffleHog.path_included(blob, exclude_patterns=sub_dirs_patterns), + '{}: exclusion should exclude sub directory paths: {}'.format(blob, sub_dirs_patterns)) + else: + self.assertFalse(truffleHog.path_included(blob, include_patterns=sub_dirs_patterns), + '{}: inclusion should exclude root directory paths: {}'.format(blob, sub_dirs_patterns)) + self.assertTrue(truffleHog.path_included(blob, exclude_patterns=sub_dirs_patterns), + '{}: exclusion should include root directory paths: {}'.format(blob, sub_dirs_patterns)) + if name.startswith('deleted-file-'): + self.assertTrue(truffleHog.path_included(blob, include_patterns=deleted_paths_patterns), + '{}: inclusion should match deleted paths: {}'.format(blob, deleted_paths_patterns)) + self.assertFalse(truffleHog.path_included(blob, 
exclude_patterns=deleted_paths_patterns), + '{}: exclusion should match deleted paths: {}'.format(blob, deleted_paths_patterns)) + + + if __name__ == '__main__': unittest.main() diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 110f13bad8e6..5bbc8268aa63 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -15,10 +15,6 @@ from git import Repo -INCLUDE_PATTERNS = [] -EXCLUDE_PATTERNS = [] - - def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") @@ -26,16 +22,16 @@ def main(): parser.add_argument("--entropy", dest="do_entropy", help="Enable entropy checks") parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash") parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets") - parser.add_argument('-i', '--include', type=argparse.FileType('r'), metavar='INCLUDE_FILE', + parser.add_argument('-i', '--include_paths', type=argparse.FileType('r'), metavar='INCLUDE_PATHS_FILE', help='File with regular expressions (one per line), at least one of which must match a Git ' 'object path in order for it to be scanned; lines starting with "#" are treated as ' 'comments and are ignored. If empty or not provided (default), all Git object paths are ' - 'included unless otherwise excluded via the --exclude option.') - parser.add_argument('-x', '--exclude', type=argparse.FileType('r'), metavar='EXCLUDE_FILE', + 'included unless otherwise excluded via the --exclude_paths option.') + parser.add_argument('-x', '--exclude_paths', type=argparse.FileType('r'), metavar='EXCLUDE_PATHS_FILE', help='File with regular expressions (one per line), none of which may match a Git object path ' 'in order for it to be scanned; lines starting with "#" are treated as comments and are ' 'ignored. 
If empty or not provided (default), no Git object paths are excluded unless ' - 'effectively excluded via the --include option.') + 'effectively excluded via the --include_paths option.') parser.add_argument('git_url', type=str, help='URL for secret searching') parser.set_defaults(regex=False) parser.set_defaults(max_depth=1000000) @@ -45,16 +41,19 @@ def main(): do_entropy = str2bool(args.do_entropy) # read & compile path inclusion/exclusion patterns - if args.include: - for pattern in set(l[:-1].lstrip() for l in args.include): + path_inclusions = [] + path_exclusions = [] + if args.include_paths: + for pattern in set(l[:-1].lstrip() for l in args.include_paths): if pattern and not pattern.startswith('#'): - INCLUDE_PATTERNS.append(re.compile(pattern)) - if args.exclude: - for pattern in set(l[:-1].lstrip() for l in args.exclude): + path_inclusions.append(re.compile(pattern)) + if args.exclude_paths: + for pattern in set(l[:-1].lstrip() for l in args.exclude_paths): if pattern and not pattern.startswith('#'): - EXCLUDE_PATTERNS.append(re.compile(pattern)) + path_exclusions.append(re.compile(pattern)) - output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy) + output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, + path_inclusions, path_exclusions) project_path = output["project_path"] shutil.rmtree(project_path, onerror=del_rw) @@ -210,35 +209,34 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm return regex_matches -def is_excluded(blob, include_patterns=None, exclude_patterns=None): - """Check if the diff blob object should excluded from analysis. +def path_included(blob, include_patterns=None, exclude_patterns=None): + """Check if the diff blob object should included in analysis. 
If defined and non-empty, `include_patterns` has precedence over `exclude_patterns`, such that a blob that is not matched by any of the defined `include_patterns` will be excluded, even when it is not matched by any of the defined `exclude_patterns`. If either `include_patterns` or `exclude_patterns` are undefined or empty, they will have no - effect, respectively. No blob is excluded by this function when called with default arguments. + effect, respectively. All blobs are included by this function when called with default arguments. :param blob: a Git diff blob object :param include_patterns: iterable of compiled regular expression objects; when non-empty, at least one pattern must - match the blob object for it _not_ to be excluded; if empty or None, all blobs are included, unless excluded via + match the blob object for it to be included; if empty or None, all blobs are included, unless excluded via `exclude_patterns` :param exclude_patterns: iterable of compiled regular expression objects; when non-empty, _none_ of the patterns may - match the blob object for it _not_ to be excluded; if empty or None, no blobs are excluded if not otherwise + match the blob object for it to be included; if empty or None, no blobs are excluded if not otherwise excluded via `include_patterns` - :return: True if the blob is not matched by `include_patterns` (when provided) or if it is matched by - `exclude_patterns` (when provided), otherwise returns False + :return: False if the blob is _not_ matched by `include_patterns` (when provided) or if it is matched by + `exclude_patterns` (when provided), otherwise returns True """ - for path in (blob.a_path, blob.b_path): - if not path: # object path did not exist previously, or is deleted - continue - if include_patterns and not any(p.match(path) for p in include_patterns): - return True - if exclude_patterns and any(p.match(path) for p in exclude_patterns): - return True - return False + path = blob.b_path if blob.b_path else 
blob.a_path + if include_patterns and not any(p.match(path) for p in include_patterns): + return False + if exclude_patterns and any(p.match(path) for p in exclude_patterns): + return False + return True -def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True): +def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True, + path_inclusions=None, path_exclusions=None): output = {"entropicDiffs": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) @@ -275,7 +273,7 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do printableDiff = blob.diff.decode('utf-8', errors='replace') if printableDiff.startswith("Binary files"): continue - if is_excluded(blob, INCLUDE_PATTERNS, EXCLUDE_PATTERNS): + if not path_included(blob, path_inclusions, path_exclusions): continue commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') foundIssues = [] From 827425ddf989fe7d4553bcc528fc38f1e00c18ce Mon Sep 17 00:00:00 2001 From: Stephen Date: Thu, 14 Dec 2017 15:42:01 -0800 Subject: [PATCH 066/108] add badges --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index e335ff949888..d501623ff51b 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,7 @@ # Truffle Hog +[![Build Status](https://travis-ci.org/dxa4481/truffleHog.svg?branch=master)](https://travis-ci.org/dxa4481/truffleHog) +[![codecov](https://codecov.io/gh/dxa4481/truffleHog/branch/master/graph/badge.svg)](https://codecov.io/gh/dxa4481/truffleHog) + Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed. 
## NEW From 5717db6fa94ea13244edfa37f9f62d1856db8aca Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 14:56:27 -0800 Subject: [PATCH 067/108] adding configurable rules and making the library import better --- testRules.json | 3 +++ truffleHog/truffleHog.py | 24 +++++++++++++++++++++--- 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 testRules.json diff --git a/testRules.json b/testRules.json new file mode 100644 index 000000000000..3fe4836cb9fe --- /dev/null +++ b/testRules.json @@ -0,0 +1,3 @@ +{ + "RSA private key": "-----BEGIN EC PRIVATE KEY-----" +} diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index de5c45a9673a..ab4b94c16675 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -8,6 +8,7 @@ import argparse import tempfile import os +import re import json import stat from regexChecks import regexes @@ -17,15 +18,30 @@ def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') parser.add_argument('--json', dest="output_json", action="store_true", help="Output in JSON") parser.add_argument("--regex", dest="do_regex", action="store_true", help="Enable high signal regex checks") + parser.add_argument("--rules", dest="rules", help="Ignore default regexes and source from json list file") parser.add_argument("--entropy", dest="do_entropy", help="Enable entropy checks") parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash") parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets") parser.add_argument('git_url', type=str, help='URL for secret searching') parser.set_defaults(regex=False) - parser.set_defaults(max_depth=None) + parser.set_defaults(rules={}) + parser.set_defaults(max_depth=1000000) parser.set_defaults(since_commit=None) parser.set_defaults(entropy=True) args = parser.parse_args() + rules = {} + if args.rules: + try: + with 
open(args.rules, "r") as ruleFile: + rules = json.loads(ruleFile.read()) + for rule in rules: + rules[rule] = re.compile(rules[rule]) + except (IOError, ValueError) as e: + raise("Error reading rules file") + for regex in dict(regexes): + del regexes[regex] + for regex in rules: + regexes[regex] = rules[regex] do_entropy = str2bool(args.do_entropy) output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy) project_path = output["project_path"] @@ -186,7 +202,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True): - output = {"entropicDiffs": []} + output = {"foundIssues": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) already_searched = set() @@ -233,9 +249,11 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do foundIssues += found_regexes for foundIssue in foundIssues: print_results(printJson, foundIssue) + output["foundIssues"] += foundIssues prev_commit = curr_commit - output["project_path"] = project_path + output["project_path"] = project_path + output["clone_uri"] = git_url return output if __name__ == "__main__": From b75e0f1f65b8046b1a20771c15e065b4914b1e8f Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 15:00:25 -0800 Subject: [PATCH 068/108] increasing version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index a354d56ba103..12f36adfa766 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.2', + version='2.0.3', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 32c03c80fff2a346774ab25bf1b046a6bd9e2b60 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 15:01:17 -0800 
Subject: [PATCH 069/108] Updating version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 12f36adfa766..79873391715c 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.3', + version='2.0.4', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From e23e87a2d5e5498edda80bc7f3a74d2e0ae9b734 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 16:09:19 -0800 Subject: [PATCH 070/108] Fixing python 3 relative import errors --- truffleHog/{ => defaultRegexes}/regexChecks.py | 0 truffleHog/truffleHog.py | 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) rename truffleHog/{ => defaultRegexes}/regexChecks.py (100%) diff --git a/truffleHog/regexChecks.py b/truffleHog/defaultRegexes/regexChecks.py similarity index 100% rename from truffleHog/regexChecks.py rename to truffleHog/defaultRegexes/regexChecks.py diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index ab4b94c16675..c879a8a1a9b7 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -11,7 +11,7 @@ import re import json import stat -from regexChecks import regexes +from defaultRegexes.regexChecks import regexes from git import Repo def main(): From ee37bab650d1da15a74508aa20f4a246c746fae8 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 16:09:46 -0800 Subject: [PATCH 071/108] updating version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 79873391715c..559215191e4f 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.4', + version='2.0.5', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 
1bb20c69ce85106e853d95d681525067e4f99ac8 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 16:25:34 -0800 Subject: [PATCH 072/108] adding package init file --- truffleHog/defaultRegexes/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 truffleHog/defaultRegexes/__init__.py diff --git a/truffleHog/defaultRegexes/__init__.py b/truffleHog/defaultRegexes/__init__.py new file mode 100644 index 000000000000..8b137891791f --- /dev/null +++ b/truffleHog/defaultRegexes/__init__.py @@ -0,0 +1 @@ + From 461953f04c664dbc56a68a171131415578d27855 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 14 Jan 2018 16:25:58 -0800 Subject: [PATCH 073/108] increasing version again --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 559215191e4f..eacff76b1e23 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.5', + version='2.0.6', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From e3e2dfaca124cfffa444ad36974d570909b25cbc Mon Sep 17 00:00:00 2001 From: King's Way Date: Wed, 24 Jan 2018 20:16:32 +0800 Subject: [PATCH 074/108] Fix the problem that the first commit was missed --- truffleHog/truffleHog.py | 55 ++++++++++++++++++++++------------------ 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index c879a8a1a9b7..c4c84a882d47 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -13,6 +13,7 @@ import stat from defaultRegexes.regexChecks import regexes from git import Repo +from git import NULL_TREE def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') @@ -223,33 +224,39 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do if since_commit and since_commit_reached: prev_commit = 
curr_commit continue + + # if not prev_commit, then curr_commit is the newest commit. And we have nothing to diff with. + # But we will diff the first commit with NULL_TREE here to check the oldest code. + # In this way, no commit will be missed. if not prev_commit: - pass + prev_commit = list(repo.iter_commits(max_count=max_depth))[-1] + diff = prev_commit.diff(NULL_TREE, create_patch=True) else: - #avoid searching the same diffs - hashes = str(prev_commit) + str(curr_commit) - if hashes in already_searched: - prev_commit = curr_commit - continue - already_searched.add(hashes) - diff = prev_commit.diff(curr_commit, create_patch=True) - for blob in diff: - printableDiff = blob.diff.decode('utf-8', errors='replace') - if printableDiff.startswith("Binary files"): - continue - commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - foundIssues = [] - if do_entropy: - entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) - if entropicDiff: - foundIssues.append(entropicDiff) - if do_regex: - found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) - foundIssues += found_regexes - for foundIssue in foundIssues: - print_results(printJson, foundIssue) - output["foundIssues"] += foundIssues + + # avoid searching the same diffs + hashes = str(prev_commit) + str(curr_commit) + if hashes in already_searched: + prev_commit = curr_commit + continue + already_searched.add(hashes) + + for blob in diff: + printableDiff = blob.diff.decode('utf-8', errors='replace') + if printableDiff.startswith("Binary files"): + continue + commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + foundIssues = [] + if do_entropy: + entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) + if entropicDiff: + foundIssues.append(entropicDiff) + if do_regex: + found_regexes = 
regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) + foundIssues += found_regexes + for foundIssue in foundIssues: + print_results(printJson, foundIssue) + output["foundIssues"] += foundIssues prev_commit = curr_commit output["project_path"] = project_path From 2c2e2b4fe2bf07fcf0994addeac57b150dd40018 Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 29 Jan 2018 06:08:47 -0800 Subject: [PATCH 075/108] fixing weird variable scoping issue --- setup.py | 6 +++--- truffleHog/truffleHog.py | 15 ++++++++++----- 2 files changed, 13 insertions(+), 8 deletions(-) diff --git a/setup.py b/setup.py index eacff76b1e23..521eab2f6c5d 100644 --- a/setup.py +++ b/setup.py @@ -1,14 +1,14 @@ -from setuptools import setup +from setuptools import setup, find_packages setup( name='truffleHog', - version='2.0.6', + version='2.0.87', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', author_email='dxa4481@rit.edu', license='GNU', - packages =['truffleHog'], + packages = ['truffleHog', 'truffleHog.defaultRegexes'], install_requires=[ 'GitPython == 2.1.1' ], diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index c879a8a1a9b7..6ae9d5a39a71 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -1,6 +1,7 @@ #!/usr/bin/env python # -*- coding: utf-8 -*- +from __future__ import absolute_import import shutil import sys import math @@ -178,10 +179,14 @@ def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, com entropicDiff['reason'] = "High Entropy" return entropicDiff -def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash): +def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash, custom_regexes={}): + if custom_regexes: + secret_regexes = custom_regexes + else: + secret_regexes = regexes regex_matches = [] - for key in 
regexes: - found_strings = regexes[key].findall(printableDiff) + for key in secret_regexes: + found_strings = secret_regexes[key].findall(printableDiff) for found_string in found_strings: found_diff = printableDiff.replace(printableDiff, bcolors.WARNING + found_string + bcolors.ENDC) if found_strings: @@ -201,7 +206,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm -def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True): +def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True, custom_regexes={}): output = {"foundIssues": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) @@ -245,7 +250,7 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do if entropicDiff: foundIssues.append(entropicDiff) if do_regex: - found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) + found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash, custom_regexes) foundIssues += found_regexes for foundIssue in foundIssues: print_results(printJson, foundIssue) From 107d907d12236ae3c1e7871095b3d24c62347219 Mon Sep 17 00:00:00 2001 From: Dylan Date: Mon, 29 Jan 2018 06:45:12 -0800 Subject: [PATCH 076/108] fixing wierd import errors --- setup.py | 2 +- truffleHog/truffleHog.py | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/setup.py b/setup.py index 521eab2f6c5d..eeef16296c9b 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.87', + version='2.0.88', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 6ae9d5a39a71..c0b6071e7370 100644 --- a/truffleHog/truffleHog.py 
+++ b/truffleHog/truffleHog.py @@ -12,8 +12,11 @@ import re import json import stat -from defaultRegexes.regexChecks import regexes from git import Repo +try: + from defaultRegexes.regexChecks import regexes +except ImportError: + from truffleHog.defaultRegexes.regexChecks import regexes def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') From d2e95aa25a2ae54bedae358b51eaa2de4e3e1700 Mon Sep 17 00:00:00 2001 From: slashdevsda Date: Mon, 5 Mar 2018 19:55:48 +0100 Subject: [PATCH 077/108] Supports slashes into branch names Branch names containing '/' are no more mangled. --- truffleHog/truffleHog.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index c0b6071e7370..c504de37873e 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -217,7 +217,7 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do for remote_branch in repo.remotes.origin.fetch(): since_commit_reached = False - branch_name = remote_branch.name.split('/')[1] + _, _, branch_name = remote_branch.name.partition('/') try: repo.git.checkout(remote_branch, b=branch_name) except: From a6ca0958e4e5dc858f1a9cc0d76a996a08a98a34 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 1 Apr 2018 18:15:38 -0700 Subject: [PATCH 078/108] not printing json newlines --- truffleHog/truffleHog.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index d65eec73955b..7b15289d69a9 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -126,7 +126,7 @@ def print_results(printJson, issue): path = issue['path'] if printJson: - print(json.dumps(issue, sort_keys=True, indent=4)) + print(json.dumps(issue, sort_keys=True)) else: print("~~~~~~~~~~~~~~~~~~~~~") reason = "{}Reason: {}{}".format(bcolors.OKGREEN, reason, bcolors.ENDC) From 38b7e16d1d9cfed6f2702617da651141b8bd3b58 Mon Sep 17 
00:00:00 2001 From: Dylan Date: Sun, 1 Apr 2018 20:45:28 -0700 Subject: [PATCH 079/108] fixing scanning the first commit --- truffleHog/truffleHog.py | 57 +++++++++++++++++++++++++--------------- 1 file changed, 36 insertions(+), 21 deletions(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 7b15289d69a9..cd86725f33a2 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -7,6 +7,7 @@ import math import datetime import argparse +import uuid import tempfile import os import re @@ -207,14 +208,40 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm regex_matches.append(foundRegex) return regex_matches +def diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson): + issues = [] + for blob in diff: + printableDiff = blob.diff.decode('utf-8', errors='replace') + if printableDiff.startswith("Binary files"): + continue + commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') + foundIssues = [] + if do_entropy: + entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) + if entropicDiff: + foundIssues.append(entropicDiff) + if do_regex: + found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash, custom_regexes) + foundIssues += found_regexes + for foundIssue in foundIssues: + print_results(printJson, foundIssue) + issues += foundIssues + return issues +def handle_results(output, output_dir, foundIssues): + for foundIssue in foundIssues: + result_path = os.path.join(output_dir, str(uuid.uuid4())) + with open(result_path, "w+") as result_file: + result_file.write(json.dumps(foundIssue)) + output["foundIssues"].append(result_path) + return output - -def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do_regex=False, do_entropy=True, custom_regexes={}): +def find_strings(git_url, 
since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, custom_regexes={}): output = {"foundIssues": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) already_searched = set() + output_dir = tempfile.mkdtemp() for remote_branch in repo.remotes.origin.fetch(): since_commit_reached = False @@ -232,33 +259,21 @@ def find_strings(git_url, since_commit=None, max_depth=None, printJson=False, do if since_commit and since_commit_reached: prev_commit = curr_commit continue - # if not prev_commit, then curr_commit is the newest commit. And we have nothing to diff with. # But we will diff the first commit with NULL_TREE here to check the oldest code. # In this way, no commit will be missed. if not prev_commit: - diff = curr_commit.diff(NULL_TREE, create_patch=True) prev_commit = curr_commit + continue else: diff = prev_commit.diff(curr_commit, create_patch=True) - for blob in diff: - printableDiff = blob.diff.decode('utf-8', errors='replace') - if printableDiff.startswith("Binary files"): - continue - commit_time = datetime.datetime.fromtimestamp(prev_commit.committed_date).strftime('%Y-%m-%d %H:%M:%S') - foundIssues = [] - if do_entropy: - entropicDiff = find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash) - if entropicDiff: - foundIssues.append(entropicDiff) - if do_regex: - found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash, custom_regexes) - foundIssues += found_regexes - for foundIssue in foundIssues: - print_results(printJson, foundIssue) - output["foundIssues"] += foundIssues - + foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson) + output = handle_results(output, output_dir, foundIssues) prev_commit = curr_commit + # Handling the first commit + diff = curr_commit.diff(NULL_TREE, create_patch=True) + foundIssues = diff_worker(diff, curr_commit, prev_commit, 
branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson) + output = handle_results(output, output_dir, foundIssues) output["project_path"] = project_path output["clone_uri"] = git_url return output From 0c1985328704306f181e965ada24c440ad6d2ad0 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 1 Apr 2018 20:46:15 -0700 Subject: [PATCH 080/108] bump version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index eeef16296c9b..26a562e65db7 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.88', + version='2.0.90', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 3284a24e957bf61d7806d70000775162c7e29166 Mon Sep 17 00:00:00 2001 From: Quinn Stearns Date: Wed, 11 Apr 2018 19:40:20 -0700 Subject: [PATCH 081/108] Add back repeated diff scanning prevention This check was removed while resolving merge conflicts in f2f593a which appears to have been accidental. 
--- truffleHog/truffleHog.py | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index cd86725f33a2..14d4e0d18a57 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -267,6 +267,14 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, continue else: diff = prev_commit.diff(curr_commit, create_patch=True) + + # avoid searching the same diffs + hashes = str(prev_commit) + str(curr_commit) + if hashes in already_searched: + prev_commit = curr_commit + continue + already_searched.add(hashes) + foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson) output = handle_results(output, output_dir, foundIssues) prev_commit = curr_commit From 58690fe2e89d0aef28ae313c8554941104071bb1 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 14 Apr 2018 10:26:10 -0700 Subject: [PATCH 082/108] Changing regexes to install regex package --- requirements.txt | 9 +++++---- scripts/searchOrg.py | 14 ++++++++------ setup.py | 5 +++-- truffleHog/defaultRegexes/__init__.py | 1 - truffleHog/defaultRegexes/regexChecks.py | 19 ------------------- truffleHog/truffleHog.py | 6 ++---- 6 files changed, 18 insertions(+), 36 deletions(-) delete mode 100644 truffleHog/defaultRegexes/__init__.py delete mode 100644 truffleHog/defaultRegexes/regexChecks.py diff --git a/requirements.txt b/requirements.txt index 6c579c6d31e5..f86bb6e93a4e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,5 @@ -GitPython -unittest2 -pytest-cov -codecov +GitPython==2.1.1 +unittest2==1.1.0 +pytest-cov==2.5.1 +codecov==2.0.15 +truffleHogRegexes==0.0.4 diff --git a/scripts/searchOrg.py b/scripts/searchOrg.py index 67007af4ec64..c3bf70edf053 100644 --- a/scripts/searchOrg.py +++ b/scripts/searchOrg.py @@ -3,14 +3,16 @@ via https://github.com/dxa4481/truffleHog/pull/9 """ import requests -import truffleHog +from truffleHog import truffleHog 
-def get_org_repos(orgname): - response = requests.get(url='https://api.github.com/users/' + orgname + '/repos') +def get_org_repos(orgname, page): + response = requests.get(url='https://api.github.com/users/' + orgname + '/repos?page={}'.format(page)) json = response.json() + if not json: + return None for item in json: if item['private'] == False: print('searching ' + item["html_url"]) - truffleHog.find_strings(item["html_url"]) - -get_org_repos("Netflix") + truffleHog.find_strings(item["html_url"], do_regex=True, do_entropy=False, max_depth=100000) + get_org_repos(orgname, page + 1) +get_org_repos("twitter", 1) diff --git a/setup.py b/setup.py index 26a562e65db7..a00bb61da0ac 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.90', + version='2.0.91', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', @@ -10,7 +10,8 @@ license='GNU', packages = ['truffleHog', 'truffleHog.defaultRegexes'], install_requires=[ - 'GitPython == 2.1.1' + 'GitPython == 2.1.1', + 'truffleHogRegexes == 0.0.4' ], entry_points = { 'console_scripts': ['trufflehog = truffleHog.truffleHog:main'], diff --git a/truffleHog/defaultRegexes/__init__.py b/truffleHog/defaultRegexes/__init__.py deleted file mode 100644 index 8b137891791f..000000000000 --- a/truffleHog/defaultRegexes/__init__.py +++ /dev/null @@ -1 +0,0 @@ - diff --git a/truffleHog/defaultRegexes/regexChecks.py b/truffleHog/defaultRegexes/regexChecks.py deleted file mode 100644 index 27d26f9b6115..000000000000 --- a/truffleHog/defaultRegexes/regexChecks.py +++ /dev/null @@ -1,19 +0,0 @@ -import re - -regexes = { - #"Internal subdomain": re.compile('([a-z0-9]+[.]*supersecretinternal[.]com)'), - "Slack Token": re.compile('(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})'), - "RSA private key": re.compile('-----BEGIN RSA PRIVATE KEY-----'), - "SSH (OPENSSH) private 
key": re.compile('-----BEGIN OPENSSH PRIVATE KEY-----'), - "SSH (DSA) private key": re.compile('-----BEGIN DSA PRIVATE KEY-----'), - "SSH (EC) private key": re.compile('-----BEGIN EC PRIVATE KEY-----'), - "PGP private key block": re.compile('-----BEGIN PGP PRIVATE KEY BLOCK-----'), - "Facebook Oauth": re.compile('[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*[\'|"][0-9a-f]{32}[\'|"]'), - "Twitter Oauth": re.compile('[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[\'|"][0-9a-zA-Z]{35,44}[\'|"]'), - "GitHub": re.compile('[g|G][i|I][t|T][h|H][u|U][b|B].*[[\'|"]0-9a-zA-Z]{35,40}[\'|"]'), - "Google Oauth": re.compile('("client_secret":"[a-zA-Z0-9-_]{24}")'), - "AWS API Key": re.compile('AKIA[0-9A-Z]{16}'), - "Heroku API Key": re.compile('[h|H][e|E][r|R][o|O][k|K][u|U].*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'), - "Generic Secret": re.compile('[s|S][e|E][c|C][r|R][e|E][t|T].*[\'|"][0-9a-zA-Z]{32,45}[\'|"]'), -} - diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index cd86725f33a2..5a201e5d7044 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -15,10 +15,8 @@ import stat from git import Repo from git import NULL_TREE -try: - from defaultRegexes.regexChecks import regexes -except ImportError: - from truffleHog.defaultRegexes.regexChecks import regexes +from truffleHogRegexes.regexChecks import regexes + def main(): parser = argparse.ArgumentParser(description='Find secrets hidden in the depths of git.') From f368af1b9b5a2c180fb8dd7a88bd0a45777750e2 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 14 Apr 2018 10:27:26 -0700 Subject: [PATCH 083/108] fixing setup issue --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index a00bb61da0ac..960155e58458 100644 --- a/setup.py +++ b/setup.py @@ -8,7 +8,7 @@ author='Dylan Ayrey', author_email='dxa4481@rit.edu', license='GNU', - packages = ['truffleHog', 'truffleHog.defaultRegexes'], + packages = ['truffleHog'], install_requires=[ 'GitPython 
== 2.1.1', 'truffleHogRegexes == 0.0.4' From 81447f55da863286b07a04f4ce195009fae38fd4 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sat, 14 Apr 2018 10:36:35 -0700 Subject: [PATCH 084/108] adding Dockerfile --- Dockerfile | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 Dockerfile diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 000000000000..b0168a82201d --- /dev/null +++ b/Dockerfile @@ -0,0 +1,7 @@ +FROM python:3-alpine +RUN apk add --no-cache git && pip install trufflehog +RUN adduser -S truffleHog +USER truffleHog +WORKDIR /proj +ENTRYPOINT [ "trufflehog" ] +CMD [ "-h" ] From 9def4f5eb76dc43c8d68b40940a473546b71c97a Mon Sep 17 00:00:00 2001 From: Dylan Ayrey Date: Sun, 6 May 2018 10:07:07 -0700 Subject: [PATCH 085/108] Update README.md --- README.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index e335ff949888..1eb290cd1042 100644 --- a/README.md +++ b/README.md @@ -26,20 +26,29 @@ pip install truffleHog ## Customizing -Custom regexes can be added to the following file: +Custom regexes can be added with the following flag `--rules /path/to/rules`. This should be a json file of the following format: ``` -truffleHog/truffleHog/regexChecks.py +{ + "RSA private key": "-----BEGIN EC PRIVATE KEY-----" +} ``` Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added. Feel free to also contribute high signal regexes upstream that you think will benifit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed. +Trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json + ## How it works This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. 
For entropy checks, trufflehog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen. ## Help ``` +usage: trufflehog [-h] [--json] [--regex] [--rules RULES] + [--entropy DO_ENTROPY] [--since_commit SINCE_COMMIT] + [--max_depth MAX_DEPTH] + git_url + Find secrets hidden in the depths of git. positional arguments: @@ -49,6 +58,7 @@ optional arguments: -h, --help show this help message and exit --json Output in JSON --regex Enable high signal regex checks + --rules RULES Ignore default regexes and source from json list file --entropy DO_ENTROPY Enable entropy checks --since_commit SINCE_COMMIT Only scan from a given commit hash From 09824b0c9f512c548a3de1d564453e092d5d7077 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 13 May 2018 10:52:07 -0700 Subject: [PATCH 086/108] surpressing output --- truffleHog/truffleHog.py | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 5a201e5d7044..3cf2b98efab9 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -47,7 +47,7 @@ def main(): for regex in rules: regexes[regex] = rules[regex] do_entropy = str2bool(args.do_entropy) - output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy) + output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False) project_path = output["project_path"] shutil.rmtree(project_path, onerror=del_rw) @@ -206,7 +206,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm regex_matches.append(foundRegex) return regex_matches -def diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, 
do_entropy, do_regex, printJson): +def diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson, surpress_output): issues = [] for blob in diff: printableDiff = blob.diff.decode('utf-8', errors='replace') @@ -221,8 +221,9 @@ def diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_ if do_regex: found_regexes = regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, commitHash, custom_regexes) foundIssues += found_regexes - for foundIssue in foundIssues: - print_results(printJson, foundIssue) + if not surpress_output: + for foundIssue in foundIssues: + print_results(printJson, foundIssue) issues += foundIssues return issues @@ -234,7 +235,7 @@ def handle_results(output, output_dir, foundIssues): output["foundIssues"].append(result_path) return output -def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, custom_regexes={}): +def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, surpress_output=True, custom_regexes={}): output = {"foundIssues": []} project_path = clone_git_repo(git_url) repo = Repo(project_path) @@ -265,12 +266,12 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, continue else: diff = prev_commit.diff(curr_commit, create_patch=True) - foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson) + foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson, surpress_output) output = handle_results(output, output_dir, foundIssues) prev_commit = curr_commit # Handling the first commit diff = curr_commit.diff(NULL_TREE, create_patch=True) - foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, 
printJson) + foundIssues = diff_worker(diff, curr_commit, prev_commit, branch_name, commitHash, custom_regexes, do_entropy, do_regex, printJson, surpress_output) output = handle_results(output, output_dir, foundIssues) output["project_path"] = project_path output["clone_uri"] = git_url From 77ad941d4d8698df2cba6e0ec3aaa476dfe498d7 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 13 May 2018 11:21:01 -0700 Subject: [PATCH 087/108] updating version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 960155e58458..281f3780dbfe 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.91', + version='2.0.92', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 56abac751688c1f9416959202d7ec6f29d65fa36 Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 13 May 2018 11:29:43 -0700 Subject: [PATCH 088/108] updating version --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index 281f3780dbfe..de5e05e824f2 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='truffleHog', - version='2.0.92', + version='2.0.93', description='Searches through git repositories for high entropy strings, digging deep into commit history.', url='https://github.com/dxa4481/truffleHog', author='Dylan Ayrey', From 2db6c9a4f23367075378bc6e4209a188bd896cce Mon Sep 17 00:00:00 2001 From: Dylan Date: Sun, 13 May 2018 11:36:01 -0700 Subject: [PATCH 089/108] throwing error code if issue found --- truffleHog/truffleHog.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py index 76d1639ddd20..6ac8cbfc5f45 100644 --- a/truffleHog/truffleHog.py +++ b/truffleHog/truffleHog.py @@ -51,6 +51,10 @@ def main(): output = find_strings(args.git_url, args.since_commit, args.max_depth, 
args.output_json, args.do_regex, do_entropy, surpress_output=False)
     project_path = output["project_path"]
     shutil.rmtree(project_path, onerror=del_rw)
+    if output["foundIssues"]:
+        sys.exit(1)
+    else:
+        sys.exit(0)

 def str2bool(v):
     if v == None:

From 314daf3da2da5af28e4a40ee2abd34a2840823f3 Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 13 May 2018 11:37:09 -0700
Subject: [PATCH 090/108] updating version

---
 setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.py b/setup.py
index de5e05e824f2..3d793b029cae 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@

 setup(
     name='truffleHog',
-    version='2.0.93',
+    version='2.0.94',
     description='Searches through git repositories for high entropy strings, digging deep into commit history.',
     url='https://github.com/dxa4481/truffleHog',
     author='Dylan Ayrey',

From 1760c0a1da7fc861c3f2d20fe00c4e7e4858833d Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 13 May 2018 12:01:55 -0700
Subject: [PATCH 091/108] removes the unneeded checkout

---
 truffleHog/truffleHog.py | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 6ac8cbfc5f45..372480f342d1 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -249,14 +249,9 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False,

     for remote_branch in repo.remotes.origin.fetch():
         since_commit_reached = False
-        _, _, branch_name = remote_branch.name.partition('/')
-        try:
-            repo.git.checkout(remote_branch, b=branch_name)
-        except:
-            pass
-
+        branch_name = remote_branch.name
         prev_commit = None
-        for curr_commit in repo.iter_commits(max_count=max_depth):
+        for curr_commit in repo.iter_commits(branch_name, max_count=max_depth):
             commitHash = curr_commit.hexsha
             if commitHash == since_commit:
                 since_commit_reached = True

From b57b1683277e216e120fe92d650a6806f0184563 Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 13 May 2018 12:02:49 -0700
Subject: [PATCH 092/108] version bump

---
 setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.py b/setup.py
index 3d793b029cae..894c52e61471 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@

 setup(
     name='truffleHog',
-    version='2.0.94',
+    version='2.0.95',
     description='Searches through git repositories for high entropy strings, digging deep into commit history.',
     url='https://github.com/dxa4481/truffleHog',
     author='Dylan Ayrey',

From ee5513bf21c46f7011daf9d01ec8af05459bb891 Mon Sep 17 00:00:00 2001
From: Jeckelmann Manuel
Date: Mon, 11 Jun 2018 16:35:41 +0200
Subject: [PATCH 093/108] Fix commitHash confusion; Instead of the previous' commit's hash (which is the more recent commit), the current commit's hash value was returned, effectively pointing the parent commit instead of the one including sensitive data

---
 truffleHog/truffleHog.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 372480f342d1..3aa554da45d2 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -183,7 +183,7 @@ def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, com
         entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace')
         entropicDiff['stringsFound'] = stringsFound
         entropicDiff['printDiff'] = printableDiff
-        entropicDiff['commitHash'] = commitHash
+        entropicDiff['commitHash'] = prev_commit.hexsha
         entropicDiff['reason'] = "High Entropy"
     return entropicDiff

@@ -207,7 +207,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm
             foundRegex['stringsFound'] = found_strings
             foundRegex['printDiff'] = found_diff
             foundRegex['reason'] = key
-            foundRegex['commitHash'] = commitHash
+            foundRegex['commitHash'] = prev_commit.hexsha
             regex_matches.append(foundRegex)
     return regex_matches


From 59b59ef81ce6efb9f96edce07cd3fc0e2f5e7003 Mon Sep 17 00:00:00 2001
From: Jeckelmann Manuel
Date: Mon, 11 Jun 2018 18:16:34 +0200
Subject: [PATCH 094/108] Added a test case to ensure the correct commitHash is returned (includes a commit comment cross-validation)

---
 test_all.py | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/test_all.py b/test_all.py
index 3fc2aad2694f..c9513082ef59 100644
--- a/test_all.py
+++ b/test_all.py
@@ -1,5 +1,8 @@
 import unittest
 import os
+import sys
+import json
+import io
 from truffleHog import truffleHog


@@ -22,5 +25,34 @@ def test_unicode_expection(self):
         except UnicodeEncodeError:
             self.fail("Unicode print error")

+    def test_return_correct_commit_hash(self):
+        # Start at commit d15627104d07846ac2914a976e8e347a663bbd9b, which
+        # is immediately followed by a secret inserting commit:
+        # https://github.com/dxa4481/truffleHog/commit/9ed54617547cfca783e0f81f8dc5c927e3d1e345
+        since_commit = 'd15627104d07846ac2914a976e8e347a663bbd9b'
+        commit_w_secret = '9ed54617547cfca783e0f81f8dc5c927e3d1e345'
+        cross_valdiating_commit_w_secret_comment = 'OH no a secret'
+
+        json_result = ''
+        tmp_stdout = io.StringIO()
+        bak_stdout = sys.stdout
+
+        # Redirect STDOUT, run scan and re-establish STDOUT
+        sys.stdout = tmp_stdout
+        try:
+            truffleHog.find_strings("https://github.com/dxa4481/truffleHog.git",
+                since_commit=since_commit, printJson=True, surpress_output=False)
+        finally:
+            sys.stdout = bak_stdout
+
+        json_result_list = tmp_stdout.getvalue().split('\n')
+        results = [json.loads(r) for r in json_result_list if bool(r.strip())]
+        filtered_results = list(filter(lambda r: r['commitHash'] == commit_w_secret, results))
+        self.assertEqual(1, len(filtered_results))
+        self.assertEqual(commit_w_secret, filtered_results[0]['commitHash'])
+        # Additionally, we cross-validate the commit comment matches the expected comment
+        self.assertEqual(cross_valdiating_commit_w_secret_comment, filtered_results[0]['commit'].strip())
+
+
 if __name__ == '__main__':
     unittest.main()

From 179384901d48f6f52478fcec108b8729c35069c0 Mon Sep 17 00:00:00 2001
From: Jeckelmann Manuel
Date: Mon, 11 Jun 2018 19:03:42 +0200
Subject: [PATCH 095/108] Let test be py2 and py3 compatible

---
 test_all.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/test_all.py b/test_all.py
index c9513082ef59..e33e32b4c4de 100644
--- a/test_all.py
+++ b/test_all.py
@@ -34,7 +34,10 @@ def test_return_correct_commit_hash(self):
         cross_valdiating_commit_w_secret_comment = 'OH no a secret'

         json_result = ''
-        tmp_stdout = io.StringIO()
+        if sys.version_info >= (3,):
+            tmp_stdout = io.StringIO()
+        else:
+            tmp_stdout = io.BytesIO()
         bak_stdout = sys.stdout

         # Redirect STDOUT, run scan and re-establish STDOUT

From 37f962034a57768db3b581b6a2d969015d673a1b Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 29 Jul 2018 13:01:10 -0700
Subject: [PATCH 096/108] updating regexes

---
 requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index f86bb6e93a4e..2f321448d5d1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,4 +2,4 @@ GitPython==2.1.1
 unittest2==1.1.0
 pytest-cov==2.5.1
 codecov==2.0.15
-truffleHogRegexes==0.0.4
+truffleHogRegexes==0.0.5

From 41cd4e5fbab4ca648d957fc089974e45b9be0533 Mon Sep 17 00:00:00 2001
From: Jakub Wilk
Date: Mon, 6 Aug 2018 11:53:54 +0200
Subject: [PATCH 097/108] Fix typos

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 1eb290cd1042..62e76d421091 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.

 ## NEW
-Trufflehog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to surpress entropy checking has also been added.
+Trufflehog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to suppress entropy checking has also been added.

 These features help cut down on noise, and makes the tool easier to shove into a devops pipeline.

@@ -34,12 +34,12 @@ Custom regexes can be added with the following flag `--rules /path/to/rules`. Th
 ```
 Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added.

-Feel free to also contribute high signal regexes upstream that you think will benifit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.
+Feel free to also contribute high signal regexes upstream that you think will benefit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.

 Trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json

 ## How it works
-This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, trufflehog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.
+This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, trufflehog will evaluate the Shannon entropy for both the base64 char set and hexadecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.

 ## Help


From d98e54c5185afefac7c1fceb5d002e687f8ddce1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=AB=A5=E8=AF=9D?=
Date: Mon, 13 Aug 2018 15:32:04 +0800
Subject: [PATCH 098/108] Fix bug with commit hash error.

---
 truffleHog/truffleHog.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 372480f342d1..3aa554da45d2 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -183,7 +183,7 @@ def find_entropy(printableDiff, commit_time, branch_name, prev_commit, blob, com
         entropicDiff['diff'] = blob.diff.decode('utf-8', errors='replace')
         entropicDiff['stringsFound'] = stringsFound
         entropicDiff['printDiff'] = printableDiff
-        entropicDiff['commitHash'] = commitHash
+        entropicDiff['commitHash'] = prev_commit.hexsha
         entropicDiff['reason'] = "High Entropy"
     return entropicDiff

@@ -207,7 +207,7 @@ def regex_check(printableDiff, commit_time, branch_name, prev_commit, blob, comm
             foundRegex['stringsFound'] = found_strings
             foundRegex['printDiff'] = found_diff
             foundRegex['reason'] = key
-            foundRegex['commitHash'] = commitHash
+            foundRegex['commitHash'] = prev_commit.hexsha
             regex_matches.append(foundRegex)
     return regex_matches


From c3105baca59ff2b0c2d9c3df99d6993c3ce0e77c Mon Sep 17 00:00:00 2001
From: Surbhi Shah
Date: Mon, 13 Aug 2018 21:57:12 -0700
Subject: [PATCH 099/108] Adding clean up function to reduce disk usage

---
 truffleHog/truffleHog.py | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 372480f342d1..bf9772f697fa 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -281,7 +281,16 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False,
     output = handle_results(output, output_dir, foundIssues)
     output["project_path"] = project_path
     output["clone_uri"] = git_url
+    output["issues_path"] = output_dir
     return output

+def clean_up(output):
+    project_path = output.get("project_path", None)
+    if project_path and os.path.isdir(project_path):
+        shutil.rmtree(output["project_path"])
+    issues_path = output.get("issues_path", None)
+    if issues_path and os.path.isdir(issues_path):
+        shutil.rmtree(output["issues_path"])
+
 if __name__ == "__main__":
     main()

From c15b62f2ab770edb3979752335c3850a0888a9cd Mon Sep 17 00:00:00 2001
From: Surbhi Shah
Date: Fri, 17 Aug 2018 10:43:22 -0700
Subject: [PATCH 100/108] Adding feature to scan a particular branch

---
 test_all.py              | 10 ++++++++++
 truffleHog/truffleHog.py | 13 ++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/test_all.py b/test_all.py
index 3fc2aad2694f..11d75e26f4d6 100644
--- a/test_all.py
+++ b/test_all.py
@@ -1,6 +1,8 @@
 import unittest
 import os
 from truffleHog import truffleHog
+from mock import patch
+from mock import MagicMock


 class TestStringMethods(unittest.TestCase):
@@ -22,5 +24,13 @@ def test_unicode_expection(self):
         except UnicodeEncodeError:
             self.fail("Unicode print error")

+    @patch('truffleHog.truffleHog.clone_git_repo')
+    @patch('truffleHog.truffleHog.Repo')
+    def test_branch(self, repo_const_mock, clone_git_repo):
+        repo = MagicMock()
+        repo_const_mock.return_value = repo
+        truffleHog.find_strings("test_repo", branch="testbranch")
+        repo.remotes.origin.fetch.assert_called_once_with("testbranch")
+
 if __name__ == '__main__':
     unittest.main()
diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 372480f342d1..e5705c45b7df 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -27,12 +27,14 @@ def main():
     parser.add_argument("--entropy", dest="do_entropy", help="Enable entropy checks")
     parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash")
     parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets")
+    parser.add_argument("--branch", dest="branch", help="Name of the branch to be scanned")
     parser.add_argument('git_url', type=str, help='URL for secret searching')
     parser.set_defaults(regex=False)
     parser.set_defaults(rules={})
     parser.set_defaults(max_depth=1000000)
     parser.set_defaults(since_commit=None)
     parser.set_defaults(entropy=True)
+    parser.set_defaults(branch=None)
     args = parser.parse_args()
     rules = {}
     if args.rules:
@@ -48,7 +50,7 @@ def main():
         for regex in rules:
             regexes[regex] = rules[regex]
     do_entropy = str2bool(args.do_entropy)
-    output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False)
+    output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False, branch=args.branch)
     project_path = output["project_path"]
     shutil.rmtree(project_path, onerror=del_rw)
     if output["foundIssues"]:
@@ -240,14 +242,19 @@ def handle_results(output, output_dir, foundIssues):
         output["foundIssues"].append(result_path)
     return output

-def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, surpress_output=True, custom_regexes={}):
+def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, surpress_output=True, custom_regexes={}, branch=None):
     output = {"foundIssues": []}
     project_path = clone_git_repo(git_url)
     repo = Repo(project_path)
     already_searched = set()
     output_dir = tempfile.mkdtemp()

-    for remote_branch in repo.remotes.origin.fetch():
+    if branch:
+        branches = repo.remotes.origin.fetch(branch)
+    else:
+        branches = repo.remotes.origin.fetch()
+
+    for remote_branch in branches:
         since_commit_reached = False
         branch_name = remote_branch.name
         prev_commit = None

From fe0d53753c8df41e7aef6083684f0e4bc9e94016 Mon Sep 17 00:00:00 2001
From: Dylan Ayrey
Date: Mon, 27 Aug 2018 20:08:51 -0700
Subject: [PATCH 101/108] Update README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 1eb290cd1042..94f3ca809862 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
-# Truffle Hog
+# truffleHog
 Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.

 ## NEW
-Trufflehog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to surpress entropy checking has also been added.
+truffleHog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to surpress entropy checking has also been added.

 These features help cut down on noise, and makes the tool easier to shove into a devops pipeline.

@@ -36,10 +36,10 @@ Things like subdomain enumeration, s3 bucket detection, and other useful regexes

 Feel free to also contribute high signal regexes upstream that you think will benifit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.

-Trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json
+trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json

 ## How it works
-This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, trufflehog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.
+This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, truffleHog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.

 ## Help


From 345edde9a254f1170dc23c9ed5fb73821f3e5448 Mon Sep 17 00:00:00 2001
From: Surbhi Shah
Date: Tue, 28 Aug 2018 11:38:12 -0700
Subject: [PATCH 102/108] Adding cleanup option on commandline

---
 truffleHog/truffleHog.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 96187ff7ac49..fd4c71dd19b5 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -28,6 +28,7 @@ def main():
     parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash")
     parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets")
     parser.add_argument("--branch", dest="branch", help="Name of the branch to be scanned")
+    parser.add_argument("--cleanup", dest="cleanup", action="store_true", help="Clean up all temporary result files")
     parser.add_argument('git_url', type=str, help='URL for secret searching')
     parser.set_defaults(regex=False)
     parser.set_defaults(rules={})
@@ -35,6 +36,7 @@ def main():
     parser.set_defaults(since_commit=None)
     parser.set_defaults(entropy=True)
     parser.set_defaults(branch=None)
+    parser.set_defaults(cleanup=False)
     args = parser.parse_args()
     rules = {}
     if args.rules:
@@ -53,6 +55,8 @@ def main():
     output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False, branch=args.branch)
     project_path = output["project_path"]
     shutil.rmtree(project_path, onerror=del_rw)
+    if args.cleanup:
+        clean_up(output)
     if output["foundIssues"]:
         sys.exit(1)
     else:
@@ -292,9 +296,6 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False,
     return output

 def clean_up(output):
-    project_path = output.get("project_path", None)
-    if project_path and os.path.isdir(project_path):
-        shutil.rmtree(output["project_path"])
     issues_path = output.get("issues_path", None)
     if issues_path and os.path.isdir(issues_path):
         shutil.rmtree(output["issues_path"])

From 1a39f5ba0c2f5e507fc1452e070af0bb45f2ff28 Mon Sep 17 00:00:00 2001
From: Surbhi Shah
Date: Wed, 29 Aug 2018 11:30:12 -0700
Subject: [PATCH 103/108] Adding feature to pass path to an already cloned repo

---
 test_all.py              | 11 ++++++++++-
 truffleHog/truffleHog.py | 15 ++++++++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/test_all.py b/test_all.py
index f7db3f851e64..281c1fe02834 100644
--- a/test_all.py
+++ b/test_all.py
@@ -60,11 +60,20 @@ def test_return_correct_commit_hash(self):

     @patch('truffleHog.truffleHog.clone_git_repo')
     @patch('truffleHog.truffleHog.Repo')
-    def test_branch(self, repo_const_mock, clone_git_repo):
+    @patch('shutil.rmtree')
+    def test_branch(self, rmtree_mock, repo_const_mock, clone_git_repo):
         repo = MagicMock()
         repo_const_mock.return_value = repo
         truffleHog.find_strings("test_repo", branch="testbranch")
         repo.remotes.origin.fetch.assert_called_once_with("testbranch")

+    @patch('truffleHog.truffleHog.clone_git_repo')
+    @patch('truffleHog.truffleHog.Repo')
+    @patch('shutil.rmtree')
+    def test_repo_path(self, rmtree_mock, repo_const_mock, clone_git_repo):
+        truffleHog.find_strings("test_repo", repo_path="test/path/")
+        rmtree_mock.assert_not_called()
+        clone_git_repo.assert_not_called()
+
 if __name__ == '__main__':
     unittest.main()
diff --git a/truffleHog/truffleHog.py b/truffleHog/truffleHog.py
index 96187ff7ac49..e1fce5e4e16f 100644
--- a/truffleHog/truffleHog.py
+++ b/truffleHog/truffleHog.py
@@ -28,6 +28,7 @@ def main():
     parser.add_argument("--since_commit", dest="since_commit", help="Only scan from a given commit hash")
     parser.add_argument("--max_depth", dest="max_depth", help="The max commit depth to go back when searching for secrets")
     parser.add_argument("--branch", dest="branch", help="Name of the branch to be scanned")
+    parser.add_argument("--repo_path", type=str, dest="repo_path", help="Path to the cloned repo. If provided, git_url will not be used")
     parser.add_argument('git_url', type=str, help='URL for secret searching')
     parser.set_defaults(regex=False)
     parser.set_defaults(rules={})
@@ -35,6 +36,7 @@ def main():
     parser.set_defaults(since_commit=None)
     parser.set_defaults(entropy=True)
     parser.set_defaults(branch=None)
+    parser.set_defaults(repo_path=None)
     args = parser.parse_args()
     rules = {}
     if args.rules:
@@ -50,9 +52,7 @@ def main():
         for regex in rules:
             regexes[regex] = rules[regex]
     do_entropy = str2bool(args.do_entropy)
-    output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False, branch=args.branch)
-    project_path = output["project_path"]
-    shutil.rmtree(project_path, onerror=del_rw)
+    output = find_strings(args.git_url, args.since_commit, args.max_depth, args.output_json, args.do_regex, do_entropy, surpress_output=False, branch=args.branch, repo_path=args.repo_path)
     if output["foundIssues"]:
         sys.exit(1)
     else:
@@ -242,9 +242,12 @@ def handle_results(output, output_dir, foundIssues):
         output["foundIssues"].append(result_path)
     return output

-def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, surpress_output=True, custom_regexes={}, branch=None):
+def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False, do_regex=False, do_entropy=True, surpress_output=True, custom_regexes={}, branch=None, repo_path=None):
     output = {"foundIssues": []}
-    project_path = clone_git_repo(git_url)
+    if repo_path:
+        project_path = repo_path
+    else:
+        project_path = clone_git_repo(git_url)
     repo = Repo(project_path)
     already_searched = set()
     output_dir = tempfile.mkdtemp()
@@ -289,6 +292,8 @@ def find_strings(git_url, since_commit=None, max_depth=1000000, printJson=False,
     output["project_path"] = project_path
     output["clone_uri"] = git_url
     output["issues_path"] = output_dir
+    if not repo_path:
+        shutil.rmtree(project_path, onerror=del_rw)
     return output

 def clean_up(output):

From 2006727cfdc066bbc4dda6a9247a26cd5b754441 Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sat, 29 Sep 2018 08:05:56 -0700
Subject: [PATCH 104/108] bumping version

---
 requirements.txt | 2 +-
 setup.py         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 2f321448d5d1..3400fa8348a7 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,4 +2,4 @@ GitPython==2.1.1
 unittest2==1.1.0
 pytest-cov==2.5.1
 codecov==2.0.15
-truffleHogRegexes==0.0.5
+truffleHogRegexes==0.0.6
diff --git a/setup.py b/setup.py
index 894c52e61471..8d169e0b7b90 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@

 setup(
     name='truffleHog',
-    version='2.0.95',
+    version='2.0.97',
     description='Searches through git repositories for high entropy strings, digging deep into commit history.',
     url='https://github.com/dxa4481/truffleHog',
     author='Dylan Ayrey',

From 3048c6429d80094b42fd44919e3d6768536178ce Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sat, 29 Sep 2018 09:21:15 -0700
Subject: [PATCH 105/108] modifying search org

---
 scripts/searchOrg.py | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/scripts/searchOrg.py b/scripts/searchOrg.py
index c3bf70edf053..805e741f79a6 100644
--- a/scripts/searchOrg.py
+++ b/scripts/searchOrg.py
@@ -4,6 +4,32 @@
 """
 import requests
 from truffleHog import truffleHog
+import re
+from json import loads, dumps
+
+rules = {
+    "Slack Token": "(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})",
+    "RSA private key": "-----BEGIN RSA PRIVATE KEY-----",
+    "SSH (OPENSSH) private key": "-----BEGIN OPENSSH PRIVATE KEY-----",
+    "SSH (DSA) private key": "-----BEGIN DSA PRIVATE KEY-----",
+    "SSH (EC) private key": "-----BEGIN EC PRIVATE KEY-----",
+    "PGP private key block": "-----BEGIN PGP PRIVATE KEY BLOCK-----",
+    "Facebook Oauth": "[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].{0,30}['\"\\s][0-9a-f]{32}['\"\\s]",
+    "Twitter Oauth": "[t|T][w|W][i|I][t|T][t|T][e|E][r|R].{0,30}['\"\\s][0-9a-zA-Z]{35,44}['\"\\s]",
+    "GitHub": "[g|G][i|I][t|T][h|H][u|U][b|B].{0,30}['\"\\s][0-9a-zA-Z]{35,40}['\"\\s]",
+    "Google Oauth": "(\"client_secret\":\"[a-zA-Z0-9-_]{24}\")",
+    "AWS API Key": "AKIA[0-9A-Z]{16}",
+    "Heroku API Key": "[h|H][e|E][r|R][o|O][k|K][u|U].{0,30}[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}",
+    "Generic Secret": "[s|S][e|E][c|C][r|R][e|E][t|T].{0,30}['\"\\s][0-9a-zA-Z]{32,45}['\"\\s]",
+    "Generic API Key": "[a|A][p|P][i|I][_]?[k|K][e|E][y|Y].{0,30}['\"\\s][0-9a-zA-Z]{32,45}['\"\\s]",
+    "Slack Webhook": "https://hooks.slack.com/services/T[a-zA-Z0-9_]{8}/B[a-zA-Z0-9_]{8}/[a-zA-Z0-9_]{24}",
+    "Google (GCP) Service-account": "\"type\": \"service_account\"",
+    "Twilio API Key": "SK[a-z0-9]{32}",
+    "Password in URL": "[a-zA-Z]{3,10}://[^/\\s:@]{3,20}:[^/\\s:@]{3,20}@.{1,100}[\"'\\s]",
+}
+
+for key in rules:
+    rules[key] = re.compile(rules[key])

 def get_org_repos(orgname, page):
     response = requests.get(url='https://api.github.com/users/' + orgname + '/repos?page={}'.format(page))
@@ -11,8 +37,16 @@ def get_org_repos(orgname, page):
     if not json:
         return None
     for item in json:
-        if item['private'] == False:
+
+        if item['fork'] == False and reached:
             print('searching ' + item["html_url"])
-            truffleHog.find_strings(item["html_url"], do_regex=True, do_entropy=False, max_depth=100000)
+            results = truffleHog.find_strings(item["html_url"], do_regex=True, custom_regexes=rules, do_entropy=False, max_depth=100000)
+            for issue in results["foundIssues"]:
+                d = loads(open(issue).read())
+                d['github_url'] = "{}/blob/{}/{}".format(item["html_url"], d['commitHash'], d['path'])
+                d['github_commit_url'] = "{}/commit/{}".format(item["html_url"], d['commitHash'])
+                d['diff'] = d['diff'][0:200]
+                d['printDiff'] = d['printDiff'][0:200]
+                print(dumps(d, indent=4))
     get_org_repos(orgname, page + 1)
-get_org_repos("twitter", 1)
+get_org_repos("square", 1)

From fc24b2567fb71e51345b3f611ed6eeb815440407 Mon Sep 17 00:00:00 2001
From: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Date: Thu, 6 Dec 2018 18:27:58 -0500
Subject: [PATCH 106/108] typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 94f3ca809862..68ce49a6a8d1 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ Custom regexes can be added with the following flag `--rules /path/to/rules`. Th
 ```
 Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added.

-Feel free to also contribute high signal regexes upstream that you think will benifit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.
+Feel free to also contribute high signal regexes upstream that you think will benefit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.

 trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json


From 355502852332d5efe3d9ec94682c77c662da8609 Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 16 Dec 2018 22:17:41 -0800
Subject: [PATCH 107/108] bumping requirements

---
 requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/requirements.txt b/requirements.txt
index 3400fa8348a7..18c27b15f7c3 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,4 +2,4 @@ GitPython==2.1.1
 unittest2==1.1.0
 pytest-cov==2.5.1
 codecov==2.0.15
-truffleHogRegexes==0.0.6
+truffleHogRegexes==0.0.7

From 60f02e7f00c1976cfd321fe218c2a4e8cfff98ff Mon Sep 17 00:00:00 2001
From: Dylan
Date: Sun, 16 Dec 2018 22:18:17 -0800
Subject: [PATCH 108/108] bumping version

---
 setup.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/setup.py b/setup.py
index 8d169e0b7b90..048cb4b02527 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@

 setup(
     name='truffleHog',
-    version='2.0.97',
+    version='2.0.98',
     description='Searches through git repositories for high entropy strings, digging deep into commit history.',
     url='https://github.com/dxa4481/truffleHog',
     author='Dylan Ayrey',
@@ -11,7 +11,7 @@
     packages = ['truffleHog'],
     install_requires=[
         'GitPython == 2.1.1',
-        'truffleHogRegexes == 0.0.4'
+        'truffleHogRegexes == 0.0.7'
     ],
     entry_points = {
         'console_scripts': ['trufflehog = truffleHog.truffleHog:main'],
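
Note on the entropy check: the README text touched by patches 097 and 101 says the tool evaluates Shannon entropy over the base64 and hexadecimal character sets for every run of more than 20 such characters in a diff. The following is a minimal Python 3 sketch of that idea, not the code from truffleHog.py. The helper names (`shannon_entropy`, `runs_of_charset`, `high_entropy_strings`), the character-set constants, and the two thresholds are assumptions made for illustration; none of them appear in the patches above.

```python
import math

# Character sets named in the README text. The exact constants used by
# truffleHog are not shown in these patches, so these are assumptions.
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "1234567890abcdefABCDEF"


def shannon_entropy(data, charset):
    """Return the Shannon entropy of `data` in bits per character,
    counting only characters that belong to `charset`."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in set(charset):
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def runs_of_charset(word, charset, min_length=20):
    """Yield maximal runs of `charset` characters longer than `min_length`."""
    run = ""
    for ch in word:
        if ch in charset:
            run += ch
        else:
            if len(run) > min_length:
                yield run
            run = ""
    if len(run) > min_length:
        yield run


def high_entropy_strings(diff_text, b64_threshold=4.5, hex_threshold=3.0):
    """Collect candidate secrets from a diff.

    Both thresholds are illustrative assumptions, not values taken
    from the patches in this series.
    """
    found = []
    for line in diff_text.split("\n"):
        for word in line.split():
            for run in runs_of_charset(word, BASE64_CHARS):
                if shannon_entropy(run, BASE64_CHARS) > b64_threshold:
                    found.append(run)
            for run in runs_of_charset(word, HEX_CHARS):
                if shannon_entropy(run, HEX_CHARS) > hex_threshold:
                    found.append(run)
    return found
```

A repeated character scores zero and is ignored, while a 32-character string that uses all 16 hex digits evenly scores 4 bits per character and is reported under the assumed hex threshold. Raising the thresholds trades recall for less noise, which is the same trade-off the README's `--entropy` and regex options address.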